Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
29 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

a.kondratov
Hi Hackers,

I would like to propose a change, which allow CLUSTER, VACUUM FULL and
REINDEX to modify relation tablespace on the fly. Actually, all these
commands rebuild relation filenodes from the scratch, thus it seems
natural to allow specifying them a new location. It may be helpful, when
a server went out of disk, so you can attach new partition and perform
e.g. VACUUM FULL, which will free some space and move data to a new
location at the same time. Otherwise, you cannot complete VACUUM FULL
until you have up to x2 relation disk space on a single partition.

Please, find attached a patch, which extend CLUSTER, VACUUM FULL and
REINDEX with additional options:

REINDEX [ ( VERBOSE ) ] { INDEX | TABLE } name [ SET TABLESPACE
new_tablespace ]

CLUSTER [VERBOSE] table_name [ USING index_name ] [ SET TABLESPACE
new_tablespace ]
CLUSTER [VERBOSE] [ SET TABLESPACE new_tablespace ]

VACUUM ( FULL [, ...] ) [ SET TABLESPACE new_tablespace ] [
table_and_columns [, ...] ]
VACUUM FULL [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ SET TABLESPACE
new_tablespace ] [ table_and_columns [, ...] ]

Thereby I have a few questions:

1) What do you think about this concept in general?

2) Is SET TABLESPACE an appropriate syntax for this functionality? I
thought also about a plain TABLESPACE keyword, but it seems to be
misleading, and WITH (options) clause like in CREATE SUBSCRIPTION ...
WITH (options). So I preferred SET TABLESPACE, since the same syntax is
used currently in ALTER to change tablespace, but maybe someone will
have a better idea.

3) I was not able to update the lexer for VACUUM FULL to use SET
TABLESPACE after table_and_columns and completely get rid of
shift/reduce conflicts. I guess it happens, since table_and_columns is
optional and may be of variable length, but have no idea how to deal
with it. Any thoughts?


Regards

--
Alexey Kondratov

Postgres Professionalhttps://www.postgrespro.com
Russian Postgres Company


0001-Allow-CLUSTER-VACUUM-FULL-and-REINDEX-to-change-tablespace.patch (44K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Robert Haas
On Mon, Dec 24, 2018 at 6:08 AM Alexey Kondratov
<[hidden email]> wrote:
> I would like to propose a change, which allow CLUSTER, VACUUM FULL and
> REINDEX to modify relation tablespace on the fly.

ALTER TABLE already has a lot of logic that is oriented towards being
able to do multiple things at the same time.  If we added CLUSTER,
VACUUM FULL, and REINDEX to that set, then you could, say, change a
data type, cluster, and change tablespaces all in a single SQL
command.

That would be cool, but probably a lot of work.  :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Alvaro Herrera-9
On 2018-Dec-26, Robert Haas wrote:

> On Mon, Dec 24, 2018 at 6:08 AM Alexey Kondratov
> <[hidden email]> wrote:
> > I would like to propose a change, which allow CLUSTER, VACUUM FULL and
> > REINDEX to modify relation tablespace on the fly.
>
> ALTER TABLE already has a lot of logic that is oriented towards being
> able to do multiple things at the same time.  If we added CLUSTER,
> VACUUM FULL, and REINDEX to that set, then you could, say, change a
> data type, cluster, and change tablespaces all in a single SQL
> command.

That's a great observation.

> That would be cool, but probably a lot of work.  :-(

But is it?  ALTER TABLE is already doing one kind of table rewrite
during phase 3, and CLUSTER is just a different kind of table rewrite
(which happens to REINDEX), and VACUUM FULL is just a special case of
CLUSTER.  Maybe what we need is an ALTER TABLE variant that executes
CLUSTER's table rewrite during phase 3 instead of its ad-hoc table
rewrite.

As for REINDEX, I think it's valuable to move tablespace together with
the reindexing.  You can already do it with the CREATE INDEX
CONCURRENTLY recipe we recommend, of course; but REINDEX CONCURRENTLY is
not going to provide that, and it seems worth doing.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Michael Paquier-2
On Wed, Dec 26, 2018 at 03:19:06PM -0300, Alvaro Herrera wrote:
> As for REINDEX, I think it's valuable to move tablespace together with
> the reindexing.  You can already do it with the CREATE INDEX
> CONCURRENTLY recipe we recommend, of course; but REINDEX CONCURRENTLY is
> not going to provide that, and it seems worth doing.

Even for plain REINDEX that seems useful.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

a.kondratov
In reply to this post by Alvaro Herrera-9
Hi,

Thank you all for replies.

>> ALTER TABLE already has a lot of logic that is oriented towards being
>> able to do multiple things at the same time.  If we added CLUSTER,
>> VACUUM FULL, and REINDEX to that set, then you could, say, change a
>> data type, cluster, and change tablespaces all in a single SQL
>> command.
> That's a great observation.

Indeed, I thought that ALTER TABLE executes all actions sequentially one
by one, e.g. in the case of

ALTER TABLE test_int CLUSTER ON test_int_idx, SET TABLESPACE test_tblspc;

it executes CLUSTER and THEN executes SET TABLESPACE. However, if I get
it right, ALTER TABLE is rather smart, so in such a case it follows the
steps:

1) Only saves new tablespace Oid during prepare phase 1 without actual work;

2) Only executes mark_index_clustered during phase 2, again without
actual work done;

3) And finally rewrites relation during phase 3, where CLUSTER and SET
TABLESPACE are effectively performed.

>> That would be cool, but probably a lot of work.  :-(
> But is it?  ALTER TABLE is already doing one kind of table rewrite
> during phase 3, and CLUSTER is just a different kind of table rewrite
> (which happens to REINDEX), and VACUUM FULL is just a special case of
> CLUSTER.  Maybe what we need is an ALTER TABLE variant that executes
> CLUSTER's table rewrite during phase 3 instead of its ad-hoc table
> rewrite.

According to the ALTER TABLE example above, it is already exist for CLUSTER.

> As for REINDEX, I think it's valuable to move tablespace together with
> the reindexing.  You can already do it with the CREATE INDEX
> CONCURRENTLY recipe we recommend, of course; but REINDEX CONCURRENTLY is
> not going to provide that, and it seems worth doing.

Maybe I am missing something, but according to the docs REINDEX
CONCURRENTLY does not exist yet, DROP then CREATE CONCURRENTLY is
suggested instead. Thus, we have to add REINDEX CONCURRENTLY first, but
it is a matter of different patch, I guess.

>> Even for plain REINDEX that seems useful.
>> --
>> Michael

To summarize:

1) Alvaro and Michael agreed, that REINDEX with tablespace move may be
useful. This is done in the patch attached to my initial email. Adding
REINDEX to ALTER TABLE as new action seems quite questionable for me and
not completely semantically correct. ALTER already looks bulky.

2) If I am correct, 'ALTER TABLE ... CLUSTER ON ..., SET TABLESPACE ...'
does exactly what I wanted to add to CLUSTER in my patch. So probably no
work is necessary here.

3) VACUUM FULL. It seems, that we can add special case 'ALTER TABLE ...
VACUUM FULL, SET TABLESPACE ...', which will follow relatively the same
path as with CLUSTER ON, but without any specific index. Relation should
be rewritten in the new tablespace during phase 3.

What do you think?


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Alvaro Herrera-9
On 2018-Dec-27, Alexey Kondratov wrote:

> To summarize:
>
> 1) Alvaro and Michael agreed, that REINDEX with tablespace move may be
> useful. This is done in the patch attached to my initial email. Adding
> REINDEX to ALTER TABLE as new action seems quite questionable for me and not
> completely semantically correct. ALTER already looks bulky.

Agreed on these points.

> 2) If I am correct, 'ALTER TABLE ... CLUSTER ON ..., SET TABLESPACE ...'
> does exactly what I wanted to add to CLUSTER in my patch. So probably no
> work is necessary here.

Well, ALTER TABLE CLUSTER ON does not really cluster the table; it only
indicates which index to cluster on, for the next time you run
standalone CLUSTER.  I think it would be valuable to have those ALTER
TABLE variants that rewrite the table do so using the cluster order, if
there is one, instead of the heap order, which is what it does today.

> 3) VACUUM FULL. It seems, that we can add special case 'ALTER TABLE ...
> VACUUM FULL, SET TABLESPACE ...', which will follow relatively the same path
> as with CLUSTER ON, but without any specific index. Relation should be
> rewritten in the new tablespace during phase 3.

Well, VACUUM FULL is just a table rewrite using the CLUSTER code that
doesn't cluster on any index: it just uses the heap order.  So in
essence it's the same as a table-rewriting ALTER TABLE.  In other words,
if you get the index-ordered table rewriting in ALTER TABLE, I don't
think this part adds anything useful; and it seems very confusing.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Masahiko Sawada
On Thu, Dec 27, 2018 at 10:24 PM Alvaro Herrera
<[hidden email]> wrote:

>
> On 2018-Dec-27, Alexey Kondratov wrote:
>
> > To summarize:
> >
> > 1) Alvaro and Michael agreed, that REINDEX with tablespace move may be
> > useful. This is done in the patch attached to my initial email. Adding
> > REINDEX to ALTER TABLE as new action seems quite questionable for me and not
> > completely semantically correct. ALTER already looks bulky.
>
> Agreed on these points.

As an alternative idea, I think we can have a new ALTER INDEX variants
that rebuilds the index while moving tablespace, something like ALTER
INDEX ... REBUILD SET TABLESPACE ....

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Alexander Korotkov
On Fri, Dec 28, 2018 at 11:32 AM Masahiko Sawada <[hidden email]> wrote:

> On Thu, Dec 27, 2018 at 10:24 PM Alvaro Herrera
> <[hidden email]> wrote:
> >
> > On 2018-Dec-27, Alexey Kondratov wrote:
> >
> > > To summarize:
> > >
> > > 1) Alvaro and Michael agreed, that REINDEX with tablespace move may be
> > > useful. This is done in the patch attached to my initial email. Adding
> > > REINDEX to ALTER TABLE as new action seems quite questionable for me and not
> > > completely semantically correct. ALTER already looks bulky.
> >
> > Agreed on these points.
>
> As an alternative idea, I think we can have a new ALTER INDEX variants
> that rebuilds the index while moving tablespace, something like ALTER
> INDEX ... REBUILD SET TABLESPACE ....

+1

It seems the easiest way to have feature-full commands. If we put
functionality of CLUSTER and VACUUM FULL to ALTER TABLE, and put
functionality of REINDEX to ALTER INDEX, then CLUSTER, VACUUM FULL and
REINDEX would be just syntax sugar.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

a.kondratov
Hi hackers,

On 2018-12-27 04:57, Michael Paquier wrote:
> On Wed, Dec 26, 2018 at 03:19:06PM -0300, Alvaro Herrera wrote:
>> As for REINDEX, I think it's valuable to move tablespace together with
>> the reindexing.  You can already do it with the CREATE INDEX
>> CONCURRENTLY recipe we recommend, of course; but REINDEX CONCURRENTLY
>> is
>> not going to provide that, and it seems worth doing.
>
> Even for plain REINDEX that seems useful.
>

I've rebased the patch and put it on the closest commitfest. It is
updated to allow user to do REINDEX CONCURRENTLY + SET TABLESPACE
altogether, since plain REINDEX CONCURRENTLY became available this year.

On 2019-06-07 21:27, Alexander Korotkov wrote:

> On Fri, Dec 28, 2018 at 11:32 AM Masahiko Sawada
> <[hidden email]> wrote:
>> On Thu, Dec 27, 2018 at 10:24 PM Alvaro Herrera
>> <[hidden email]> wrote:
>> >
>> > On 2018-Dec-27, Alexey Kondratov wrote:
>> >
>> > > To summarize:
>> > >
>> > > 1) Alvaro and Michael agreed, that REINDEX with tablespace move may be
>> > > useful. This is done in the patch attached to my initial email. Adding
>> > > REINDEX to ALTER TABLE as new action seems quite questionable for me and not
>> > > completely semantically correct. ALTER already looks bulky.
>> >
>> > Agreed on these points.
>>
>> As an alternative idea, I think we can have a new ALTER INDEX variants
>> that rebuilds the index while moving tablespace, something like ALTER
>> INDEX ... REBUILD SET TABLESPACE ....
>
> +1
>
> It seems the easiest way to have feature-full commands. If we put
> functionality of CLUSTER and VACUUM FULL to ALTER TABLE, and put
> functionality of REINDEX to ALTER INDEX, then CLUSTER, VACUUM FULL and
> REINDEX would be just syntax sugar.
>
I definitely bought into the idea of 'change a data type, cluster, and
change tablespace all in a single SQL command', but stuck with some
architectural questions, when it got to the code.

Currently, the only one kind of table rewrite is done by ALTER TABLE. It
is preformed by simply reading tuples one by one via
table_scan_getnextslot and inserting into the new table via tuple_insert
table access method (AM). In the same time, CLUSTER table rewrite is
implemented as a separated table AM relation_copy_for_cluster, which is
actually a direct link to the heap AM heapam_relation_copy_for_cluster.
Basically speaking, CLUSTER table rewrite happens 2 abstraction layers
lower than ALTER TABLE one. Furthermore, CLUSTER seems to be a
heap-specific AM and may be meaningless for some other storages, which
is even more important because of coming pluggable storages, isn't it?

Maybe I overly complicate the problem, but to perform a data type change
(or any other ALTER TABLE modification), cluster, and change tablespace
in a row we have to bring all this high-level stuff done by ALTER TABLE
to heapam_relation_copy_for_cluster. But is it even possible without
leaking abstractions?

I'm working toward adding REINDEX to ALTER INDEX, so it was possible to
do 'ALTER INDEX ... REINDEX CONCURRENTLY SET TABLESPACE ...', but ALTER
TABLE + CLUSTER/VACUUM FULL is quite questionable for me now.

Anyway, new patch, which adds SET TABLESPACE to REINDEX is attached and
this functionality seems really useful, so I will be very appreciate if
someone will take a look on it.


Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
The Russian Postgres Company

v1-0001-Allow-REINDEX-and-REINDEX-CONCURRENTLY-to-SET-TAB.patch (41K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Surafel Temesgen
Hi Alexey
Here are a few comment
On Sat, Aug 31, 2019 at 11:54 PM <[hidden email]> wrote:
Hi hackers,


Anyway, new patch, which adds SET TABLESPACE to REINDEX is attached and
this functionality seems really useful, so I will be very appreciate if
someone will take a look on it.

* There are NOWAIT option in alter index, is there a reason not to have similar option here?
* SET TABLESPACE command is not documented
* There are multiple checking for whether the relation is temporary tables of other sessions, one in check_relation_is_movable and other independently

*+ char *tablespacename;

calling it new_tablespacename will make it consistent with other places

*The patch did't applied cleanly http://cfbot.cputube.org/patch_24_2269.log

regards

Surafel

Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

a.kondratov
Hi Surafel,

Thank you for looking at the patch!

On 17.09.2019 14:04, Surafel Temesgen wrote:
> * There are NOWAIT option in alter index, is there a reason not to
> have similar option here?

Currently in Postgres SET TABLESPACE always comes with [ NOWAIT ]
option, so I hope it worth adding this option here for convenience.
Added in the new version.

> * SET TABLESPACE command is not documented

Actually, new_tablespace parameter was documented, but I've added a more
detailed section for SET TABLESPACE too.

> * There are multiple checking for whether the relation is temporary
> tables of other sessions, one in check_relation_is_movable and other
> independently

Yes, and there is a comment section in the code describing why. There is
a repeatable bunch of checks for verification whether relation movable
or not, so I put it into a separated function --
check_relation_is_movable. However, if we want to do only REINDEX, then
some of them are excess, so the only one RELATION_IS_OTHER_TEMP is used.
Thus, RELATION_IS_OTHER_TEMP is never executed twice, just different
code paths.

> *+ char *tablespacename;
>
> calling it new_tablespacename will make it consistent with other places
>

OK, changed, although I don't think it is important, since this is the
only one tablespace variable there.

> *The patch did't applied cleanly
> http://cfbot.cputube.org/patch_24_2269.log
>

Patch is rebased and attached with all the fixes described above.


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


v2-0001-Allow-REINDEX-and-REINDEX-CONCURRENTLY-to-SET-TAB.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Michael Paquier-2
On Wed, Sep 18, 2019 at 03:46:20PM +0300, Alexey Kondratov wrote:
> Currently in Postgres SET TABLESPACE always comes with [ NOWAIT ] option, so
> I hope it worth adding this option here for convenience. Added in the new
> version.

It seems to me that it would be good to keep the patch as simple as
possible for its first version, and split it into two if you would
like to add this new option instead of bundling both together.  This
makes the review of one and the other more simple.  Anyway, regarding
the grammar, is SET TABLESPACE really our best choice here?  What
about:
- TABLESPACE = foo, in parenthesis only?
- Only using TABLESPACE, without SET at the end of the query?

SET is used in ALTER TABLE per the set of subqueries available there,
but that's not the case of REINDEX.

+-- check that all relations moved to new tablespace
+SELECT relname FROM pg_class
+WHERE reltablespace=(SELECT oid FROM pg_tablespace WHERE
spcname='regress_tblspace')
+AND relname IN ('regress_tblspace_test_tbl_idx');
+            relname
+-------------------------------
+ regress_tblspace_test_tbl_idx
+(1 row)
Just to check one relation you could use \d with the relation (index
or table) name.

-   if (RELATION_IS_OTHER_TEMP(iRel))
-       ereport(ERROR,
-               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                errmsg("cannot reindex temporary tables of other
-       sessions")))
I would keep the order of this operation in order with
CheckTableNotInUse().
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

a.kondratov
Hi Michael,

Thank you for your comments.

On 19.09.2019 7:43, Michael Paquier wrote:
> On Wed, Sep 18, 2019 at 03:46:20PM +0300, Alexey Kondratov wrote:
>> Currently in Postgres SET TABLESPACE always comes with [ NOWAIT ] option, so
>> I hope it worth adding this option here for convenience. Added in the new
>> version.
> It seems to me that it would be good to keep the patch as simple as
> possible for its first version, and split it into two if you would
> like to add this new option instead of bundling both together.  This
> makes the review of one and the other more simple.

OK, it makes sense. I would also prefer first patch as simple as
possible, but adding this NOWAIT option required only a few dozens of
lines, so I just bundled everything together. Anyway, I will split
patches if we decide to keep [ SET TABLESPACE ... [NOWAIT] ] grammar.

> Anyway, regarding
> the grammar, is SET TABLESPACE really our best choice here?  What
> about:
> - TABLESPACE = foo, in parenthesis only?
> - Only using TABLESPACE, without SET at the end of the query?
>
> SET is used in ALTER TABLE per the set of subqueries available there,
> but that's not the case of REINDEX.

I like SET TABLESPACE grammar, because it already exists and used both
in ALTER TABLE and ALTER INDEX. Thus, if we once add 'ALTER INDEX
index_name REINDEX SET TABLESPACE' (as was proposed earlier in the
thread), then it will be consistent with 'REINDEX index_name SET
TABLESPACE'. If we use just plain TABLESPACE, then it may be misleading
in the following cases:

- REINDEX TABLE table_name TABLESPACE tablespace_name
- REINDEX (TABLESPACE = tablespace_name) TABLE table_name

since it may mean 'Reindex all indexes of table_name, that stored in the
tablespace_name', doesn't it?

However, I have rather limited experience with Postgres, so I doesn't
insist.

> +-- check that all relations moved to new tablespace
> +SELECT relname FROM pg_class
> +WHERE reltablespace=(SELECT oid FROM pg_tablespace WHERE
> spcname='regress_tblspace')
> +AND relname IN ('regress_tblspace_test_tbl_idx');
> +            relname
> +-------------------------------
> + regress_tblspace_test_tbl_idx
> +(1 row)
> Just to check one relation you could use \d with the relation (index
> or table) name.

Yes, \d outputs tablespace name if it differs from pg_default, but it
shows other information in addition, which is not necessary here. Also
its output has more chances to be changed later, which may lead to the
failed tests. This query output is more or less stable and new relations
may be easily added to tests if we once add tablespace change to
CLUSTER/VACUUM FULL. I can change test to use \d, but not sure that it
would reduce test output length or will be helpful for a future tests
support.

> -   if (RELATION_IS_OTHER_TEMP(iRel))
> -       ereport(ERROR,
> -               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> -                errmsg("cannot reindex temporary tables of other
> -       sessions")))
> I would keep the order of this operation in order with
> CheckTableNotInUse().

Sure, I haven't noticed that reordered these operations, thanks.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Robert Haas
In reply to this post by Michael Paquier-2
On Thu, Sep 19, 2019 at 12:43 AM Michael Paquier <[hidden email]> wrote:

> It seems to me that it would be good to keep the patch as simple as
> possible for its first version, and split it into two if you would
> like to add this new option instead of bundling both together.  This
> makes the review of one and the other more simple.  Anyway, regarding
> the grammar, is SET TABLESPACE really our best choice here?  What
> about:
> - TABLESPACE = foo, in parenthesis only?
> - Only using TABLESPACE, without SET at the end of the query?
>
> SET is used in ALTER TABLE per the set of subqueries available there,
> but that's not the case of REINDEX.

So, earlier in this thread, I suggested making this part of ALTER
TABLE, and several people seemed to like that idea. Did we have a
reason for dropping that approach?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

a.kondratov
On 19.09.2019 16:21, Robert Haas wrote:

> On Thu, Sep 19, 2019 at 12:43 AM Michael Paquier <[hidden email]> wrote:
>> It seems to me that it would be good to keep the patch as simple as
>> possible for its first version, and split it into two if you would
>> like to add this new option instead of bundling both together.  This
>> makes the review of one and the other more simple.  Anyway, regarding
>> the grammar, is SET TABLESPACE really our best choice here?  What
>> about:
>> - TABLESPACE = foo, in parenthesis only?
>> - Only using TABLESPACE, without SET at the end of the query?
>>
>> SET is used in ALTER TABLE per the set of subqueries available there,
>> but that's not the case of REINDEX.
> So, earlier in this thread, I suggested making this part of ALTER
> TABLE, and several people seemed to like that idea. Did we have a
> reason for dropping that approach?

If we add this option to REINDEX, then for 'ALTER TABLE tb_name action1,
REINDEX SET TABLESPACE tbsp_name, action3' action2 will be just a direct
alias to 'REINDEX TABLE tb_name SET TABLESPACE tbsp_name'. So it seems
practical to do this for REINDEX first.

The only one concern I have against adding REINDEX to ALTER TABLE in
this context is that it will allow user to write such a chimera:

ALTER TABLE tb_name REINDEX SET TABLESPACE tbsp_name, SET TABLESPACE
tbsp_name;

when they want to move both table and all the indexes. Because simple

ALTER TABLE tb_name REINDEX, SET TABLESPACE tbsp_name;

looks ambiguous. Should it change tablespace of table, indexes or both?


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Michael Paquier-2
On Thu, Sep 19, 2019 at 05:40:41PM +0300, Alexey Kondratov wrote:
> On 19.09.2019 16:21, Robert Haas wrote:
>> So, earlier in this thread, I suggested making this part of ALTER
>> TABLE, and several people seemed to like that idea. Did we have a
>> reason for dropping that approach?

Personally, I don't find this idea very attractive as ALTER TABLE is
already complicated enough with all the subqueries we already support
in the command, all the logic we need to maintain to make combinations
of those subqueries in a minimum number of steps, and also the number
of bugs we have seen because of the amount of complication present.

> If we add this option to REINDEX, then for 'ALTER TABLE tb_name action1,
> REINDEX SET TABLESPACE tbsp_name, action3' action2 will be just a direct
> alias to 'REINDEX TABLE tb_name SET TABLESPACE tbsp_name'. So it seems
> practical to do this for REINDEX first.
>
> The only one concern I have against adding REINDEX to ALTER TABLE in this
> context is that it will allow user to write such a chimera:
>
> ALTER TABLE tb_name REINDEX SET TABLESPACE tbsp_name, SET TABLESPACE
> tbsp_name;
>
> when they want to move both table and all the indexes. Because simple
> ALTER TABLE tb_name REINDEX, SET TABLESPACE tbsp_name;
> looks ambiguous. Should it change tablespace of table, indexes or both?
Tricky question, but we don't change the tablespace of indexes when
using an ALTER TABLE, so I would say no on compatibility grounds.
ALTER TABLE has never touched the tablespace of indexes, and I don't
think that we should begin to do so.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Jose Luis Tallon
On 20/9/19 4:06, Michael Paquier wrote:

> On Thu, Sep 19, 2019 at 05:40:41PM +0300, Alexey Kondratov wrote:
>> On 19.09.2019 16:21, Robert Haas wrote:
>>> So, earlier in this thread, I suggested making this part of ALTER
>>> TABLE, and several people seemed to like that idea. Did we have a
>>> reason for dropping that approach?
> Personally, I don't find this idea very attractive as ALTER TABLE is
> already complicated enough with all the subqueries we already support
> in the command, all the logic we need to maintain to make combinations
> of those subqueries in a minimum number of steps, and also the number
> of bugs we have seen because of the amount of complication present.

Yes, but please keep the other options: At it is, cluster, vacuum full
and reindex already rewrite the table in full; Being able to write the
result to a different tablespace than the original object was stored in
enables a whole world of very interesting possibilities.... including a
quick way out of a "so little disk space available that vacuum won't
work properly" situation --- which I'm sure MANY users will appreciate,
including me

> If we add this option to REINDEX, then for 'ALTER TABLE tb_name action1,
> REINDEX SET TABLESPACE tbsp_name, action3' action2 will be just a direct
> alias to 'REINDEX TABLE tb_name SET TABLESPACE tbsp_name'. So it seems
> practical to do this for REINDEX first.
>
> The only one concern I have against adding REINDEX to ALTER TABLE in this
> context is that it will allow user to write such a chimera:
>
> ALTER TABLE tb_name REINDEX SET TABLESPACE tbsp_name, SET TABLESPACE
> tbsp_name;
>
> when they want to move both table and all the indexes. Because simple
> ALTER TABLE tb_name REINDEX, SET TABLESPACE tbsp_name;
> looks ambiguous. Should it change tablespace of table, indexes or both?

Indeed.

IMHO, that form of the command should not allow that much flexibility...
even on the "principle of least surprise" grounds :S

That is, I'd restrict the ability to change (output) tablespace to the
"direct" form --- REINDEX name, VACUUM (FULL) name, CLUSTER name ---
whereas the ALTER table|index SET TABLESPACE would continue to work.

Now that I come to think of it, maybe saying "output" or "move to"
rather than "set tablespace" would make more sense for this variation of
the commands? (clearer, less prone to confusion)?

> Tricky question, but we don't change the tablespace of indexes when
> using an ALTER TABLE, so I would say no on compatibility grounds.
> ALTER TABLE has never touched the tablespace of indexes, and I don't
> think that we should begin to do so.

Indeed.


I might be missing something, but is there any reason to not *require* a
explicit transaction for the above multi-action commands? I mean, have
it be:

BEGIN;

ALTER TABLE tb_name SET TABLESPACE tbsp_name;    -- moves the table ....
but possibly NOT the indexes?

ALTER TABLE tb_name REINDEX [OUTPUT TABLESPACE tbsp_name];    --
REINDEX, placing the resulting index on tbsp_name instead of the
original one

COMMIT;

... and have the parser/planner combine the steps if it'd make sense (it
probably wouldn't in this example)?


Just my .02€


Thanks,

     / J.L.




Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Alvaro Herrera-9
In reply to this post by Robert Haas
On 2019-Sep-19, Robert Haas wrote:

> So, earlier in this thread, I suggested making this part of ALTER
> TABLE, and several people seemed to like that idea. Did we have a
> reason for dropping that approach?

Hmm, my own reading of that was to add tablespace changing abilities to
ALTER TABLE *in addition* to this patch, not instead of it.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

a.kondratov
On 20.09.2019 19:38, Alvaro Herrera wrote:
> On 2019-Sep-19, Robert Haas wrote:
>
>> So, earlier in this thread, I suggested making this part of ALTER
>> TABLE, and several people seemed to like that idea. Did we have a
>> reason for dropping that approach?
> Hmm, my own reading of that was to add tablespace changing abilities to
> ALTER TABLE *in addition* to this patch, not instead of it.

That was my understanding too.

On 20.09.2019 11:26, Jose Luis Tallon wrote:

> On 20/9/19 4:06, Michael Paquier wrote:
>> Personally, I don't find this idea very attractive as ALTER TABLE is
>> already complicated enough with all the subqueries we already support
>> in the command, all the logic we need to maintain to make combinations
>> of those subqueries in a minimum number of steps, and also the number
>> of bugs we have seen because of the amount of complication present.
>
> Yes, but please keep the other options: At it is, cluster, vacuum full
> and reindex already rewrite the table in full; Being able to write the
> result to a different tablespace than the original object was stored
> in enables a whole world of very interesting possibilities....
> including a quick way out of a "so little disk space available that
> vacuum won't work properly" situation --- which I'm sure MANY users
> will appreciate, including me
Yes, sure, that was my main motivation. The first message in the thread
contains a patch, which adds SET TABLESPACE support to all of CLUSTER,
VACUUM FULL and REINDEX. However, there came up an idea to integrate
CLUSTER/VACUUM FULL with ALTER TABLE and do their work + all the ALTER
TABLE stuff in a single table rewrite. I've dig a little bit into this
and ended up with some architectural questions and concerns [1]. So I
decided to start with a simple REINDEX patch.

Anyway, I've followed Michael's advice and split the last patch into two:

1) Adds all the main functionality, but with simplified 'REINDEX INDEX [
CONCURRENTLY ] ... [ TABLESPACE ... ]' grammar;

2) Adds a more sophisticated syntax with '[ SET TABLESPACE ... [ NOWAIT
] ]'.

Patch 1 contains all the docs and tests and may be applied/committed
separately or together with 2, which is fully optional.

Recent merge conflicts and reindex_index validations order are also
fixed in the attached version.

[1]
https://www.postgresql.org/message-id/6b2a5c4de19f111ef24b63428033bb67%40postgrespro.ru


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


v3-0001-Allow-REINDEX-and-REINDEX-CONCURRENTLY-to-change-.patch (30K) Download Attachment
v3-0002-Use-SET-TABLESPACE-syntax-with-NOWAIT-option.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Steve Singer-2
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, failed
Spec compliant:           not tested
Documentation:            tested, failed

* I had to replace heap_open/close with table_open/close to get the
patch to compile against master

In the documentation

+     <para>
+      This specifies a tablespace, where all rebuilt indexes will be created.
+      Can be used only with <literal>REINDEX INDEX</literal> and
+      <literal>REINDEX TABLE</literal>, since the system indexes are not
+      movable, but <literal>SCHEMA</literal>, <literal>DATABASE</literal> or
+      <literal>SYSTEM</literal> very likely will has one.
+     </para>

I found the "SCHEMA,DATABASE or SYSTEM very likely will has one." portion confusing and would be inclined to remove it or somehow reword it.

Consider the following

-------------
 create index foo_bar_idx on foo(bar) tablespace pg_default;
CREATE INDEX
reindex=# \d foo
                Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
 bar    | text    |           |          |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
    "foo_bar_idx" btree (bar)

reindex=# reindex index foo_bar_idx tablespace tst1;
REINDEX
reindex=# reindex index foo_bar_idx tablespace pg_default;
REINDEX
reindex=# \d foo
                Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
 bar    | text    |           |          |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
    "foo_bar_idx" btree (bar), tablespace "pg_default"
--------

It is a bit strange that it says "pg_default" as the tablespace. If I do
this with a alter table to the table, moving the table back to pg_default
makes it look as it did before.

Otherwise the first patch seems fine.


With the second patch(for NOWAIT) I did the following

T1: begin;
T1: insert into foo select generate_series(1,1000);
T2: reindex index foo_bar_idx set tablespace tst1 nowait;

T2 is waiting for a lock. This isn't what I would expect.

The new status of this patch is: Waiting on Author
12