propagating replica identity to partitions

classic Classic list List threaded Threaded
29 messages Options
12
Reply | Threaded
Open this post in threaded view
|

propagating replica identity to partitions

Alvaro Herrera-9
Hello

If you do ALTER TABLE .. REPLICA IDENTITY to a partitioned table, the
command operates on the parent table itself and does not propagate to
partitions.  Why is this?  Maybe not recursing was the right call when
we only had regular inheritance (back in 9.4), but since partitioned
tables got introduced, I think it is completely the other way around:
not recursing is an usability fail.

At the same time, I think that psql failing to display the replica
identity for partitioned tables is just an oversight and not designed
in.

I propose to change the behavior to:

1. When replica identity is changed on a partitioned table, all partitions
   are updated also.  Do we need to care about regular inheritance?
   My inclination is not to touch those; this'd become the first case
   in ATPrepCmd that recurses on partitioning but not inheritance.

2. When a partition is created, the replica identity is set to the
   same that the parent table has.  If it's type index, figure out the
   corresponding index in the partition, set that.  If the index doesn't
   exist, raise an error (i.e. replica identity cannot be set to an
   index until it has propagated to all children).

3. psql should display replica identity for partitioned tables.

Thoughts?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
> I propose to change the behavior to:

... this patch.

I'll now look more carefully at the cases involving indexes; thus far I
was looking at the ones using FULL.  Those seem to work as intended.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0001-index_get_partition.patch (4K) Download Attachment
0002-Propagate-replica-identity-to-partitions.patch (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
On 2019-Feb-04, Alvaro Herrera wrote:

> I'll now look more carefully at the cases involving indexes; thus far I
> was looking at the ones using FULL.  Those seem to work as intended.

Yeah, that didn't work so well -- it was trying to propagate the command
verbatim to each partition, and obviously the index names did not match.
So this subcommand has to reimplement the recursion internally, as in
the attached.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v2-0001-index_get_partition.patch (4K) Download Attachment
v2-0002-Propagate-replica-identity-to-partitions.patch (21K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
Actually, index-based replica identities failed in pg_dump: we first
create the index ONLY on the partitioned table (marking it as invalid),
so when we immediately do the ALTER TABLE/REPLICA IDENTITY, it rejects
the invalid index.  If we change the code to allow invalid indexes on
partitioned tables to become replica identities, we hit the problem that
the index didn't exist when processing the partition list.  In order to
fix that, I added a flag so that partitions are allowed not to have the
index, in hopes that the missing index are created in subsequent
commands; those indexes should become valid & identity afterwards.

There's a small emerging problem, which is that if you have an invalid
index marked as replica identity, you cannot create any more partitions;
the reason is that we want to propagate the replica identity to the
partition, but the index is not there (since invalid indexes are not
propagated).  I don't think this case is worth supporting; it can be
fixed but requires some work[1].

New patch attached.

[1] In DefineRelation, we obtain the list of indexes to clone by
RelationGetIndexList, which ignores invalid indexes.  In order to
implement this we'd need to include invalid indexes in that list
(possibly by just not using RelationGetIndexList.)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v3-0001-index_get_partition.patch (4K) Download Attachment
v3-0002-Propagate-replica-identity-to-partitions.patch (23K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Dmitry Dolgov
> On Tue, Feb 5, 2019 at 12:54 AM Alvaro Herrera <[hidden email]> wrote:
>
> Actually, index-based replica identities failed in pg_dump: we first
> create the index ONLY on the partitioned table (marking it as invalid),
> so when we immediately do the ALTER TABLE/REPLICA IDENTITY, it rejects
> the invalid index.  If we change the code to allow invalid indexes on
> partitioned tables to become replica identities, we hit the problem that
> the index didn't exist when processing the partition list.  In order to
> fix that, I added a flag so that partitions are allowed not to have the
> index, in hopes that the missing index are created in subsequent
> commands; those indexes should become valid & identity afterwards.
>
> There's a small emerging problem, which is that if you have an invalid
> index marked as replica identity, you cannot create any more partitions;
> the reason is that we want to propagate the replica identity to the
> partition, but the index is not there (since invalid indexes are not
> propagated).  I don't think this case is worth supporting; it can be
> fixed but requires some work[1].
>
> New patch attached.

Could there be a race condition somewhere? The idea and the code looks
reasonable, but when I'm testing ALTER TABLE ... REPLICA IDENTITY with this
patch and concurrent partition creation, I've got the following error on ATTACH
PARTITION:

ERROR:  42P17: replica index does not exist in partition "test373"
LOCATION:  MatchReplicaIdentity, tablecmds.c:15018

This error seems unstable, other time I've got a deadlock. And I don't observe
this behaviour on the master.

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
On 2019-Feb-07, Dmitry Dolgov wrote:

> Could there be a race condition somewhere? The idea and the code looks
> reasonable, but when I'm testing ALTER TABLE ... REPLICA IDENTITY with this
> patch and concurrent partition creation, I've got the following error on ATTACH
> PARTITION:
>
> ERROR:  42P17: replica index does not exist in partition "test373"
> LOCATION:  MatchReplicaIdentity, tablecmds.c:15018
>
> This error seems unstable, other time I've got a deadlock. And I don't observe
> this behaviour on the master.

Clearly there is a problem somewhere.  I'll investigate.  Thanks for
testing.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
In reply to this post by Dmitry Dolgov
On 2019-Feb-07, Dmitry Dolgov wrote:

> Could there be a race condition somewhere? The idea and the code looks
> reasonable, but when I'm testing ALTER TABLE ... REPLICA IDENTITY with this
> patch and concurrent partition creation, I've got the following error on ATTACH
> PARTITION:
>
> ERROR:  42P17: replica index does not exist in partition "test373"
> LOCATION:  MatchReplicaIdentity, tablecmds.c:15018
>
> This error seems unstable, other time I've got a deadlock. And I don't observe
> this behaviour on the master.

Can you share your reproducer?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Dmitry Dolgov
> On Fri, Feb 15, 2019 at 10:02 PM Alvaro Herrera <[hidden email]> wrote:

>
> On 2019-Feb-07, Dmitry Dolgov wrote:
>
> > Could there be a race condition somewhere? The idea and the code looks
> > reasonable, but when I'm testing ALTER TABLE ... REPLICA IDENTITY with this
> > patch and concurrent partition creation, I've got the following error on ATTACH
> > PARTITION:
> >
> > ERROR:  42P17: replica index does not exist in partition "test373"
> > LOCATION:  MatchReplicaIdentity, tablecmds.c:15018
> >
> > This error seems unstable, other time I've got a deadlock. And I don't observe
> > this behaviour on the master.
>
> Can you share your reproducer?
Sure, I was running concurrently the attached script, that creates a chain of
nested partitions test1,test2,..., and in a separate session:

    alter table test1 replica identity full;

Checking this time, I've got from the script:

    ERROR:  40P01: deadlock detected
    DETAIL:  Process 10547 waits for AccessShareLock on relation 30449
of database
    29898; blocked by process 9714.
    Process 9714 waits for AccessExclusiveLock on relation 30454 of
database 29898;
    blocked by process 10547.
    HINT:  See server log for query details.
    LOCATION:  DeadLockReport, deadlock.c:1140
    Time: 1001.917 ms (00:01.002)

nested_partitions.sh (422 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Robert Haas
In reply to this post by Alvaro Herrera-9
On Mon, Feb 4, 2019 at 11:30 AM Alvaro Herrera <[hidden email]> wrote:
> If you do ALTER TABLE .. REPLICA IDENTITY to a partitioned table, the
> command operates on the parent table itself and does not propagate to
> partitions.  Why is this?  Maybe not recursing was the right call when
> we only had regular inheritance (back in 9.4), but since partitioned
> tables got introduced, I think it is completely the other way around:
> not recursing is an usability fail.

This sounds an awful like the TABLESPACE case.  I wonder how many more
similar things there are.

It's not unreasonable to use the parent's REPLICA IDENTITY setting as
the default for new partitions, much as we now do for the TABLESPACE,
because the parent's replica identity is otherwise without meaning.
But I'm less convinced that it's reasonable to have changing the
parent's replica identity recurse to existing children, because:

1. That's different from the TABLESPACE case.  We shouldn't just treat
each new instance of this problem as a green field where we can pick
any old behavior at random; it should be consistent as far as
reasonable, and

2. It confuses a change to the default for new partitions with a
change to the value for existing partitions.  Say that you've got 5
existing partitions that use one setting, but now you want new
partitions to use a different setting.  You can't get that with your
proposed semantics, because trying to change the default will change
all the existing partitions to the new value also, even though that's
not what you wanted.

We should really, really try to figure out all of the things that are
like this -- a property that is meaningless for the partitioned table
itself but may serve as a useful default for its children -- and try
to make them all work the same, ideally in one release, rather than
changing them at different times, back-patching sometimes, and having
no consistency in the details.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
On 2019-Feb-19, Robert Haas wrote:

> It's not unreasonable to use the parent's REPLICA IDENTITY setting as
> the default for new partitions, much as we now do for the TABLESPACE,
> because the parent's replica identity is otherwise without meaning.
> But I'm less convinced that it's reasonable to have changing the
> parent's replica identity recurse to existing children, because:
>
> 1. That's different from the TABLESPACE case.  We shouldn't just treat
> each new instance of this problem as a green field where we can pick
> any old behavior at random; it should be consistent as far as
> reasonable,

Maybe we should be using the inheritance marker in the command to note
whether to recurse to partitions?  That is, if you say
  ALTER TABLE ONLY parent SET REPLICA IDENTITY
then we don't recurse and just change the parent table and future
partitions, whereas if you leave out the ONLY then it does affect
existing partitions.

However, I think TABLESPACE and any other property that involves
rewriting tables wholesale can reasonably be expected to behave
differently (w.r.t. recursing) from commands that just modify the
catalogs.  I think we already have such a distinction somewhere.

EXPLAIN ALTER TABLE would sure be handy :-)

> 2. It confuses a change to the default for new partitions with a
> change to the value for existing partitions.  Say that you've got 5
> existing partitions that use one setting, but now you want new
> partitions to use a different setting.  You can't get that with your
> proposed semantics, because trying to change the default will change
> all the existing partitions to the new value also, even though that's
> not what you wanted.

If we make sure to heed ONLY (and I admit to not thinking about it in my
original patch) then I think this angle is covered.

> We should really, really try to figure out all of the things that are
> like this -- a property that is meaningless for the partitioned table
> itself but may serve as a useful default for its children -- and try
> to make them all work the same, ideally in one release, rather than
> changing them at different times, back-patching sometimes, and having
> no consistency in the details.

I have no argument against somebody doing that, but I don't think I have
the resources to do in in time for 12; on the other hand, leaving a
known misfeature unfixed because nobody has looked for hypothetical
similar bugs would be bad.

I'm not proposing to back-patch this, but I would love it if I can
borrow somebody's time machine to retroactively fix it for 11.0.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Robert Haas
On Tue, Feb 19, 2019 at 3:40 PM Alvaro Herrera <[hidden email]> wrote:

> Maybe we should be using the inheritance marker in the command to note
> whether to recurse to partitions?  That is, if you say
>   ALTER TABLE ONLY parent SET REPLICA IDENTITY
> then we don't recurse and just change the parent table and future
> partitions, whereas if you leave out the ONLY then it does affect
> existing partitions.
>
> However, I think TABLESPACE and any other property that involves
> rewriting tables wholesale can reasonably be expected to behave
> differently (w.r.t. recursing) from commands that just modify the
> catalogs.  I think we already have such a distinction somewhere.

I don't really follow why that should be different, or why the user
should be expected to know or care whether a particular property
involves rewriting.

> I have no argument against somebody doing that, but I don't think I have
> the resources to do in in time for 12; on the other hand, leaving a
> known misfeature unfixed because nobody has looked for hypothetical
> similar bugs would be bad.

Oh, come on.  If you don't have time to study the issue and come up
with a plan that can at least in theory be applied to all similar
cases in a principled way, whether or not you have time to apply it to
all of those cases, then you have no business committing anything at
all.  You're basically saying that you don't have time to do this
right, so you're just going to do something at random and hope it
doesn't create too many problems later.  That's terrible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
On 2019-Feb-19, Robert Haas wrote:

> On Tue, Feb 19, 2019 at 3:40 PM Alvaro Herrera <[hidden email]> wrote:
> > Maybe we should be using the inheritance marker in the command to note
> > whether to recurse to partitions?  That is, if you say
> >   ALTER TABLE ONLY parent SET REPLICA IDENTITY
> > then we don't recurse and just change the parent table and future
> > partitions, whereas if you leave out the ONLY then it does affect
> > existing partitions.
> >
> > However, I think TABLESPACE and any other property that involves
> > rewriting tables wholesale can reasonably be expected to behave
> > differently (w.r.t. recursing) from commands that just modify the
> > catalogs.  I think we already have such a distinction somewhere.
>
> I don't really follow why that should be different, or why the user
> should be expected to know or care whether a particular property
> involves rewriting.

OK, let me concede that point -- it's not rewriting that makes things
act differently, but rather TABLESPACE (as well as some other things)
behave that way.  ALTER TABLE ... SET DATA TYPE is the obvious
counterexample.

The Notes section of ALTER TABLE says:

: The actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY), as
: well as the actions TRIGGER, CLUSTER, OWNER, and TABLESPACE never recurse to
: descendant tables; that is, they always act as though ONLY were specified.
: Adding a constraint recurses only for CHECK constraints that are not marked NO
: INHERIT.

Since REPLICA IDENTITY does not appear in this list, the documented
behavior is to recurse, per the description of the "name" parameter:

: The name (optionally schema-qualified) of an existing table to
: alter. If ONLY is specified before the table name, only that table
: is altered. If ONLY is not specified, the table and all its
: descendant tables (if any) are altered. Optionally, * can be
: specified after the table name to explicitly indicate that
: descendant tables are included.

I didn't come up with this on my own, as you imply.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Robert Haas
On Tue, Feb 19, 2019 at 5:41 PM Alvaro Herrera <[hidden email]> wrote:

> OK, let me concede that point -- it's not rewriting that makes things
> act differently, but rather TABLESPACE (as well as some other things)
> behave that way.  ALTER TABLE ... SET DATA TYPE is the obvious
> counterexample.
>
> The Notes section of ALTER TABLE says:
>
> : The actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY), as
> : well as the actions TRIGGER, CLUSTER, OWNER, and TABLESPACE never recurse to
> : descendant tables; that is, they always act as though ONLY were specified.
> : Adding a constraint recurses only for CHECK constraints that are not marked NO
> : INHERIT.

I don't see that in the NOTES section for ALTER TABLE.  Are you
looking at some patch, or at master?

More generally, our documentation in this area seems to be somewhat
lacking.  For instance, the TABLESPACE section of the ALTER TABLE
documentation appears to make no mention of what exactly the behavior
is when applied to a partition table.  It doesn't seem sufficient to
simply say "well, it doesn't recurse," because that would imply that
it doesn't do anything at all, which isn't the case.

> Since REPLICA IDENTITY does not appear in this list, the documented
> behavior is to recurse, per the description of the "name" parameter:
>
> : The name (optionally schema-qualified) of an existing table to
> : alter. If ONLY is specified before the table name, only that table
> : is altered. If ONLY is not specified, the table and all its
> : descendant tables (if any) are altered. Optionally, * can be
> : specified after the table name to explicitly indicate that
> : descendant tables are included.
>
> I didn't come up with this on my own, as you imply.

Fair enough.  I don't think it's entirely useful to think about this
in terms of which operations do and don't recurse; the question in my
mind is WHY some things recurse and other things don't, and what will
be easiest for users to understand.  Obviously any property where the
parents and children have to match, like adding a column or changing a
data type, has to always recurse; nothing else is sensible.  For
others, it seems we have a choice.  It's unclear to me why we'd choose
to make REPLICA IDENTITY recurse by default and, say, OWNER not
recurse.  In both cases, the default assumption is presumably that the
whole partitioning hierarchy should match, but in a particular case a
user could want to do something else.  Consequently, I don't
understand why a user would want one of those operations to descend to
children by default and the other not.  It feels like we're choosing
the behavior in individual cases, as Tom would say, with the aid of a
dartboard.  Maybe there's some coherent principle here that I'm just
missing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
On 2019-Feb-20, Robert Haas wrote:

> On Tue, Feb 19, 2019 at 5:41 PM Alvaro Herrera <[hidden email]> wrote:
> > OK, let me concede that point -- it's not rewriting that makes things
> > act differently, but rather TABLESPACE (as well as some other things)
> > behave that way.  ALTER TABLE ... SET DATA TYPE is the obvious
> > counterexample.
> >
> > The Notes section of ALTER TABLE says:
> >
> > : The actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY), as
> > : well as the actions TRIGGER, CLUSTER, OWNER, and TABLESPACE never recurse to
> > : descendant tables; that is, they always act as though ONLY were specified.
> > : Adding a constraint recurses only for CHECK constraints that are not marked NO
> > : INHERIT.
>
> I don't see that in the NOTES section for ALTER TABLE.  Are you
> looking at some patch, or at master?

I was looking here:
https://www.postgresql.org/docs/11/sql-altertable.html

> More generally, our documentation in this area seems to be somewhat
> lacking.  For instance, the TABLESPACE section of the ALTER TABLE
> documentation appears to make no mention of what exactly the behavior
> is when applied to a partition table.  It doesn't seem sufficient to
> simply say "well, it doesn't recurse," because that would imply that
> it doesn't do anything at all, which isn't the case.

That's true.  Maybe we can add a short blurb in the stanza for the SET
TABLESPACE form in the Description section.  It currently says:

:   This form changes the table's tablespace to the specified tablespace
:   and moves the data file(s) associated with the table to the new
:   tablespace. Indexes on the table, if any, are not moved; but they
:   can be moved separately with additional SET TABLESPACE commands. All
:   tables in the current database in a tablespace can be moved by using
:   the ALL IN TABLESPACE form, which will lock all tables to be moved
:   first and then move each one. This form also supports OWNED BY,
:   which will only move tables owned by the roles specified. If the
:   NOWAIT option is specified then the command will fail if it is
:   unable to acquire all of the locks required immediately. Note that
:   system catalogs are not moved by this command, use ALTER DATABASE or
:   explicit ALTER TABLE invocations instead if desired. The
:   information_schema relations are not considered part of the system
:   catalogs and will be moved. See also CREATE TABLESPACE.

Maybe the ALL IN TABLESPACE and OWNED BY sub-forms should be split to a
separate para.  I suggest:

:   This form changes the table's tablespace to the specified tablespace
:   and moves the data file(s) associated with the table to the new
:   tablespace. Indexes on the table, if any, are not moved; but they
:   can be moved separately with additional SET TABLESPACE commands.
:   When applied to a partitioned table, nothing is moved, but any
:   partitions created afterwards with CREATE TABLE PARTITION OF
:   will use that tablespace.
:
:   All
:   tables in the current database in a tablespace can be moved by using
:   the ALL IN TABLESPACE form, which will lock all tables to be moved
:   first and then move each one. This form also supports OWNED BY,
:   which will only move tables owned by the roles specified. If the
:   NOWAIT option is specified then the command will fail if it is
:   unable to acquire all of the locks required immediately. Note that
:   system catalogs are not moved by this command, use ALTER DATABASE or
:   explicit ALTER TABLE invocations instead if desired. The
:   information_schema relations are not considered part of the system
:   catalogs and will be moved. See also CREATE TABLESPACE.


> > Since REPLICA IDENTITY does not appear in this list, the documented
> > behavior is to recurse, per the description of the "name" parameter:
> >
> > : The name (optionally schema-qualified) of an existing table to
> > : alter. If ONLY is specified before the table name, only that table
> > : is altered. If ONLY is not specified, the table and all its
> > : descendant tables (if any) are altered. Optionally, * can be
> > : specified after the table name to explicitly indicate that
> > : descendant tables are included.
> >
> > I didn't come up with this on my own, as you imply.
>
> Fair enough.  I don't think it's entirely useful to think about this
> in terms of which operations do and don't recurse; the question in my
> mind is WHY some things recurse and other things don't, and what will
> be easiest for users to understand.

I think the reason tablespace was made not to recurse was to avoid
acquiring access exclusive lock on too many tables at once, but I'm not
sure.  This is very old behavior.

> Obviously any property where the
> parents and children have to match, like adding a column or changing a
> data type, has to always recurse; nothing else is sensible.  For
> others, it seems we have a choice.  It's unclear to me why we'd choose
> to make REPLICA IDENTITY recurse by default and, say, OWNER not
> recurse.

Ah.  I did argue that OWNER should recurse:
https://postgr.es/m/20171017163203.uw7hmlqonidlfeqj@...
but it was too late then to change it for pg10.  We can change it now,
surely.

> In both cases, the default assumption is presumably that the
> whole partitioning hierarchy should match, but in a particular case a
> user could want to do something else.  Consequently, I don't
> understand why a user would want one of those operations to descend to
> children by default and the other not.

I agree that OWNER TO should recurse for partitioned tables.  I'm -0 on
changing it for legacy inheritance, but if the majority is to change it
for both, let's do that.

> It feels like we're choosing the behavior in individual cases, as Tom
> would say, with the aid of a dartboard.  Maybe there's some coherent
> principle here that I'm just missing.

I don't think that's a thing we're doing now.  Rather we all did it as a
group years ago, and we're now finding that some of the darts landed in
the wrong spot of the board.  I don't disagree with fixing all the ones
we know about (I volunteer to do that), but I don't want to conduct a
full audit of tablecmds.c.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Simon Riggs
On Wed, 20 Feb 2019 at 17:38, Alvaro Herrera <[hidden email]> wrote:
 
> Fair enough.  I don't think it's entirely useful to think about this
> in terms of which operations do and don't recurse; the question in my
> mind is WHY some things recurse and other things don't, and what will
> be easiest for users to understand.

I think the reason tablespace was made not to recurse was to avoid
acquiring access exclusive lock on too many tables at once, but I'm not
sure.  This is very old behavior.

-1 for changing that; here's why:

Partitioning is designed to support very large tables.

Since the table is very large there is a valid use case for moving older partitions to other tablespaces, one at a time.

If we moved all partitions at once, while holding AccessExclusiveLock, it would a) likely fail with out of space errors, b) if you are unlucky enough for it to succeed it would lock the table for a ridiculously long time. The main use case for changing the tablespace is to define where new partitions should be created. So in this specific case only, recursion makes no sense.
 
> Obviously any property where the
> parents and children have to match, like adding a column or changing a
> data type, has to always recurse; nothing else is sensible.  For
> others, it seems we have a choice.  It's unclear to me why we'd choose
> to make REPLICA IDENTITY recurse by default and, say, OWNER not
> recurse.

Ah.  I did argue that OWNER should recurse:
https://postgr.es/m/20171017163203.uw7hmlqonidlfeqj@...
but it was too late then to change it for pg10.  We can change it now,
surely.

> In both cases, the default assumption is presumably that the
> whole partitioning hierarchy should match, but in a particular case a
> user could want to do something else.  Consequently, I don't
> understand why a user would want one of those operations to descend to
> children by default and the other not.

I agree that OWNER TO should recurse for partitioned tables. 

+1

That was clearly just missed. It's a strange thought to have a partitioned table with different pieces owned by different users. So strange that the lack of complaints about the recursion are surely because nobody ever tried it in a real situation. I would prefer to disallow differences in ownership across partitions, since that almost certainly prevents running sensible DDL and its just a huge footgun.

Recursion should be the default for partitioned tables.

I'm -0 on
changing it for legacy inheritance, but if the majority is to change it
for both, let's do that.

-1 for changing legacy inheritance. Two separate features. Inheritance has been around for many years and its feature set is stable. No need to touch it.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Robert Haas
In reply to this post by Alvaro Herrera-9
On Wed, Feb 20, 2019 at 12:38 PM Alvaro Herrera
<[hidden email]> wrote:
> > I don't see that in the NOTES section for ALTER TABLE.  Are you
> > looking at some patch, or at master?
>
> I was looking here:
> https://www.postgresql.org/docs/11/sql-altertable.html

OK, I was looking at the wrong thing.  Not enough caffeine, apparently.

> Maybe the ALL IN TABLESPACE and OWNED BY sub-forms should be split to a
> separate para.  I suggest:
>
> :   This form changes the table's tablespace to the specified tablespace
> :   and moves the data file(s) associated with the table to the new
> :   tablespace. Indexes on the table, if any, are not moved; but they
> :   can be moved separately with additional SET TABLESPACE commands.
> :   When applied to a partitioned table, nothing is moved, but any
> :   partitions created afterwards with CREATE TABLE PARTITION OF
> :   will use that tablespace.
> :
> :   All
> :   tables in the current database in a tablespace can be moved by using
> :   the ALL IN TABLESPACE form, which will lock all tables to be moved
> :   first and then move each one. This form also supports OWNED BY,
> :   which will only move tables owned by the roles specified. If the
> :   NOWAIT option is specified then the command will fail if it is
> :   unable to acquire all of the locks required immediately. Note that
> :   system catalogs are not moved by this command, use ALTER DATABASE or
> :   explicit ALTER TABLE invocations instead if desired. The
> :   information_schema relations are not considered part of the system
> :   catalogs and will be moved. See also CREATE TABLESPACE.

Seems reasonable.

> I think the reason tablespace was made not to recurse was to avoid
> acquiring access exclusive lock on too many tables at once, but I'm not
> sure.  This is very old behavior.
>
> > Obviously any property where the
> > parents and children have to match, like adding a column or changing a
> > data type, has to always recurse; nothing else is sensible.  For
> > others, it seems we have a choice.  It's unclear to me why we'd choose
> > to make REPLICA IDENTITY recurse by default and, say, OWNER not
> > recurse.
>
> Ah.  I did argue that OWNER should recurse:
> https://postgr.es/m/20171017163203.uw7hmlqonidlfeqj@...
> but it was too late then to change it for pg10.  We can change it now,
> surely.

Yeah, we could.  I wonder, though, whether we should just make
everything recurse.  I think that's what people are commonly going to
want, at least for partitioned tables, and it doesn't seem to me that
it would hurt anything to make the inheritance case work that way,
too.  Right now it looks like we have this list of exceptions:

- actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY)
- TRIGGER
- CLUSTER
- OWNER
- TABLESPACE never recurse to descendant tables
- Adding a constraint recurses only for CHECK constraints that are not
marked NO INHERIT.

I have a feeling we eventually want this list to be empty, right?  We
want a partitioned table to work as much like a non-partitioned table
as possible, unless the user asks for some other behavior.  Going from
six exceptions to four and maybe having some of them depend on whether
it's partitioning or inheritance doesn't seem like it's as good and
clear as trying to adopt a really uniform policy.

I don't buy Simon's argument that we should treat TABLESPACE
differently because the tables might be really big and take a long
time to move.  I agree that this could well be true, but nobody is
proposing to remove the ability to move tables individually or to use
ONLY here.  Allowing TABLESPACE to recurse just gives people one
additional choice that they don't have today: to move everything at
once. We don't lose any functionality by enabling that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Simon Riggs
On Wed, 20 Feb 2019 at 18:51, Robert Haas <[hidden email]> wrote:

I don't buy Simon's argument that we should treat TABLESPACE
differently because the tables might be really big and take a long
time to move.  I agree that this could well be true, but nobody is
proposing to remove the ability to move tables individually or to use
ONLY here.  Allowing TABLESPACE to recurse just gives people one
additional choice that they don't have today: to move everything at
once. We don't lose any functionality by enabling that.

Doing that would add the single largest footgun in Postgres, given that command's current behavior and the size of partitioned tables.

If it moved partitions concurrently I'd feel differently.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
In reply to this post by Robert Haas
On 2019-Feb-20, Robert Haas wrote:

> On Wed, Feb 20, 2019 at 12:38 PM Alvaro Herrera
> <[hidden email]> wrote:

> > Maybe the ALL IN TABLESPACE and OWNED BY sub-forms should be split to a
> > separate para.  I suggest:
> >
> > :   This form changes the table's tablespace to the specified tablespace
> > :   and moves the data file(s) associated with the table to the new
> > :   tablespace. Indexes on the table, if any, are not moved; but they
> > :   can be moved separately with additional SET TABLESPACE commands.
> > :   When applied to a partitioned table, nothing is moved, but any
> > :   partitions created afterwards with CREATE TABLE PARTITION OF
> > :   will use that tablespace.
> > :
> > :   All
> > :   tables in the current database in a tablespace can be moved by using
> > :   the ALL IN TABLESPACE form, which will lock all tables to be moved
> > :   first and then move each one. This form also supports OWNED BY,
> > :   which will only move tables owned by the roles specified. If the
> > :   NOWAIT option is specified then the command will fail if it is
> > :   unable to acquire all of the locks required immediately. Note that
> > :   system catalogs are not moved by this command, use ALTER DATABASE or
> > :   explicit ALTER TABLE invocations instead if desired. The
> > :   information_schema relations are not considered part of the system
> > :   catalogs and will be moved. See also CREATE TABLESPACE.
>
> Seems reasonable.

Pushed this bit, thanks.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Alvaro Herrera-9
In reply to this post by Robert Haas
Added Peter E to CC; question at the very end.

On 2019-Feb-20, Robert Haas wrote:

> Yeah, we could.  I wonder, though, whether we should just make
> everything recurse.  I think that's what people are commonly going to
> want, at least for partitioned tables, and it doesn't seem to me that
> it would hurt anything to make the inheritance case work that way,
> too.  Right now it looks like we have this list of exceptions:
>
> - actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY)
> - TRIGGER
> - CLUSTER
> - OWNER
> - TABLESPACE never recurse to descendant tables
> - Adding a constraint recurses only for CHECK constraints that are not
> marked NO INHERIT.
>
> I have a feeling we eventually want this list to be empty, right?  We
> want a partitioned table to work as much like a non-partitioned table
> as possible, unless the user asks for some other behavior.  Going from
> six exceptions to four and maybe having some of them depend on whether
> it's partitioning or inheritance doesn't seem like it's as good and
> clear as trying to adopt a really uniform policy.
>
> I don't buy Simon's argument that we should treat TABLESPACE
> differently because the tables might be really big and take a long
> time to move.  I agree that this could well be true, but nobody is
> proposing to remove the ability to move tables individually or to use
> ONLY here.  Allowing TABLESPACE to recurse just gives people one
> additional choice that they don't have today: to move everything at
> once. We don't lose any functionality by enabling that.

Tablespaces already behave a little bit especially in another sense:
it doesn't recurse to indexes.  I think it's not possible to implement a
three-way flag, where you tell it to move only the table, or to recurse
to child tables but leave local indexes alone, or just to move
everything.  If we don't move the child indexes, are we really recursing
in that sense?

I think changing the behavior of tablespaces is likely to cause pain for
people with existing scripts that expect the command not to recurse.  We
would have to add a switch of some kind to provide the old behavior in
order not to break those, and that doesn't seem so good either.

I agree we definitely want to treat a partitioned table as similarly as
possible to a non-partitioned table, but in the particular case of
tablespace it seems to me better not to mess with the behavior.


CLUSTER: I'm not sure about this one.  For legacy inheritance there's
no principled way to figure out the right index to recurse to.  For
partitioning that seems a very simple problem, but I do wonder if
there's any use case at all for ALTER TABLE CLUSTER there.  My
inclination would be to reject ALTER TABLE CLUSTER on a partitioned
table altogether, if it isn't already.

OWNER: let's just change this one to recurse always.  I think we have a
consensus about this one.

TRIGGER: I think it would be good to recurse, based on the trigger name.
I'm not sure what to do about legacy inheritance; the trigger isn't
likely to be there.  An option is to silently ignore each inheritance
child (not partition) if the trigger isn't present on it.

We all seem to agree that REPLICA IDENTITY should recurse.

Maybe ADD GENERATED AS IDENTITY / DROP IDENTITY should recurse; not
really sure about this one.  Peter?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: propagating replica identity to partitions

Michael Paquier-2
On Thu, Feb 28, 2019 at 07:41:11PM -0300, Alvaro Herrera wrote:
> We all seem to agree that REPLICA IDENTITY should recurse.

(entering the ring)
FWIW, I agree that having REPLICA IDENTITY recurse on partitions feels
more natural, as much as being able to use ALTER TABLE ONLY to only
update the definition on a given partition member.
--
Michael

signature.asc (849 bytes) Download Attachment
12