pg_dump is broken for partition tablespaces

classic Classic list List threaded Threaded
57 messages Options
123
Reply | Threaded
Open this post in threaded view
|

pg_dump is broken for partition tablespaces

David Rowley-3
Over on [1] Andres pointed out that the pg_dump support for the new to
PG12 tablespace inheritance feature is broken.  This is the feature
added in ca4103025dfe26 to allow a partitioned table to have a
tablespace that acts as the default tablespace for newly attached
partitions. The idea being that you can periodically change the
default tablespace for new partitions to a tablespace that sits on a
disk partition with more free space without affecting the default
tablespace for normal non-partitioned tables. Anyway...

pg_dump is broken with this. Consider:

create tablespace newts location '/tmp/newts';
create table listp (a int) partition by list(a) tablespace newts;
create table listp1 partition of listp for values in(1) tablespace pg_default;
create table listp2 partition of listp for values in(2);

select relname,relkind,reltablespace from pg_class where relname like
'listp%' and relkind in('r','p') order by relname;
produces:

 relname | relkind | reltablespace
---------+---------+---------------
 listp   | p       |         16384
 listp1  | r       |             0
 listp2  | r       |         16384
(3 rows)

after dump/restore:

 relname | relkind | reltablespace
---------+---------+---------------
 listp   | p       |         16384
 listp1  | r       |         16384
 listp2  | r       |         16384
(3 rows)

Here the tablespace for listp1 was inherited from listp, but we really
should have restored this to use pg_default like was specified.

The reason this occurs is that in pg_dump we do:

SET default_tablespace = '';

CREATE TABLE public.listp1 PARTITION OF public.listp
FOR VALUES IN (1);

so, since we're creating the table initially as a partition the logic
that applies the default partition from the parent kicks in.

If we instead did:

CREATE TABLE public.listp1 (a integer
);

ALTER TABLE public.list1 ATTACH PARTITION public.listp FOR VALUES IN (1);

then we'd have no issue, as tablespace will be set to whatever
default_tablespace is set to.

Partitioned indexes have this similar inherit tablespace from parent
feature, so ca4103025dfe26 was intended to align the behaviour of the
two. Partitioned indexes happen not to suffer from the same issue as
the indexes are attached after their creation similar to what I
propose above.

Can anyone see any fundamental reason that we should not create a
partitioned table by doing CREATE TABLE followed by ATTACH PARTITION?
 If not, I'll write a patch that fixes it that way.

As far as I can see, the biggest fundamental difference with doing
things this way will be that the column order of partitions will be
preserved, where before it would inherit the order of the partitioned
table.  I'm a little unsure if doing this column reordering was an
intended side-effect or not.

[1] https://www.postgresql.org/message-id/20190305060804.jv5mz4slrnelh3jy@...

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Michael Paquier-2
On Wed, Mar 06, 2019 at 07:45:06PM +1300, David Rowley wrote:
> Partitioned indexes have this similar inherit tablespace from parent
> feature, so ca4103025dfe26 was intended to align the behaviour of the
> two. Partitioned indexes happen not to suffer from the same issue as
> the indexes are attached after their creation similar to what I
> propose above.
>
> Can anyone see any fundamental reason that we should not create a
> partitioned table by doing CREATE TABLE followed by ATTACH PARTITION?
>  If not, I'll write a patch that fixes it that way.

The part for partitioned indexes is already battle-proven, so if the
part for partitioned tables can be consolidated the same way that
would be really nice.

> As far as I can see, the biggest fundamental difference with doing
> things this way will be that the column order of partitions will be
> preserved, where before it would inherit the order of the partitioned
> table.  I'm a little unsure if doing this column reordering was an
> intended side-effect or not.

I don't see any direct issues with that to be honest thinking about
it..
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

David Rowley-3
On Wed, 6 Mar 2019 at 20:19, Michael Paquier <[hidden email]> wrote:
>
> On Wed, Mar 06, 2019 at 07:45:06PM +1300, David Rowley wrote:
> > Can anyone see any fundamental reason that we should not create a
> > partitioned table by doing CREATE TABLE followed by ATTACH PARTITION?
> >  If not, I'll write a patch that fixes it that way.
>
> The part for partitioned indexes is already battle-proven, so if the
> part for partitioned tables can be consolidated the same way that
> would be really nice.

I think Andres is also going to need it to work this way for the
pluggable storage patch too.

Looking closer at this, I discovered that when pg_dump is in binary
upgrade mode that it crafts the pg_dump output in this way anyway...
Obviously, the column orders can't go changing magically in that case
since we're about to plug the old heap into the new table.  Due to the
removal of the special case, it means this patch turned out to remove
more code than it adds.

Patch attached.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

v1-0001-Make-pg_dump-emit-ATTACH-PARTITION-instead-of-PAR.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Tom Lane-2
In reply to this post by David Rowley-3
David Rowley <[hidden email]> writes:
> As far as I can see, the biggest fundamental difference with doing
> things this way will be that the column order of partitions will be
> preserved, where before it would inherit the order of the partitioned
> table.  I'm a little unsure if doing this column reordering was an
> intended side-effect or not.

Well, if the normal behavior results in changing the column order,
it'd be necessary to do things differently in --binary-upgrade mode
anyway, because there we *must* preserve column order.  I don't know
if what you're describing represents a separate bug for pg_upgrade runs,
but it might.  Is there any test case for the situation left behind by
the core regression tests?

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Andres Freund
In reply to this post by David Rowley-3
Hi,

On 2019-03-06 19:45:06 +1300, David Rowley wrote:
> Over on [1] Andres pointed out that the pg_dump support for the new to
> PG12 tablespace inheritance feature is broken.  This is the feature
> added in ca4103025dfe26 to allow a partitioned table to have a
> tablespace that acts as the default tablespace for newly attached
> partitions. The idea being that you can periodically change the
> default tablespace for new partitions to a tablespace that sits on a
> disk partition with more free space without affecting the default
> tablespace for normal non-partitioned tables. Anyway...

I'm also concerned that the the current catalog representation isn't
right. As I said:

> I also find it far from clear that:
>     <listitem>
>      <para>
>       The <replaceable class="parameter">tablespace_name</replaceable> is the name
>       of the tablespace in which the new table is to be created.
>       If not specified,
>       <xref linkend="guc-default-tablespace"/> is consulted, or
>       <xref linkend="guc-temp-tablespaces"/> if the table is temporary.  For
>       partitioned tables, since no storage is required for the table itself,
>       the tablespace specified here only serves to mark the default tablespace
>       for any newly created partitions when no other tablespace is explicitly
>       specified.
>      </para>
>     </listitem>
> is handled correctly. The above says that the *specified* tablespaces -
> which seems to exclude the default tablespace - is what's going to
> determine what partitions use as their default tablespace. But in fact
> that's not true, the partitioned table's pg_class.retablespace is set to
> what default_tablespaces was at the time of the creation.

I still think the feature as is doesn't seem to have very well defined
behaviour.



> If we instead did:
>
> CREATE TABLE public.listp1 (a integer
> );
>
> ALTER TABLE public.list1 ATTACH PARTITION public.listp FOR VALUES IN (1);

Isn't that a bit more expensive, because now the table needs to be
scanned for maching the value? That's probably neglegible though, given
it'd probably always empty.


Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

David Rowley-3
In reply to this post by Tom Lane-2
On Thu, 7 Mar 2019 at 03:36, Tom Lane <[hidden email]> wrote:

>
> David Rowley <[hidden email]> writes:
> > As far as I can see, the biggest fundamental difference with doing
> > things this way will be that the column order of partitions will be
> > preserved, where before it would inherit the order of the partitioned
> > table.  I'm a little unsure if doing this column reordering was an
> > intended side-effect or not.
>
> Well, if the normal behavior results in changing the column order,
> it'd be necessary to do things differently in --binary-upgrade mode
> anyway, because there we *must* preserve column order.  I don't know
> if what you're describing represents a separate bug for pg_upgrade runs,
> but it might.  Is there any test case for the situation left behind by
> the core regression tests?

After having written the patch, I noticed that binary upgrade mode
does the CREATE TABLE then ATTACH PARTITION in order to preserve the
order.

After changing it nothing failed in make check-world with tap tests enabled.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

David Rowley-3
In reply to this post by Andres Freund
On Thu, 7 Mar 2019 at 05:17, Andres Freund <[hidden email]> wrote:

> I'm also concerned that the the current catalog representation isn't
> right. As I said:
>
> > I also find it far from clear that:
> >     <listitem>
> >      <para>
> >       The <replaceable class="parameter">tablespace_name</replaceable> is the name
> >       of the tablespace in which the new table is to be created.
> >       If not specified,
> >       <xref linkend="guc-default-tablespace"/> is consulted, or
> >       <xref linkend="guc-temp-tablespaces"/> if the table is temporary.  For
> >       partitioned tables, since no storage is required for the table itself,
> >       the tablespace specified here only serves to mark the default tablespace
> >       for any newly created partitions when no other tablespace is explicitly
> >       specified.
> >      </para>
> >     </listitem>
> > is handled correctly. The above says that the *specified* tablespaces -
> > which seems to exclude the default tablespace - is what's going to
> > determine what partitions use as their default tablespace. But in fact
> > that's not true, the partitioned table's pg_class.retablespace is set to
> > what default_tablespaces was at the time of the creation.

Do you think it's fine to reword the docs to make this point more
clear, or do you see this as a fundamental problem with the patch?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Andres Freund
Hi,

On 2019-03-07 11:31:15 +1300, David Rowley wrote:

> On Thu, 7 Mar 2019 at 05:17, Andres Freund <[hidden email]> wrote:
> > I'm also concerned that the the current catalog representation isn't
> > right. As I said:
> >
> > > I also find it far from clear that:
> > >     <listitem>
> > >      <para>
> > >       The <replaceable class="parameter">tablespace_name</replaceable> is the name
> > >       of the tablespace in which the new table is to be created.
> > >       If not specified,
> > >       <xref linkend="guc-default-tablespace"/> is consulted, or
> > >       <xref linkend="guc-temp-tablespaces"/> if the table is temporary.  For
> > >       partitioned tables, since no storage is required for the table itself,
> > >       the tablespace specified here only serves to mark the default tablespace
> > >       for any newly created partitions when no other tablespace is explicitly
> > >       specified.
> > >      </para>
> > >     </listitem>
> > > is handled correctly. The above says that the *specified* tablespaces -
> > > which seems to exclude the default tablespace - is what's going to
> > > determine what partitions use as their default tablespace. But in fact
> > > that's not true, the partitioned table's pg_class.retablespace is set to
> > > what default_tablespaces was at the time of the creation.
>
> Do you think it's fine to reword the docs to make this point more
> clear, or do you see this as a fundamental problem with the patch?

Hm, both? I mean I wouldn't necessarily characterize it as "fundamental"
problem, but ...

I don't think the argument that the user intended to explicitly set a
tablespace holds much water if it was just set via default_tablespace,
rather than an explicit TABLESPACE.  I think iff you really want
something like this feature, you'd have to mark a partition's
reltablespace as 0 unless an *explicit* assignment of the tablespace
happened. In which case you also would need to explicitly emit a
TABLESPACE for the partitioned table in pg_dump, to restore that.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

David Rowley-3
On Thu, 7 Mar 2019 at 11:37, Andres Freund <[hidden email]> wrote:
>
> On 2019-03-07 11:31:15 +1300, David Rowley wrote:
> > Do you think it's fine to reword the docs to make this point more
> > clear, or do you see this as a fundamental problem with the patch?
>
> Hm, both? I mean I wouldn't necessarily characterize it as "fundamental"
> problem, but ...

Okay, so if I understand you correctly, you're complaining about the
fact that if the user does:

CREATE TABLE p (a int) PARTITION BY LIST(a) TABLESPACE pg_default;

that the user intended that all future partitions go to pg_default and
not whatever default_tablespace is set to at the time?

If so, that seems like a genuine concern.

I see in heap_create() we do;

/*
* Never allow a pg_class entry to explicitly specify the database's
* default tablespace in reltablespace; force it to zero instead. This
* ensures that if the database is cloned with a different default
* tablespace, the pg_class entry will still match where CREATE DATABASE
* will put the physically copied relation.
*
* Yes, this is a bit of a hack.
*/
if (reltablespace == MyDatabaseTableSpace)
reltablespace = InvalidOid;

which will zero pg_class.reltablespace if the specified tablespace
happens to match pg_database.dattablespace. This causes future
partitions to think that no tablespace was specified and therefore
DefineRelation() consults the default_tablespace.

I see that this same problem exists for partitioned indexes too:

create table listp (a int) partition by list(a);
create index on listp (a) tablespace pg_default;
set default_Tablespace = n;
create table listp1 partition of listp for values in(1);
\d listp1
               Table "public.listp1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
Partition of: listp FOR VALUES IN (1)
Indexes:
    "listp1_a_idx" btree (a), tablespace "n"
Tablespace: "n"

If I understand what you're saying correctly, then the listp1_a_idx
should have been created in pg_default since that's what the default
partitioned index tablespace was set to.

> I don't think the argument that the user intended to explicitly set a
> tablespace holds much water if it was just set via default_tablespace,
> rather than an explicit TABLESPACE.  I think iff you really want
> something like this feature, you'd have to mark a partition's
> reltablespace as 0 unless an *explicit* assignment of the tablespace
> happened. In which case you also would need to explicitly emit a
> TABLESPACE for the partitioned table in pg_dump, to restore that.

I think emitting an explicit tablespace in pg_dump for partitioned
tables (when non-zero) might have issues for pg_restore's
--no-tablespaces option.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Andres Freund
On 2019-03-07 15:25:34 +1300, David Rowley wrote:

> On Thu, 7 Mar 2019 at 11:37, Andres Freund <[hidden email]> wrote:
> >
> > On 2019-03-07 11:31:15 +1300, David Rowley wrote:
> > > Do you think it's fine to reword the docs to make this point more
> > > clear, or do you see this as a fundamental problem with the patch?
> >
> > Hm, both? I mean I wouldn't necessarily characterize it as "fundamental"
> > problem, but ...
>
> Okay, so if I understand you correctly, you're complaining about the
> fact that if the user does:
>
> CREATE TABLE p (a int) PARTITION BY LIST(a) TABLESPACE pg_default;
>
> that the user intended that all future partitions go to pg_default and
> not whatever default_tablespace is set to at the time?

Correct. And also, conversely, if default_tablespace was set to frakbar
at the time of CREATE TABLE, but no explicit TABLESPACE was provided,
that that should *not* be the default for new partitions; rather
default_tablespace should be consulted again.


> If I understand what you're saying correctly, then the listp1_a_idx
> should have been created in pg_default since that's what the default
> partitioned index tablespace was set to.

That's how I understand the intent of the user yes.


> > I don't think the argument that the user intended to explicitly set a
> > tablespace holds much water if it was just set via default_tablespace,
> > rather than an explicit TABLESPACE.  I think iff you really want
> > something like this feature, you'd have to mark a partition's
> > reltablespace as 0 unless an *explicit* assignment of the tablespace
> > happened. In which case you also would need to explicitly emit a
> > TABLESPACE for the partitioned table in pg_dump, to restore that.
>
> I think emitting an explicit tablespace in pg_dump for partitioned
> tables (when non-zero) might have issues for pg_restore's
> --no-tablespaces option.

Yea, there'd probably need to be handling in pg_backup_archiver.c or
such, not sure how to do that best.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Alvaro Herrera-9
In reply to this post by Andres Freund
On 2019-Mar-06, Andres Freund wrote:


> > I also find it far from clear that:
> >     <listitem>
> >      <para>
> >       The <replaceable class="parameter">tablespace_name</replaceable> is the name
> >       of the tablespace in which the new table is to be created.
> >       If not specified,
> >       <xref linkend="guc-default-tablespace"/> is consulted, or
> >       <xref linkend="guc-temp-tablespaces"/> if the table is temporary.  For
> >       partitioned tables, since no storage is required for the table itself,
> >       the tablespace specified here only serves to mark the default tablespace
> >       for any newly created partitions when no other tablespace is explicitly
> >       specified.
> >      </para>
> >     </listitem>
> > is handled correctly. The above says that the *specified* tablespaces -
> > which seems to exclude the default tablespace - is what's going to
> > determine what partitions use as their default tablespace. But in fact
> > that's not true, the partitioned table's pg_class.retablespace is set to
> > what default_tablespaces was at the time of the creation.
>
> I still think the feature as is doesn't seem to have very well defined
> behaviour.
I have just started looking into this issue.  I'm not sure yet what's
the best fix; maybe for the specific case of partitioned tables and
indexes we should deviate from the ages-old behavior of storing zero
tablespace, if the tablespace is specified as the default one.  But I
haven't written the code yet.

In the meantime, here's David's patch, rebased to current master and
with the pg_upgrade and pg_dump tests fixed to match the new partition
creation behavior.

> > If we instead did:
> >
> > CREATE TABLE public.listp1 (a integer
> > );
> >
> > ALTER TABLE public.list1 ATTACH PARTITION public.listp FOR VALUES IN (1);
>
> Isn't that a bit more expensive, because now the table needs to be
> scanned for maching the value? That's probably neglegible though, given
> it'd probably always empty.
I think it's always empty.  In the standard case, there are two
transactions rather than one, so yeah it's a little bit more expensive.
Maybe we should make this conditional on there actually being an
important tablespace distinction to preserve.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v2-0001-Make-pg_dump-emit-ATTACH-PARTITION-instead-of-PAR.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Alvaro Herrera-9
In reply to this post by David Rowley-3
On 2019-Mar-07, David Rowley wrote:

> On Thu, 7 Mar 2019 at 11:37, Andres Freund <[hidden email]> wrote:
> >
> > On 2019-03-07 11:31:15 +1300, David Rowley wrote:
> > > Do you think it's fine to reword the docs to make this point more
> > > clear, or do you see this as a fundamental problem with the patch?
> >
> > Hm, both? I mean I wouldn't necessarily characterize it as "fundamental"
> > problem, but ...
>
> Okay, so if I understand you correctly, you're complaining about the
> fact that if the user does:
>
> CREATE TABLE p (a int) PARTITION BY LIST(a) TABLESPACE pg_default;
>
> that the user intended that all future partitions go to pg_default and
> not whatever default_tablespace is set to at the time?
>
> If so, that seems like a genuine concern.

So, you don't need a partition in order to see a problem here: pg_dump
doesn't do the right thing for the partitioned table.  It does this:

SET default_tablespace = pg_default;
CREATE TABLE public.p (
    a integer
)
PARTITION BY LIST (a);

which naturally has a different effect on partitions.

Now, I think the problem for partitions can be solved in a simple
manner: we can have this code David quoted ignore the
MyDatabaseTableSpace exception for partitioned rels:

> /*
> * Never allow a pg_class entry to explicitly specify the database's
> * default tablespace in reltablespace; force it to zero instead. This
> * ensures that if the database is cloned with a different default
> * tablespace, the pg_class entry will still match where CREATE DATABASE
> * will put the physically copied relation.
> *
> * Yes, this is a bit of a hack.
> */
> if (reltablespace == MyDatabaseTableSpace)
> reltablespace = InvalidOid;

This gives the right effect AFAICS, namely to store the specified
tablespace regardless of what it is; and there is no problem with the
"physically copied relation" as the comment says, because there *isn't*
a physical relation in the first place.

However, in order to fix the pg_dump behavior for the partitioned rel,
we would need to emit the tablespace differently, i.e. not use SET
default_tablespace, but instead attach the tablespace clause to the
CREATE TABLE line.

I'll go have a look at what problems are there with doing that.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Alvaro Herrera-9
On 2019-Apr-09, Alvaro Herrera wrote:

> However, in order to fix the pg_dump behavior for the partitioned rel,
> we would need to emit the tablespace differently, i.e. not use SET
> default_tablespace, but instead attach the tablespace clause to the
> CREATE TABLE line.
>
> I'll go have a look at what problems are there with doing that.

I tried with the attached tbspc-partitioned.patch (on top of the
previous patch).  It makes several additional cases work correctly, but
causes much more pain than it fixes:

1. if a partitioned table is created without a tablespace spec, then
pg_dump dumps it with a clause like TABLESPACE "".

2. If you fix that to make pg_dump omit the tablespace clause if the
tablespace is default, then there's no SET nor TABLESPACE clauses, so
the table is created in the tablespace of the previously dumped table.
Useless.

3. You can probably make the above work anyway with enough hacks, but
by the time you're finished, the code stinks like months-only fish.

4. Even if you ignore the odor, all the resulting CREATE TABLE commands
will fail if you run them in a database that doesn't contain the
tablespaces in question.  That is, it negates one of the main points of
using "SET default_tablespace" in the first place.

I therefore decree that this approach is not a solution and never will
be.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

tbspc-partitioned.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Alvaro Herrera-9
In reply to this post by Andres Freund
On 2019-Mar-06, Andres Freund wrote:

> I don't think the argument that the user intended to explicitly set a
> tablespace holds much water if it was just set via default_tablespace,
> rather than an explicit TABLESPACE.  I think iff you really want
> something like this feature, you'd have to mark a partition's
> reltablespace as 0 unless an *explicit* assignment of the tablespace
> happened. In which case you also would need to explicitly emit a
> TABLESPACE for the partitioned table in pg_dump, to restore that.

Thinking more about this, I think you're wrong about the behavior under
nonempty default_tablespace.  Quoth the fine manual:

default_tablespace:
        [...]
        This variable specifies the default tablespace in which to create
        objects (tables and indexes) when a CREATE command does not explicitly specify
        a tablespace.
        The value is either the name of a tablespace, or an empty string to
        specify using the default tablespace of the current database. [...]
        https://www.postgresql.org/docs/11/runtime-config-client.html#RUNTIME-CONFIG-CLIENT-STATEMENT

What this says to me, if default_tablespace is set, and there is no
TABLESPACE clause, we should regard the default_tablespace just as if it
were an explicitly named tablespace.  Note that the default setting of
default_tablespace is empty, meaning that tables are created in the
database tablespace.

Emerging behavior: default_tablespace is set to A, then partitioned
table T is created, then default_tablespace is changed to B.  Any
partitions of T created afterwards still appear in tablespace A.

If you really intended for new partitions to be created in
default_tablespace (following future changes to that option), then you
should just leave default_tablespace as empty when creating T.

There is one deficiency that needs to be solved in order for this to
work fully: currently there is no way to reset "reltablespace" to 0.

Does that make sense?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Alvaro Herrera-9
On 2019-Apr-09, Alvaro Herrera wrote:

> There is one deficiency that needs to be solved in order for this to
> work fully: currently there is no way to reset "reltablespace" to 0.

Therefore I propose to add
ALTER TABLE tb ... RESET TABLESPACE;
which sets reltablespace to 0, and it would work only for partitioned
tables and indexes.

That, together with the initial proposal by David, seems to me to solve
the issue at hand.

If no objections, I'll try to come up with a patch tomorrow.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

David Rowley-3
On Wed, 10 Apr 2019 at 11:05, Alvaro Herrera <[hidden email]> wrote:

>
> On 2019-Apr-09, Alvaro Herrera wrote:
>
> > There is one deficiency that needs to be solved in order for this to
> > work fully: currently there is no way to reset "reltablespace" to 0.
>
> Therefore I propose to add
> ALTER TABLE tb ... RESET TABLESPACE;
> which sets reltablespace to 0, and it would work only for partitioned
> tables and indexes.
>
> That, together with the initial proposal by David, seems to me to solve
> the issue at hand.
>
> If no objections, I'll try to come up with a patch tomorrow.

I'm starting to wonder if maintaining two separate behaviours here
isn't just to complex.

For example, if I do:

CREATE TABLE a (a INT PRIMARY KEY) TABLESPACE mytablespace;

then a_pkey goes into the default_tablespace, not mytablespace.

Also, is it weird that CLUSTER can move a table into another
tablespace if the database's tablespace has changed?

postgres=# CREATE TABLE a (a INT PRIMARY KEY) TABLESPACE pg_default;
CREATE TABLE
postgres=# SELECT pg_relation_filepath('a'::regclass);
 pg_relation_filepath
----------------------
 base/12702/16444
(1 row)

postgres=# \c n
n=# ALTER DATABASE postgres TABLESPACE mytablespace;
ALTER DATABASE
n=# \c postgres
postgres=# CLUSTER a USING a_pkey;
CLUSTER
postgres=# SELECT pg_relation_filepath('a'::regclass);
            pg_relation_filepath
---------------------------------------------
 pg_tblspc/16415/PG_12_201904072/12702/16449
(1 row)

This one seems very strange to me.

I think to make it work we'd need to modify heap_create() and
heap_create_with_catalog() to add a new bool argument that controls if
the TABLESPACE was defined the calling command then only set the
reltablespace to InvalidOid if the tablespace was not defined and it
matches the database's tablespace.  If we want to treat table
partitions and index partitions in a special way then we'll need to
add a condition to not set InvalidOid if the relkind is one of those.
That feels a bit dirty, but if the above two cases were also deemed
wrong then we wouldn't need the special case.

Another option would be instead of adding a new bool flag, just pass
InvalidOid for the tablespace to heap_create() when TABLESPACE was not
specified then have it lookup GetDefaultTablespace() but keep
pg_class.reltablespace set to InvalidOId. Neither of these would be a
back-patchable fix for index partitions in PG11. Not sure what to do
about that...

Making constraints follow the tablespace specified during CREATE TABLE
would require a bit more work.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Alvaro Herrera-9
In reply to this post by Alvaro Herrera-9
On 2019-Apr-09, Alvaro Herrera wrote:

> There is one deficiency that needs to be solved in order for this to
> work fully: currently there is no way to reset "reltablespace" to 0.

Actually, there is one -- just need to
  ALTER TABLE .. SET TABLESPACE <the database default tablespace>
this took me a bit by surprise, but actually it's quite expected.

So I think that apart from David's patch, we should just document all
these things carefully.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Andres Freund
Hi,

On 2019-04-10 09:28:21 -0400, Alvaro Herrera wrote:
> So I think that apart from David's patch, we should just document all
> these things carefully.

Yea, I think that's the most important part.

I'm not convinced that we should have any inheriting behaviour btw - it
seems like there's a lot of different ways to think this should behave,
with different good reason each.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Alvaro Herrera-9
On 2019-Apr-10, Andres Freund wrote:

> Hi,
>
> On 2019-04-10 09:28:21 -0400, Alvaro Herrera wrote:
> > So I think that apart from David's patch, we should just document all
> > these things carefully.
>
> Yea, I think that's the most important part.
>
> I'm not convinced that we should have any inheriting behaviour btw - it
> seems like there's a lot of different ways to think this should behave,
> with different good reason each.

So, I ended up with the attached patch.  I think it works pretty well,
and it passes all my POLA tests.

But it doesn't pass pg_upgrade tests!  And investigating closer, it
seems closely related to what David was complaining elsewhere about the
tablespace being improperly set by some rewrite operations.  Here's the
setup as created by regress' create_table.sql:

create table at_partitioned (a int, b text) partition by range (a);
create table at_part_1 partition of at_partitioned for values from (0) to (1000);
insert into at_partitioned values (512, '0.123');
create table at_part_2 (b text, a int);
insert into at_part_2 values ('1.234', 1024);
create index on at_partitioned (b);
create index on at_partitioned (a);

If you examine state at this point, it's all good:
alvherre=# select relname, reltablespace from pg_class where relname like 'at_partitioned%';
       relname        | reltablespace
----------------------+---------------
 at_partitioned       |             0
 at_partitioned_a_idx |             0
 at_partitioned_b_idx |             0

but the test immediately does this:

alter table at_partitioned alter column b type numeric using b::numeric;

and watch what happens!  (1663 is pg_default)

alvherre=# select relname, reltablespace from pg_class where relname like 'at_partitioned%';
       relname        | reltablespace
----------------------+---------------
 at_partitioned       |             0
 at_partitioned_a_idx |             0
 at_partitioned_b_idx |          1663
(3 filas)

Outrageous!

I'm going to have a look at this behavior now.  IMO it's a separate bug,
but with that obviously we cannot fix the other one.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: pg_dump is broken for partition tablespaces

Alvaro Herrera-9
On 2019-Apr-10, Alvaro Herrera wrote:

> but the test immediately does this:
>
> alter table at_partitioned alter column b type numeric using b::numeric;
>
> and watch what happens!  (1663 is pg_default)
>
> alvherre=# select relname, reltablespace from pg_class where relname like 'at_partitioned%';
>        relname        | reltablespace
> ----------------------+---------------
>  at_partitioned       |             0
>  at_partitioned_a_idx |             0
>  at_partitioned_b_idx |          1663
> (3 filas)
>
> Outrageous!
This is because ruleutils.c attaches a TABLESPACE clause when asked to
dump an index definition; and tablecmds.c uses ruleutils to deparse the
index definition into something that can be replayed via CREATE INDEX
commands (or ALTER TABLE ADD CONSTRAINT UNIQUE/PRIMARY KEY, if that's
the case.)

This patch (PoC quality) fixes that behavior, but I'm looking to see
what else it breaks.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

tblsp.patch (2K) Download Attachment
123