handle tablespaces for partitioned tables during ALTER DATABASE

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

handle tablespaces for partitioned tables during ALTER DATABASE

postgres-8
Summary:

Normally, you are not allowed to `ALTER DATABASE ... SET TABLESPACE` when the
target tablespace is in use by a table in the database.  However, this check
doesn't seem to apply to partitioned tables, and that can lead to unexpected
behavior.

PostgreSQL version:

Compiled by source with git checked out at `refs/tags/REL_12_2`.

```sql
SELECT version();
```

```
                                                 version                                                
---------------------------------------------------------------------------------------------------------
 PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
```

OS: CentOS 7

Repro:

1. Ensure that databases `d` and `e` are not in use.
1. Ensure that tablespaces `fast` and `faster` are not in use.
1. Ensure that directories `/data/fast` and `/data/faster` for tablespaces are
   empty and have correct permissions
1. On one shell, run

   ```sh
   watch psql -c "\
     SELECT relname, reltablespace\
     FROM pg_class\
     WHERE relname LIKE 'scores_%';\
     " -d d
   ```

1. On a second shell in `psql`, run the following commands at your leisure,
   observing the `watch` in the first shell:

   ```sql
   CREATE DATABASE d;
   CREATE DATABASE e;
   \c d
   CREATE TABLE scores
       (rank int, age int, score int)
       PARTITION BY RANGE (rank);
   CREATE TABLE scores_rank_0_to_100
       PARTITION OF scores FOR VALUES FROM (0) TO (100)
       PARTITION BY RANGE (age);
   CREATE TABLE scores_rank_100_to_200
       PARTITION OF scores FOR VALUES FROM (100) TO (200)
       PARTITION BY RANGE (age);
   -- I want rank 0 to 100 lookups to be fast.
   CREATE TABLESPACE fast LOCATION '/data/fast';
   ALTER TABLE scores_rank_0_to_100
       SET TABLESPACE fast;
   -- Actually, I want the fast SSD to be default.
   \c e
   ALTER DATABASE d
       TABLESPACE fast;
   \c d
   -- Let's load in data.
   CREATE TABLE scores_rank_0_to_100_age_20s
       PARTITION OF scores_rank_0_to_100 FOR VALUES FROM (20) TO (30);
   INSERT INTO scores_rank_0_to_100_age_20s
       VALUES (0, 22, 64819);
   -- INSERT ...
   -- I need a new age category now.
   CREATE TABLE scores_rank_0_to_100_age_30s
       PARTITION OF scores_rank_0_to_100 FOR VALUES FROM (30) TO (40);
   INSERT INTO scores_rank_0_to_100_age_30s
       VALUES (51, 30, 9564);
   -- ...
   -- I want to use a faster disk.
   CREATE TABLESPACE faster LOCATION '/data/faster';
   \c e
   ALTER DATABASE d
       TABLESPACE faster;
   \c d
   -- (I verify that data has moved to /data/faster.)
   -- ...
   -- I need a new age category now.
   CREATE TABLE scores_rank_0_to_100_age_40s
       PARTITION OF scores_rank_0_to_100 FOR VALUES FROM (40) TO (50);
   INSERT INTO scores_rank_0_to_100_age_40s
       VALUES (89, 41, 654);
   -- ...
   -- How come my data for the new table is going to the old location
   -- /data/fast?
   ```

The `watch` `SELECT` should end up producing output

```
           relname            | reltablespace
------------------------------+---------------
 scores_rank_0_to_100         |         16512
 scores_rank_0_to_100_age_20s |             0
 scores_rank_0_to_100_age_30s |             0
 scores_rank_0_to_100_age_40s |         16512
 scores_rank_100_to_200       |             0
(5 rows)
```

after all the commands are run.  I believe that the bug begins on the first
`ALTER DATABASE`.  If `scores_rank_0_to_100` were a physical table (i.e. had a
`relfilenode`), the `ALTER DATABASE` would have caught that and errored out,
and that would have been good.  With my plausible set of steps, I demonstrate
that the `ALTER DATABASE ... SET TABLESPACE` on non-physical tables (e.g.
partitioned tables) should go through the tablespace check as well.

Thanks,
Jason


Reply | Threaded
Open this post in threaded view
|

Re: handle tablespaces for partitioned tables during ALTER DATABASE

Andres Freund
Hi,

On 2020-03-23 23:26:08 -0700, [hidden email] wrote:
> Normally, you are not allowed to `ALTER DATABASE ... SET TABLESPACE` when the
> target tablespace is in use by a table in the database.  However, this check
> doesn't seem to apply to partitioned tables, and that can lead to unexpected
> behavior.

I don't immediately see what problem this could cause? There's no
physical file associated with a partitioned table, it's "just" a
template for new partitions of that partitioned table.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Fwd: Re: handle tablespaces for partitioned tables during ALTER DATABASE

postgres-8
(Forgot to CC pgsql-bugs.)

Jason

----- Forwarded message from [hidden email] -----

Date: Tue, 24 Mar 2020 14:24:33 -0700
From: [hidden email]
To: Andres Freund <[hidden email]>
Subject: Re: handle tablespaces for partitioned tables during ALTER DATABASE
Message-ID: <[hidden email]>
References: <[hidden email]>
        <[hidden email]>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <[hidden email]>

Hi, Andres.

On 2020-03-24T12:23:59-0700, Andres Freund wrote:
> I don't immediately see what problem this could cause? There's no
> physical file associated with a partitioned table, it's "just" a
> template for new partitions of that partitioned table.

I had illustrated a plausible scenario where this could be _unexpected_.  The
state that you can get into after the first `ALTER DATABASE` is strange because
the tablespace of `scores_rank_0_to_100` is not `InvalidOid` yet equal to the
database's tablespace.

You can't get into this state normally: try

```sql
CREATE DATABASE f TABLESPACE fast;
\c f
CREATE TABLE t (i int, j int) PARTITION BY RANGE (i);
CREATE TABLE t0 PARTITION OF t FOR VALUES FROM (0) TO (100) TABLESPACE fast;
SELECT relname, reltablespace FROM pg_class WHERE relname LIKE 't0';
```

```
 relname | reltablespace
---------+---------------
 t0      |             0
(1 row)
```

Notice that it's `InvalidOid`, not the explicit database's tablespace.  But you
can get into that state with some work, like I illustrated in my example in the
first message.

For less confusion, I think it makes sense to prohibit this or seamlessly
convert the appropriate `reltablespace` values to `InvalidOid`.  Enforcing this
may also give me less things to worry about for my own work.

Jason

----- End forwarded message -----