Thinking about EXPLAIN ALTER TABLE

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

Thinking about EXPLAIN ALTER TABLE

Greg Stark
I've been poking around with a feature I've wanted a number of times
in the past, "EXPLAIN ALTER TABLE". The idea is that there are a bunch
of optimizations in ALTER TABLE to minimize the amount of work and
lock levels but it's really hard for users to tell whether they've
written their ALTER TABLE commands carefully enough and properly to
trigger the optimizations. As a result it's really easy for to
accidentally take an exclusive lock and/or do a full table rewrite
when you were expecting to just do a quick catalog update.

The things I want to expose in ALTER TABLE are:

1. The lock level that's going to be taken
2. Whether a full table rewrite is going to happen
3. Whether a full table constraint validation is going to happen
4. Whether any indexes are going to be built or rebuilt
5. Whether the command is going to error out early due to syntax,
permissions, or other inconsistencies

Are there are other aspects of alter table that people would like to
see exposed that I haven't thought of?

For the most part ALTER TABLE is already structured such that this is
pretty easy. It does a lot of preparatory work without doing catalog
updates and I can just call that same preparatory work without calling
the subsequent work phases.

However there are a number of cases where decisions are made only
during the actual work phase, phase 2, and flags are set and work
enqueued for phase 3. In some cases the work that's enqueued would be
hard to predict in advance, for example if a partition is added a new
constraint is added for the partition but if that new constraint is
merged with an existing constraint (which is handled by
AddRelationNewConstraints()) then it doesn't need to be re-validated.

I'm thinking I should try to move all these decisions to phase 1 as
much as possible but I'm not sure how feasible it will be to get the
results exactly correct. Of course the cases where it's hardest to
predict are precisely where users would most like to know what's going
to happen...

If anyone has any ideas or tips on how to avoid these problems I'm all ears.

Currently the output is a bit rough, it looks like:

postgres=# explain alter table x2 add foreign key (i) references x1(i);
┌───────────────────────────────────┐
│            QUERY PLAN             │
├───────────────────────────────────┤
│ Lock Level: ShareRowExclusiveLock │
│ ALTER TABLE: x2                   │
│   Relation: x2                    │
│   Rewrite: none                   │
└───────────────────────────────────┘


postgres***=# explain alter table t add column j integer generated
always as  identity primary key;
┌─────────────────────────────────┐
│           QUERY PLAN            │
├─────────────────────────────────┤
│ Lock Level: AccessExclusiveLock │
│ CREATE SEQUENCE: t_j_seq        │
│ ALTER TABLE: t                  │
│   Relation: t                   │
│   Rewrite: none                 │
│ ALTER SEQUENCE: t_j_seq         │
└─────────────────────────────────┘

postgres***=# explain alter table t set unlogged;
┌─────────────────────────────────────┐
│             QUERY PLAN              │
├─────────────────────────────────────┤
│ Lock Level: AccessExclusiveLock     │
│ ALTER TABLE: t                      │
│   Relation: t                       │
│   Rewrite: Due to ALTER PERSISTENCE │
└─────────────────────────────────────┘

postgres***=# explain alter table t alter column i set not null;
┌─────────────────────────────────┐
│           QUERY PLAN            │
├─────────────────────────────────┤
│ Lock Level: AccessExclusiveLock │
│ ALTER TABLE: t                  │
│   Relation: t                   │
│   Rewrite: none                 │
│   Relation: t2                  │
│   Rewrite: none                 │
└─────────────────────────────────┘


--
greg
Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Peter Geoghegan-4
On Fri, Dec 7, 2018 at 1:18 PM Greg Stark <[hidden email]> wrote:
> I've been poking around with a feature I've wanted a number of times
> in the past, "EXPLAIN ALTER TABLE". The idea is that there are a bunch
> of optimizations in ALTER TABLE to minimize the amount of work and
> lock levels but it's really hard for users to tell whether they've
> written their ALTER TABLE commands carefully enough and properly to
> trigger the optimizations. As a result it's really easy for to
> accidentally take an exclusive lock and/or do a full table rewrite
> when you were expecting to just do a quick catalog update.

Seems like an important project.

> Are there are other aspects of alter table that people would like to
> see exposed that I haven't thought of?

How about:

* Whether CLUSTER does a sort, or a full index scan.

* How many workers will be used by CREATE INDEX, if any.

(These are both examples of utility statements that make very limited
use of the optimizer -- CREATE TABLE AS SELECT is covered by explain
already, and is closer to a DML statement.)

* Whether an external sort will be used by CREATE INDEX or CLUSTER,
plus other details predicted by cost_sort().

These additional items are just nice-to-haves that I'm throwing out as
suggestions.

> I'm thinking I should try to move all these decisions to phase 1 as
> much as possible but I'm not sure how feasible it will be to get the
> results exactly correct. Of course the cases where it's hardest to
> predict are precisely where users would most like to know what's going
> to happen...

I'm not sure either, but I suspect that some compromise will be
required. For example, it's not at all clear whether or not a lock
will be acquired on TOAST tables in many instances. Also, the planner
itself can acquire ASLs on indexes in order to think about using them,
without execution going on to make use of them, which can matter (e.g.
there could be a REINDEX of an index that the planner isn't going to
use, that nonetheless necessitates that the planner wait).

What you're talking about here isn't really "EXPLAIN, but for DDL" so
much as a way of making EXPLAIN predict relation lock acquisitions,
including for DDL statements. I'm fine with calling that EXPLAIN, but
it's still not quite EXPLAIN as we know it today.

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

John Naylor
In reply to this post by Greg Stark
On 12/7/18, Greg Stark <[hidden email]> wrote:
> I've been poking around with a feature I've wanted a number of times
> in the past, "EXPLAIN ALTER TABLE".

I believe I've seen your messages to that effect in the archives, so
I've had it in the back of my mind as well. I think it could be very
useful.

> 3. Whether a full table constraint validation is going to happen

Is it possible that this can occur via index-only scan and that this
feature could know that?

> For the most part ALTER TABLE is already structured such that this is
> pretty easy. It does a lot of preparatory work without doing catalog
> updates and I can just call that same preparatory work without calling
> the subsequent work phases.

One thing that came to mind: Since the input is not a plan tree, it
seems it would be more difficult to keep EXPLAIN in sync with
additional ALTER TABLE features. Do you have any thoughts about that?

> postgres***=# explain alter table t alter column i set not null;
> ┌─────────────────────────────────┐
> │           QUERY PLAN            │
> ├─────────────────────────────────┤
> │ Lock Level: AccessExclusiveLock │
> │ ALTER TABLE: t                  │
> │   Relation: t                   │
> │   Rewrite: none                 │
> │   Relation: t2                  │
> │   Rewrite: none                 │
> └─────────────────────────────────┘

In this example, I assume the two relations are partitions? With many
partitions, this could get unwieldy pretty fast. I imagine there could
be a shorthand notation.

-John Naylor
Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Alvaro Herrera-9
In reply to this post by Greg Stark
Hi Greg

On 2018-Dec-07, Greg Stark wrote:

> I'm thinking I should try to move all these decisions to phase 1 as
> much as possible but I'm not sure how feasible it will be to get the
> results exactly correct. Of course the cases where it's hardest to
> predict are precisely where users would most like to know what's going
> to happen...

Maybe you can move some of these decisions to phase 1, but I'm not sure
it can be done for all of them.  Another possible plan is to add a flag
"dry run" so that phases 2/3 do whatever analysis they need to report
for your EXPLAIN, but not actually carry out their tasks.  (I see two
options to implement this, one is a global flag and the other is a new
argument to all those routines.)

> postgres***=# explain alter table t set unlogged;
> ┌─────────────────────────────────────┐
> │             QUERY PLAN              │
> ├─────────────────────────────────────┤
> │ Lock Level: AccessExclusiveLock     │
> │ ALTER TABLE: t                      │
> │   Relation: t                       │
> │   Rewrite: Due to ALTER PERSISTENCE │
> └─────────────────────────────────────┘

Note there's a relation scan that doesn't rewrite (to verify constraints
IIRC).  That's certainly worth reporting in some form.  Maybe instead of
"Rewrite:" use something like "Scan: read-only / rewrite due to ALTER
PERSISTENCE".

But ... not sure what you propose to print when a table rewrite is
caused by two subcommands, say change persistence at the same time as a
column datatype.  And what if you add a new constraint together with
those two?

> postgres***=# explain alter table t alter column i set not null;
> ┌─────────────────────────────────┐
> │           QUERY PLAN            │
> ├─────────────────────────────────┤
> │ Lock Level: AccessExclusiveLock │
> │ ALTER TABLE: t                  │
> │   Relation: t                   │
> │   Rewrite: none                 │
> │   Relation: t2                  │
> │   Rewrite: none                 │
> └─────────────────────────────────┘

I think putting the "Rewrite:" at the same indentation level as the
relation that it qualifies is confusing.  I'd do it this way:

 ┌─────────────────────────────────┐
 │           QUERY PLAN            │
 ├─────────────────────────────────┤
 │ Lock Level: AccessExclusiveLock │
 │ ALTER TABLE: t                  │
 │   Relation: t                   │
 │     Rewrite: none               │
 │   Relation: t2                  │
 │     Rewrite: none               │
 └─────────────────────────────────┘

Maybe make the output some legible form of YAML or JSON?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Simon Riggs
On Mon, 10 Dec 2018 at 16:14, Alvaro Herrera <[hidden email]> wrote:
Hi Greg

On 2018-Dec-07, Greg Stark wrote:

> I'm thinking I should try to move all these decisions to phase 1 as
> much as possible but I'm not sure how feasible it will be to get the
> results exactly correct. Of course the cases where it's hardest to
> predict are precisely where users would most like to know what's going
> to happen...

Maybe you can move some of these decisions to phase 1, but I'm not sure
it can be done for all of them.  Another possible plan is to add a flag
"dry run" so that phases 2/3 do whatever analysis they need to report
for your EXPLAIN, but not actually carry out their tasks.  (I see two
options to implement this, one is a global flag and the other is a new
argument to all those routines.)

You need to take a table lock to find out things about the table.

EXPLAIN seems like the wrong place for this.

I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run = on, so the whole world doesn't need to rewrite its syntax to support the new option. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Tom Lane-2
Simon Riggs <[hidden email]> writes:
> I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run
> = on, so the whole world doesn't need to rewrite its syntax to support the
> new option.

We were just busy shooting down a different suggestion of
behavior-changing GUCs.  A GUC that turns all ALTERs into no-ops
sure seems like a foot-gun to me.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Simon Riggs
On Mon, 10 Dec 2018 at 16:32, Tom Lane <[hidden email]> wrote:
Simon Riggs <[hidden email]> writes:
> I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run
> = on, so the whole world doesn't need to rewrite its syntax to support the
> new option.

We were just busy shooting down a different suggestion of
behavior-changing GUCs.  A GUC that turns all ALTERs into no-ops
sure seems like a foot-gun to me.

How would you test a script? Manually edit each one with the new option? Manual editing is more of a foot gun.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Tom Lane-2
Simon Riggs <[hidden email]> writes:
> On Mon, 10 Dec 2018 at 16:32, Tom Lane <[hidden email]> wrote:
>> We were just busy shooting down a different suggestion of
>> behavior-changing GUCs.  A GUC that turns all ALTERs into no-ops
>> sure seems like a foot-gun to me.

> How would you test a script? Manually edit each one with the new option?
> Manual editing is more of a foot gun.

I don't see how EXPLAIN ALTER TABLE would meaningfully be something
you use in a script.  If you have a script with many steps including
ALTER TABLEs, it's likely that each ALTER depends on the effects of
prior steps, and it's even more likely that the steps after an ALTER
depend on it having executed.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Simon Riggs
On Mon, 10 Dec 2018 at 16:52, Tom Lane <[hidden email]> wrote:
Simon Riggs <[hidden email]> writes:
> On Mon, 10 Dec 2018 at 16:32, Tom Lane <[hidden email]> wrote:
>> We were just busy shooting down a different suggestion of
>> behavior-changing GUCs.  A GUC that turns all ALTERs into no-ops
>> sure seems like a foot-gun to me.

> How would you test a script? Manually edit each one with the new option?
> Manual editing is more of a foot gun.

I don't see how EXPLAIN ALTER TABLE would meaningfully be something
you use in a script.  If you have a script with many steps including
ALTER TABLEs, it's likely that each ALTER depends on the effects of
prior steps, and it's even more likely that the steps after an ALTER
depend on it having executed.

Agreed, but that's why I suggested an alternative.

Remembering, the best approach is one we have already taken.... but maybe we have forgotten.

An event trigger with a table_rewrite event, allows you to scan a whole script for objectionable activity on a test server before you put it into production.

Perhaps we just need a few extra events.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Greg Stark
On Mon, 10 Dec 2018 at 12:27, Simon Riggs <[hidden email]> wrote:
>
> An event trigger with a table_rewrite event, allows you to scan a whole script for objectionable activity on a test server before you put it into production.
>
> Perhaps we just need a few extra events.

That's not a bad idea at all. Offhand the missing events that come to mind are:

1. Validation passes
1. Table Rebuild pass
1. Index builds
1. Locks that block DDL (currently that's AccessExclusiveLock and
ShareRowExclusiveLock)

I see this as complementary to EXPLAIN ALTER TABLE rather than
replacing it. EXPLAIN would help the developer understand what their
commands were doing and why during development and the event triggers
would serve largely as safety backstop in staging and production to
ensure nothing slips through.

--
greg

Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Robert Haas
In reply to this post by Tom Lane-2
On Tue, Dec 11, 2018 at 1:32 AM Tom Lane <[hidden email]> wrote:
> Simon Riggs <[hidden email]> writes:
> > I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run
> > = on, so the whole world doesn't need to rewrite its syntax to support the
> > new option.
>
> We were just busy shooting down a different suggestion of
> behavior-changing GUCs.  A GUC that turns all ALTERs into no-ops
> sure seems like a foot-gun to me.

Yeah, I like EXPLAIN better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Michael Paquier-2
On Tue, Dec 11, 2018 at 11:25:12AM +0900, Robert Haas wrote:
> Yeah, I like EXPLAIN better.

+1.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

David Fetter
In reply to this post by Robert Haas
On Tue, Dec 11, 2018 at 11:25:12AM +0900, Robert Haas wrote:

> On Tue, Dec 11, 2018 at 1:32 AM Tom Lane <[hidden email]> wrote:
> > Simon Riggs <[hidden email]> writes:
> > > I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run
> > > = on, so the whole world doesn't need to rewrite its syntax to support the
> > > new option.
> >
> > We were just busy shooting down a different suggestion of
> > behavior-changing GUCs.  A GUC that turns all ALTERs into no-ops
> > sure seems like a foot-gun to me.
>
> Yeah, I like EXPLAIN better.

+1 for EXPLAIN

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Jose Luis Tallon
>>> We were just busy shooting down a different suggestion of
>>> behavior-changing GUCs.  A GUC that turns all ALTERs into no-ops
>>> sure seems like a foot-gun to me.
>> Yeah, I like EXPLAIN better.
> +1 for EXPLAIN

IMVHO, and for "symmetry" with existing mechanisms:

* EXPLAIN ALTER TABLE    ==> "DDL dry run", but tell me what would be
done (similar to what EXPLAIN SELECT does)

* EXPLAIN PERFORM ALTER TABLE    (EXPLAIN EXEC?)    would explain + do

     ...and bonus points for explaining each step just before it is
performed. This way, It'd be easy for users to verify that a particular
step (i.e. table rewrite) is the one taking æons to run or hammering the
storage.

     Of course, regular "ALTER TABLE" stays as it is.


Just my .02€ :)

I'm not familiar with this part of the code and clearly can't devote
enough time to do it in the near future, but count on me to test it if
accepted.


Thanks,

     Jose



Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Alvaro Herrera-9
On 2018-Dec-11, Jose Luis Tallon wrote:

> * EXPLAIN PERFORM ALTER TABLE    (EXPLAIN EXEC?)    would explain + do
>
>     ...and bonus points for explaining each step just before it is
> performed. This way, It'd be easy for users to verify that a particular step
> (i.e. table rewrite) is the one taking æons to run or hammering the storage.

We have a mechanism for reporting progress of DDL commands.  Currently
as far as I remember it's able to report VACUUM progress; there's a
patch in commitfest to add support for CLUSTER.  Rahila Syed in
https://www.pgcon.org/2017/schedule/attachments/472_Progress%20Measurement%20PostgreSQL.pdf
proposed ways to have it report for ALTER TABLE also, though I don't
think anyone has worked on that.

I think these two things are separate.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Thinking about EXPLAIN ALTER TABLE

Greg Stark
In reply to this post by Alvaro Herrera-9
On Mon, 10 Dec 2018 at 11:14, Alvaro Herrera <[hidden email]> wrote:

>
> Hi Greg
>
> On 2018-Dec-07, Greg Stark wrote:
>
> > I'm thinking I should try to move all these decisions to phase 1 as
> > much as possible but I'm not sure how feasible it will be to get the
> > results exactly correct. Of course the cases where it's hardest to
> > predict are precisely where users would most like to know what's going
> > to happen...
>
> Maybe you can move some of these decisions to phase 1, but I'm not sure
> it can be done for all of them.

Indeed I've done some work on this and it will be impossible to get it
100% of the way there. At least not with a reasonable amount of work.
Personally I think it can be close enough to be useful but others may
have a different opinion...

A typically example, and I think the main realistic source of
problems, is the partitioning bound constraint checks. They're elided
if the new bound can be proven from the existing constraints. But that
requires recursing through all the subpartitions which currently only
happens in the final phase. Worse, I think -- though I'm not 100%
clear on this -- that these constraint proofs can't even be done
earlier in the process because the constraints can refer to column
names that don't necessarily exist until earlier catalog changes are
done.


Personally I'm fine saying here's the plan -- some steps may be
optimized away at run-time but the locking and table rewrites
described should be correct and most of the time the constraint
validations should be correct too.

> Another possible plan is to add a flag
> "dry run" so that phases 2/3 do whatever analysis they need to report
> for your EXPLAIN, but not actually carry out their tasks.

I don't think this helps any. Anything that could be done in a dry run
could just be duplicated work earlier. The bits that cannot are the
bits that depend on catalog changes that are done in the same command.
The ony way they could be done in a "dry run" would be doing the whole
operation in a not-dry-run in a transaction and then rolling it back.
But that would produce a lot of production impact that would defeat
the whole purpose of having the command at all.

--
greg