using index or check in ALTER TABLE SET NOT NULL

classic Classic list List threaded Threaded
71 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

using index or check in ALTER TABLE SET NOT NULL

Sergei Kornilov
Hello
I write patch to speed up ALTER TABLE SET NOT NULL by check existed check constraints or indexes. Huge phase 3 with verify table data will be skipped if table has valid check constraint cover "alteredfield IS NOT NULL" condition or by SPI query if found index with compatible condition or regular amsearchnulls index on processed field.

Patch based on current master branch, i believe it has no platform-dependent code, of course code compiled and pass tests locally.
Tell me please, what i forgot or make incorrectly.

Implementation notes:
I use existed PartConstraintImpliedByRelConstraint method to check relation constraints. But i rename original method to static ConstraintImpliedByRelConstraint (because method now used not only in partitions) and leave PartConstraintImpliedByRelConstraint as proxy to not change public API.
I found it difficult to do index scan and choose index with lower costs if found many suitable indexes. Is it acceptable to use SPI here?

Related archive discussions:
https://www.postgresql.org/message-id/flat/530C10CF.4020101%40strangersgate.com
https://www.postgresql.org/message-id/flat/CAASwCXdAK55BzuOy_FtYj2zQWg26PriDKL5pRoWiyFJe0eg-Hg%40mail.gmail.com

Thanks!

alter_table_set_not_null_by_index_or_check_v1.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Robert Haas
On Tue, Nov 28, 2017 at 1:59 PM, Sergei Kornilov <[hidden email]> wrote:
> I write patch to speed up ALTER TABLE SET NOT NULL by check existed check constraints or indexes. Huge phase 3 with verify table data will be skipped if table has valid check constraint cover "alteredfield IS NOT NULL" condition or by SPI query if found index with compatible condition or regular amsearchnulls index on processed field.

Doing this based on the existence of a valid constraint which implies
that no nulls can be present seems like a good idea.  Doing it based
on an index scan doesn't necessarily seem like a good idea.  We have
no guarantee at all that the index scan will be faster than scanning
the table would have been, and a single table scan can do multiple
verification steps if, for example, multiple columns are set NOT NULL
at the same time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Stephen Frost
Robert,

* Robert Haas ([hidden email]) wrote:

> On Tue, Nov 28, 2017 at 1:59 PM, Sergei Kornilov <[hidden email]> wrote:
> > I write patch to speed up ALTER TABLE SET NOT NULL by check existed check constraints or indexes. Huge phase 3 with verify table data will be skipped if table has valid check constraint cover "alteredfield IS NOT NULL" condition or by SPI query if found index with compatible condition or regular amsearchnulls index on processed field.
>
> Doing this based on the existence of a valid constraint which implies
> that no nulls can be present seems like a good idea.  Doing it based
> on an index scan doesn't necessarily seem like a good idea.  We have
> no guarantee at all that the index scan will be faster than scanning
> the table would have been, and a single table scan can do multiple
> verification steps if, for example, multiple columns are set NOT NULL
> at the same time.
Isn't the first concern addressed by using SPI..?

As for the second concern, couldn't that be done with a more complicated
query through SPI, though things might have to be restructured some to
make it possible to do that.

Just, generally speaking, this is definitely something that I think we
want and neither of the above concerns seem like they're technical
reasons why we can't use something like this approach, just needs to
perhaps be reworked to handle multiple columns in a single query.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Tom Lane-2
Stephen Frost <[hidden email]> writes:
> Isn't the first concern addressed by using SPI..?

I did not look at the patch yet, but TBH if it uses SPI for sub-operations
of ALTER TABLE I think that is sufficient reason to reject it out of hand.
Doing things that way would create way too much of a vulnerability surface
for code touching a partially-updated table.  At minimum, we'd have to
blow holes in existing protections like CheckTableNotInUse, and I think
we'd be forever finding other stuff that failed to work quite right in
that context.  I do not want ALTER TABLE going anywhere near the planner
or executor; I'm not even happy that it uses the parser (for index
definition reconstruction).

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Sergei Kornilov
Thanks all for reply!

Robert,

> Doing it based on an index scan doesn't necessarily seem like a good idea. We have
> no guarantee at all that the index scan will be faster than scanning
> the table would have been

I agree this. Thinking a little about idea of index scan i can not give reasonable usecase which required index. My target problem of adding NOT NULL to big relation without long downtime can be done with ADD CONSTRAINT NOT VALID, VALIDATE it in second transaction, then SET NOT NULL by my patch and drop unneeded constraint.

Stephen,

> Just, generally speaking, this is definitely something that I think we
> want and neither of the above concerns seem like they're technical
> reasons why we can't use something like this approach, just needs to
> perhaps be reworked to handle multiple columns in a single query.

I understood the idea, thank you.

Tom,

> I did not look at the patch yet, but TBH if it uses SPI for sub-operations
> of ALTER TABLE I think that is sufficient reason to reject it out of hand.

I understood, thank you.

So, i will soon delete SPI usage and index scan and post new simplified patch with verify data only by constraints.

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Stephen Frost
In reply to this post by Tom Lane-2
Tom,

* Tom Lane ([hidden email]) wrote:
> Stephen Frost <[hidden email]> writes:
> > Isn't the first concern addressed by using SPI..?
>
> I did not look at the patch yet, but TBH if it uses SPI for sub-operations
> of ALTER TABLE I think that is sufficient reason to reject it out of hand.

You mean like what ALTER TABLE ... ADD FOREIGN KEY does?

> Doing things that way would create way too much of a vulnerability surface
> for code touching a partially-updated table.  At minimum, we'd have to
> blow holes in existing protections like CheckTableNotInUse, and I think
> we'd be forever finding other stuff that failed to work quite right in
> that context.  I do not want ALTER TABLE going anywhere near the planner
> or executor; I'm not even happy that it uses the parser (for index
> definition reconstruction).

That's more along the lines of the kind of response I was expecting
given the suggestion, and perhaps a good reason to just go with the
index-based lookup, when an index is available to do so with, but I'm
not entirely sure how this is different from how we handle foreign keys.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Robert Haas
In reply to this post by Sergei Kornilov
On Wed, Nov 29, 2017 at 10:52 AM, Sergei Kornilov <[hidden email]> wrote:
> I agree this. Thinking a little about idea of index scan i can not give reasonable usecase which required index. My target problem of adding NOT NULL to big relation without long downtime can be done with ADD CONSTRAINT NOT VALID, VALIDATE it in second transaction, then SET NOT NULL by my patch and drop unneeded constraint.

Yes, I had the same thought.

Thinking a little bit further, maybe the idea you had in mind using
the index scan was that some indexes offer cheap ways of testing
whether ANY nulls are present in the column.  For example, if we had a
non-partial btree index whose first column is the one being made NOT
NULL, we could try an index scan - via index_beginscan() /
index_getnext() / index_endscan() - trying to pull exactly one null
from the index, and judge whether or not there are nulls present based
only whether we get one.  This would be a lot cheaper than scanning a
large table, but it needs some careful thought because of visibility
issues.  It's not sufficient that the index contains no nulls that are
visible to our snapshot; it must contain no nulls that are visible to
any plausible current or future snapshot.  I doubt that test can be
written in SQL, but it can probably be written in C.  Moreover, we
need to avoid not only false negatives (thinking that there is no NULL
when there is one) but also false positives (thinking there's a NULL
in the column when there isn't, and thus failing spuriously).  But it
seems like it might be useful if someone can figure out the details of
how to make it 100% correct; one index lookup is sure to be a lot
quicker than a full table scan.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Tom Lane-2
In reply to this post by Stephen Frost
Stephen Frost <[hidden email]> writes:
> * Tom Lane ([hidden email]) wrote:
>> I did not look at the patch yet, but TBH if it uses SPI for sub-operations
>> of ALTER TABLE I think that is sufficient reason to reject it out of hand.

> You mean like what ALTER TABLE ... ADD FOREIGN KEY does?

Yeah, and if you look at the warts that SPI has grown to support that
usage, you'll see why I'm so unhappy.  We should never have allowed
FKs to be built on top of SPI; they require semantics that don't exist
in SQL.  I think this would lead to more of the same --- not exactly
the same of course, but more warts.  See Robert's nearby musings about
semantics of index null checks for an example.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Stephen Frost
In reply to this post by Robert Haas
Robert,

* Robert Haas ([hidden email]) wrote:
> On Wed, Nov 29, 2017 at 10:52 AM, Sergei Kornilov <[hidden email]> wrote:
> > I agree this. Thinking a little about idea of index scan i can not give reasonable usecase which required index. My target problem of adding NOT NULL to big relation without long downtime can be done with ADD CONSTRAINT NOT VALID, VALIDATE it in second transaction, then SET NOT NULL by my patch and drop unneeded constraint.
>
> Yes, I had the same thought.

I have to admit that the case I was thinking about was the one you
outline below..

> Thinking a little bit further, maybe the idea you had in mind using
> the index scan was that some indexes offer cheap ways of testing
> whether ANY nulls are present in the column.  For example, if we had a
> non-partial btree index whose first column is the one being made NOT
> NULL, we could try an index scan - via index_beginscan() /
> index_getnext() / index_endscan() - trying to pull exactly one null
> from the index, and judge whether or not there are nulls present based
> only whether we get one.  This would be a lot cheaper than scanning a
> large table, but it needs some careful thought because of visibility
> issues.  It's not sufficient that the index contains no nulls that are
> visible to our snapshot; it must contain no nulls that are visible to
> any plausible current or future snapshot.  I doubt that test can be
> written in SQL, but it can probably be written in C.  Moreover, we
> need to avoid not only false negatives (thinking that there is no NULL
> when there is one) but also false positives (thinking there's a NULL
> in the column when there isn't, and thus failing spuriously).  But it
> seems like it might be useful if someone can figure out the details of
> how to make it 100% correct; one index lookup is sure to be a lot
> quicker than a full table scan.
This was along the lines I was thinking also (though I had thought SPI
might be reasonable given our usage of it with FKs, but if it'd require
uglier warts than what SPI already has, then just finding an
appropriate index and using the internal methods would be best).

As for conflicting snapshots, isn't the lock we're taking already
AccessExclusive..?

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Sergei Kornilov
In reply to this post by Sergei Kornilov
Here is new patch with check only existed valid constraints and without SPI at all.

Thanks

alter_table_set_not_null_by_constraints_only_v2.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Andres Freund
In reply to this post by Stephen Frost


On November 29, 2017 8:50:31 AM PST, Stephen Frost <[hidden email]> wrote:
>As for conflicting snapshots, isn't the lock we're taking already
>AccessExclusive..?

Doesn't help if e.g. the current xact is repeatable read or if your own xact deleted things (other xacts with snapshots could still see null rows, despite new take definition).

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Robert Haas
In reply to this post by Sergei Kornilov
On Wed, Nov 29, 2017 at 11:53 AM, Sergei Kornilov <[hidden email]> wrote:
> Here is new patch with check only existed valid constraints and without SPI at all.

Please use pgindent or anyway try to follow project style.  { needs to
go on a line by itself, for example.

isSetNotNullNeedsTableScan seems different than other similarly-named
functions we have.  NotNullImpliedByRelConstraints?

It also lacks a header comment.

Having both PartConstraintImpliedByRelConstraint and
ConstraintImpliedByRelConstraint with identical implementations
doesn't seem like a good idea to me.  I guess just renaming it is
probably fine.

The comment /* T if we added new NOT NULL constraints */ should
probably be changed to /* T if we should recheck NOT NULL constraints
*/ or similar.

I'd suggest registering this with the next CommitFest.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Sergei Kornilov
Thank you for review! My apologies for my errors. It seems i read developers wiki pages not enough carefully. I will reread wiki, code style and then update patch with all your remarks.

> The comment /* T if we added new NOT NULL constraints */ should
> probably be changed to /* T if we should recheck NOT NULL constraints
> */ or similar.
Correct. Hm, probably I will also rename this property to verify_new_notnull for clarity

> I'd suggest registering this with the next CommitFest.
Already done in https://commitfest.postgresql.org/16/1389/ , i noticed this step in wiki

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Sergei Kornilov
In reply to this post by Robert Haas
Hello
I update patch and also rebase to current head. I hope now it is better aligned with project style

alter_table_set_not_null_by_constraints_only_v3.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Stephen Frost
Greetings Sergei,

* Sergei Kornilov ([hidden email]) wrote:
> I update patch and also rebase to current head. I hope now it is better aligned with project style

Thanks for updating it and continuing to work on it.  I haven't done a
full review but there were a few things below that I thought could be
improved-

> @@ -5863,8 +5864,11 @@ ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
>  
>   CatalogTupleUpdate(attr_rel, &tuple->t_self, tuple);
>  
> - /* Tell Phase 3 it needs to test the constraint */
> - tab->new_notnull = true;
> + if (!NotNullImpliedByRelConstraints(rel, (Form_pg_attribute) GETSTRUCT(tuple)))
> + {
> + /* Tell Phase 3 it needs to test the constraint */
> + tab->verify_new_notnull = true;
> + }
>  
>   ObjectAddressSubSet(address, RelationRelationId,
>   RelationGetRelid(rel), attnum);
This could really use some additional comments, imv.  In particular, we
need to make it clear that verify_new_notnull only moves from the
initial 'false' value to 'true', since we could be asked to add multiple
NOT NULL constraints and if any of them aren't already covered by an
existing CHECK constraint then we need to perform the full table check.

> @@ -13618,15 +13662,14 @@ ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs,
>  }
>  
>  /*
> - * PartConstraintImpliedByRelConstraint
> - * Does scanrel's existing constraints imply the partition constraint?
> + * ConstraintImpliedByRelConstraint
> + * Does scanrel's existing constraints imply given constraint
>   *
>   * Existing constraints includes its check constraints and column-level
>   * NOT NULL constraints and partConstraint describes the partition constraint.
>   */
>  bool
> -PartConstraintImpliedByRelConstraint(Relation scanrel,
> - List *partConstraint)
> +ConstraintImpliedByRelConstraint(Relation scanrel, List *partConstraint)
>  {
>   List   *existConstraint = NIL;
>   TupleConstr *constr = RelationGetDescr(scanrel)->constr;
I would also rename 'partConstraint' since this function is no longer,
necessairly, working with a partition's constraint.

This could also really use some regression tests and I'd be sure to
include tests of adding multiple NOT NULL constraints, sometimes where
they're all covered by existing CHECK constraints and other times when
only one or the other is (and there's existing NULL values in the other
column), etc.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Sergei Kornilov
Hello Stephen

I changed the suggested things and added some regression tests. Also i wrote few words to the documentation. New patch is attached.

Thank you for feedback!
regards, Sergei

alter_table_set_not_null_by_constraints_only_v4.patch (18K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Ildar Musin
Hello Sergei,

I couldn't find any case when your code doesn't work properly. So it
seems ok to me.

> @@ -220,6 +220,13 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
>       </para>
>
>       <para>
> +      Full table scan is performed to check that no existing row
> +      in the table has null values in given column.  It is possible to avoid
> +      this scan by adding a valid <literal>CHECK</literal> constraint to
> +      the table that would allow only NOT NULL values for given column.
> +     </para>

Adding check constraint will also force the full table scan. So I think
it would be better to rephrased it as follows:

"Full table scan is performed to check that column doesn't contain NULL
values unless there are valid check constraints that prohibit NULL
values for specified column. In the latter case table scan is skipped."

A native English speaker input would be helpful here.

Regarding regression tests it may be useful to set client_min_messages
to 'debug1' before setting "NOT NULL" attribute for a column. In this
case you can tell for sure that NotNullImpliedByRelConstraints()
returned true (i.e. your code actually did the job) as the special debug
message is printed to the log.

Thanks!

--
Ildar Musin
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Sergei Kornilov
Hello
thank you for review!

> Adding check constraint will also force the full table scan. So I think
> it would be better to rephrased it as follows:
Agree. I updated docs in new attached patch slightly different

> Regarding regression tests it may be useful to set client_min_messages
> to 'debug1' before setting "NOT NULL" attribute for a column. In this
> case you can tell for sure that NotNullImpliedByRelConstraints()
> returned true (i.e. your code actually did the job) as the special debug
> message is printed to the log.
I can not find any debug messages in tests: grep client_min_messages -irn src/test/ Only some warning level and few error. So i verify in regression tests only top-level behavior.
Or this paragraph was for other people, not for tests?

regards, Sergei

alter_table_set_not_null_by_constraints_only_v5.patch (18K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Ildar Musin


On 06.03.2018 16:12, Sergei Kornilov wrote:

> Hello thank you for review!
>
>> Adding check constraint will also force the full table scan. So I
>> think it would be better to rephrased it as follows:
> Agree. I updated docs in new attached patch slightly different
>
>> Regarding regression tests it may be useful to set
>> client_min_messages to 'debug1' before setting "NOT NULL" attribute
>> for a column. In this case you can tell for sure that
>> NotNullImpliedByRelConstraints() returned true (i.e. your code
>> actually did the job) as the special debug message is printed to
>> the log.
> I can not find any debug messages in tests: grep client_min_messages
> -irn src/test/ Only some warning level and few error. So i verify in
> regression tests only top-level behavior. Or this paragraph was for
> other people, not for tests?
>

Sorry, probably I didn't explain it clearly enough. I meant that your
regression tests can't distinguish cases when the full table scan was
actually performed from the ones when it was skipped due to
NotNullImpliedByRelConstraints() check. For instance, consider this
piece from the test suite:


# create table atacc1 (test_a int, test_b int);
CREATE TABLE
...

# alter table atacc1 add constraint atacc1_constr_a_valid check(test_a
is not null);
ALTER TABLE

# alter table atacc1 alter test_a set not null;
ALTER TABLE


It is not obvious that full table scan was omitted. But if we set
client_min_messages to 'debug1', we'll be able to see what is really
happened by the different debug messages. For example:


# create table atacc1 (test_a int, test_b int);
CREATE TABLE
...

# set client_min_messages to 'debug1';
SET

# alter table atacc1 alter test_a set not null;
DEBUG:  verifying table "atacc1"   <<<< full table scan!
ALTER TABLE

# alter table atacc1 alter test_a drop not null;
ALTER TABLE

# alter table atacc1 add constraint atacc1_constr_a_valid check(test_a
is not null);
DEBUG:  verifying table "atacc1"
ALTER TABLE

# alter table atacc1 alter test_a set not null;
DEBUG:  verifying table "atacc1" NOT NULL constraint on test_a attribute
by existed constraints   <<<< full scan was skipped!
ALTER TABLE


--
Ildar Musin
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: using index or check in ALTER TABLE SET NOT NULL

Sergei Kornilov
Hello, Ildar
Thanks. I looked ATTACH PARTITION tests and found such checks. If no one is against i will just use in my patch INFO level instead DEBUG1, similarly ATTACH PARTITION code. Updated patch attached.

Or i can rewrite tests to use DEBUG1 level.

regards, Sergei

alter_table_set_not_null_by_constraints_only_v6.patch (18K) Download Attachment
1234