Bug with indexes on whole-row expressions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug with indexes on whole-row expressions

Laurenz Albe
CREATE TABLE boom (a integer, b integer);

-- index on whole-row expression
CREATE UNIQUE INDEX ON boom ((boom));

INSERT INTO boom VALUES
   (1, 2),
   (1, 3);

ALTER TABLE boom DROP b;

TABLE boom;

 a
---
 1
 1
(2 rows)

REINDEX TABLE boom;
ERROR:  could not create unique index "boom_boom_idx"
DETAIL:  Key ((boom.*))=((1)) is duplicated.

The problem here is that there *is* a "pg_depend" entry for the
index, but it only depends on the whole table, not on specific columns.

I have been thinking what would be the right approach to fix this:

1. Don't fix it, because it is an artificial corner case.
   (But I can imagine someone trying to exclude duplicate rows with
   a unique index.)

2. Add code that checks if there is an index with a whole-row reference
   in the definition before dropping a column.
   That feels like a wart for a special case.

3. Forbid indexes on whole-row expressions.
   After all, you can do the same with an index on all the columns.
   That would open the question what to do about upgrading old databases
   that might have such indexes today.

4. Add dependencies on all columns whenever a whole-row expression
   is used in an index.
   That would need special processing for pg_upgrade.

I'd like to hear your opinions.

Yours,
Laurenz Albe



Reply | Threaded
Open this post in threaded view
|

Re: Bug with indexes on whole-row expressions

Ashutosh Bapat-2
On Mon, Jun 29, 2020 at 3:43 PM Laurenz Albe <[hidden email]> wrote:

>
> CREATE TABLE boom (a integer, b integer);
>
> -- index on whole-row expression
> CREATE UNIQUE INDEX ON boom ((boom));
>
> INSERT INTO boom VALUES
>    (1, 2),
>    (1, 3);
>
> ALTER TABLE boom DROP b;
>
> TABLE boom;
>
>  a
> ---
>  1
>  1
> (2 rows)
>
> REINDEX TABLE boom;
> ERROR:  could not create unique index "boom_boom_idx"
> DETAIL:  Key ((boom.*))=((1)) is duplicated.
>
> The problem here is that there *is* a "pg_depend" entry for the
> index, but it only depends on the whole table, not on specific columns.
>
> I have been thinking what would be the right approach to fix this:
>
> 1. Don't fix it, because it is an artificial corner case.
>    (But I can imagine someone trying to exclude duplicate rows with
>    a unique index.)
>
> 2. Add code that checks if there is an index with a whole-row reference
>    in the definition before dropping a column.
>    That feels like a wart for a special case.

Do we need to do something about adding a new column or modifying an
existing one. Esp. if the later changes the uniqueness of row
expression.
>
> 3. Forbid indexes on whole-row expressions.
>    After all, you can do the same with an index on all the columns.
>    That would open the question what to do about upgrading old databases
>    that might have such indexes today.

This would be the best case. However, a whole row expression is not
necessarily the same as "all columns" at that moment. A whole row
expression means whatever column there are at any given point in time.
So creating an index on whole row expression is not the same as an
index on all the columns. However, I can't imagine a case where we
want an index on "whole row expression". An index containing all the
columns would always suffice and will be deterministic as well.

So this option looks best.

>
> 4. Add dependencies on all columns whenever a whole-row expression
>    is used in an index.
>    That would need special processing for pg_upgrade.

Again, that dependency needs to be maintained as and when the columns
are added and dropped.


--
Best Wishes,
Ashutosh Bapat