plpgsql variable assignment not supporting distinct anymore

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

plpgsql variable assignment not supporting distinct anymore

easteregg
hi,

no noticed after the assignment with union ( https://www.postgresql.org/message-id/flat/20210105201257.f0d76aff%40mail.verfriemelt.org ), that the assignment with distinct is broken aswell.



  DO $$
  DECLARE
    _test bool;
  BEGIN

    _test := DISTINCT a FROM ( VALUES ( (true), ( true ) ) )t(a);
   
  END $$;

i would argue, that thats a way more common usecase than the union, which was merely bad code.

tested with version 14~~devel~20210111.0540-1~299.gitce6a71f.pgdg110+1 from the apt repo

with kind redards,
richard


Reply | Threaded
Open this post in threaded view
|

Re: plpgsql variable assignment not supporting distinct anymore

Pavel Stehule


pá 22. 1. 2021 v 9:21 odesílatel <[hidden email]> napsal:
hi,

no noticed after the assignment with union ( https://www.postgresql.org/message-id/flat/20210105201257.f0d76aff%40mail.verfriemelt.org ), that the assignment with distinct is broken aswell.



  DO $$
  DECLARE
    _test bool;
  BEGIN

    _test := DISTINCT a FROM ( VALUES ( (true), ( true ) ) )t(a);

  END $$;

i would argue, that thats a way more common usecase than the union, which was merely bad code.

What is the sense of this code?

This is strange with not well defined behavior (in dependency on data type the result can depend on collate).

More - because this breaks simple expression optimization (10x), then the code will be significantly slower, than you use IF or CASE.

Regards

Pavel


tested with version 14~~devel~20210111.0540-1~299.gitce6a71f.pgdg110+1 from the apt repo

with kind redards,
richard
Reply | Threaded
Open this post in threaded view
|

Re: plpgsql variable assignment not supporting distinct anymore

easteregg
In reply to this post by easteregg
the code provided is just a little poc to get the error ( which i have not included with my first mail sorry. )

   ERROR:  syntax error at or near "DISTINCT"
   LINE 8:     _test := DISTINCT a FROM ( VALUES ( (true), ( true ) ) )...


the code in production looked like this:


    _resource_id :=
        DISTINCT ti_resource_id
           FROM tabk.resource_timeline
          WHERE ti_a2_id = _ab2_id
            AND ti_type = 'task'
    ;

this is backed up by a trigger function, that will ensure to every instance with the same ti_a2_id exists only one ti_resource_id, hence the query can never fail due to more than one row beeing returned. but this syntax is not supported anymore, which will break BC. up until PG 13, the assignment statement was just an implizit SELECT <expression> Query.
Since Tom Lane didn't mentioned this change in the other thread, i figured the devteam might not be aware of this chance.

i can refactor this line into

    _resource_id :=
        ti_resource_id
       FROM tabk.resource_timeline
      WHERE ti_a2_id = _ab2_id
        AND ti_type = 'task'
      GROUP BY ti_resource_id
    ;

but concerns about BC was already raised, although with UNION there might be far less people affected.
with kind regards, richard


Reply | Threaded
Open this post in threaded view
|

Re: plpgsql variable assignment not supporting distinct anymore

Pavel Stehule


pá 22. 1. 2021 v 14:41 odesílatel <[hidden email]> napsal:
the code provided is just a little poc to get the error ( which i have not included with my first mail sorry. )

   ERROR:  syntax error at or near "DISTINCT"
   LINE 8:     _test := DISTINCT a FROM ( VALUES ( (true), ( true ) ) )...


the code in production looked like this:


    _resource_id :=
        DISTINCT ti_resource_id
           FROM tabk.resource_timeline
          WHERE ti_a2_id = _ab2_id
            AND ti_type = 'task'
    ;

this is backed up by a trigger function, that will ensure to every instance with the same ti_a2_id exists only one ti_resource_id, hence the query can never fail due to more than one row beeing returned. but this syntax is not supported anymore, which will break BC. up until PG 13, the assignment statement was just an implizit SELECT <expression> Query.
Since Tom Lane didn't mentioned this change in the other thread, i figured the devteam might not be aware of this chance.

i can refactor this line into

    _resource_id :=
        ti_resource_id
       FROM tabk.resource_timeline
      WHERE ti_a2_id = _ab2_id
        AND ti_type = 'task'
      GROUP BY ti_resource_id
    ;

but concerns about BC was already raised, although with UNION there might be far less people affected.
with kind regards, richard

Probably the fix is not hard, but it is almost the same situation as the UNION case. The result of your code is not deterministic

If there are more different ti_resource_id then some values can be randomly ignored - when hash agg is used.

The safe fix should be

_resource_id := (SELECT ti_resource_id
       FROM tabk.resource_timeline
      WHERE ti_a2_id = _ab2_id
        AND ti_type = 'task');

and you get an exception if some values are ignored. Or if you want to ignore some values, then you can write

_resource_id := (SELECT MIN(ti_resource_id) -- or MAX
       FROM tabk.resource_timeline
      WHERE ti_a2_id = _ab2_id
        AND ti_type = 'task');

Using DISTINCT is not a good solution.




Reply | Threaded
Open this post in threaded view
|

Re: plpgsql variable assignment not supporting distinct anymore

easteregg
In reply to this post by easteregg
> Probably the fix is not hard, but it is almost the same situation as the
> UNION case. The result of your code is not deterministic
>
> If there are more different ti_resource_id then some values can be randomly
> ignored - when hash agg is used.
>
> The safe fix should be
>
> _resource_id := (SELECT ti_resource_id
>        FROM tabk.resource_timeline
>       WHERE ti_a2_id = _ab2_id
>         AND ti_type = 'task');
>
> and you get an exception if some values are ignored. Or if you want to
> ignore some values, then you can write
>
> _resource_id := (SELECT MIN(ti_resource_id) -- or MAX
>        FROM tabk.resource_timeline
>       WHERE ti_a2_id = _ab2_id
>         AND ti_type = 'task');
>
> Using DISTINCT is not a good solution.
>

in my usecase it was perfectly fine, because there is a constraint ensuring that here can never be more than on ti_resource_id at any given time for a given _ab2_id.
also, whenever there would be more data ( for example if the constraint trigger would have a bug ) you will get an error like this:


  create table a ( t int );
  insert into a values (1),(2);

  do $$
  declare _t int;
  begin
    _t := distinct t from a;
  end $$;

  Query failed: ERROR:  query "SELECT distinct t from a" returned more than one row
  CONTEXT:  PL/pgSQL function inline_code_block line 4 at assignment

no doubt, that this piece of code might not look optimal at first glance, but i like my code to fail fast. because with the min() approach, you will not notice, that the constraint trigger is not doing its job, until you get other strange sideeffects down the road.

richard


Reply | Threaded
Open this post in threaded view
|

Re: plpgsql variable assignment not supporting distinct anymore

Pavel Stehule


pá 22. 1. 2021 v 15:10 odesílatel <[hidden email]> napsal:
> Probably the fix is not hard, but it is almost the same situation as the
> UNION case. The result of your code is not deterministic
>
> If there are more different ti_resource_id then some values can be randomly
> ignored - when hash agg is used.
>
> The safe fix should be
>
> _resource_id := (SELECT ti_resource_id
>        FROM tabk.resource_timeline
>       WHERE ti_a2_id = _ab2_id
>         AND ti_type = 'task');
>
> and you get an exception if some values are ignored. Or if you want to
> ignore some values, then you can write
>
> _resource_id := (SELECT MIN(ti_resource_id) -- or MAX
>        FROM tabk.resource_timeline
>       WHERE ti_a2_id = _ab2_id
>         AND ti_type = 'task');
>
> Using DISTINCT is not a good solution.
>

in my usecase it was perfectly fine, because there is a constraint ensuring that here can never be more than on ti_resource_id at any given time for a given _ab2_id.
also, whenever there would be more data ( for example if the constraint trigger would have a bug ) you will get an error like this:


  create table a ( t int );
  insert into a values (1),(2);

  do $$
  declare _t int;
  begin
    _t := distinct t from a;
  end $$;

  Query failed: ERROR:  query "SELECT distinct t from a" returned more than one row
  CONTEXT:  PL/pgSQL function inline_code_block line 4 at assignment

no doubt, that this piece of code might not look optimal at first glance, but i like my code to fail fast. because with the min() approach, you will not notice, that the constraint trigger is not doing its job, until you get other strange sideeffects down the road.

ok

then you don't need to use group by or DISTINCT

just use

_t := (SELECT ...);

The performance will be same and less obfuscate and you will not use undocumented feature

Regards

Pavel




richard
Reply | Threaded
Open this post in threaded view
|

Re: plpgsql variable assignment not supporting distinct anymore

Tom Lane-2
In reply to this post by Pavel Stehule
Pavel Stehule <[hidden email]> writes:
> pá 22. 1. 2021 v 14:41 odesílatel <[hidden email]> napsal:
>> ERROR:  syntax error at or near "DISTINCT"
>> LINE 8:     _test := DISTINCT a FROM ( VALUES ( (true), ( true ) ) )...

> Using DISTINCT is not a good solution.

Yeah.  It wouldn't be as painful to support this in the grammar
as it would be for UNION et al, so maybe we should just do it.
But I still find this to be mighty ugly plpgsql code.

                        regards, tom lane