Better documentation for row_number() combined with set returning functions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Better documentation for row_number() combined with set returning functions

Erik Tews
Hi

The current documentation of Postgresql sounds like row_number() over
() can be used to number the rows returned by postgres. However, that
doesn't work when the query also uses set returning functions such as
json_array_elements. In this case, row_number() will be the same for
every element of the set.

I suggest to add a paragraph to the description of row_number() that
states that this is the behaviour and maybe also reference the "with
ordinality" feature that can be used instead.

Erik

signature.asc (499 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Better documentation for row_number() combined with set returning functions

Bruce Momjian
On Tue, Feb 25, 2020 at 08:24:13PM +0100, Erik Tews wrote:
> Hi
>
> The current documentation of Postgresql sounds like row_number() over
> () can be used to number the rows returned by postgres. However, that

Yes, row_number() is a member of the set of window functions.

> doesn't work when the query also uses set returning functions such as
> json_array_elements. In this case, row_number() will be the same for
> every element of the set.

Yes, the issue is that a set-returning function in the target list
generates multiple rows while the other target list values are
duplicated, e.g.:

        CREATE TABLE test (x int);
       
        INSERT INTO test VALUES (1), (2), (3);
       
        SELECT x, generate_series(4,6) AS y, row_number() OVER () AS z FROM
        test;
         x | y | z
        ---+---+---
         1 | 4 | 1
         1 | 5 | 1
         1 | 6 | 1
         2 | 4 | 2
         2 | 5 | 2
         2 | 6 | 2
         3 | 4 | 3
         3 | 5 | 3
         3 | 6 | 3

Notice the x=1 value is duplicated for the y values of 4,5,6, and the
row_number is duplicated too.

One way to avoid that is to do the expansion of the set-returning
function in a WITH query, and then apply row_number():

        WITH z AS (select x, generate_series(4,6) as y from test)
        select x, y, row_number() OVER () FROM z;
         x | y | row_number
        ---+---+------------
         1 | 4 |          1
         1 | 5 |          2
         1 | 6 |          3
         2 | 4 |          4
         2 | 5 |          5
         2 | 6 |          6
         3 | 4 |          7
         3 | 5 |          8
         3 | 6 |          9

> I suggest to add a paragraph to the description of row_number() that
> states that this is the behavior and maybe also reference the "with
> ordinality" feature that can be used instead.

I can't see how ordinality could be used:

        SELECT x, z, ord FROM test, generate_series(4,6) WITH ORDINALITY AS a(z, ord);
         x | z | ord
        ---+---+-----
         1 | 4 |   1
         2 | 4 |   1
         3 | 4 |   1
         1 | 5 |   2
         2 | 5 |   2
         3 | 5 |   2
         1 | 6 |   3
         2 | 6 |   3
         3 | 6 |   3

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +