FETCH FIRST clause PERCENT option

classic Classic list List threaded Threaded
43 messages Options
123
Reply | Threaded
Open this post in threaded view
|

FETCH FIRST clause PERCENT option

Surafel Temesgen

FETCH FIRST with PERCENT option is SQL standard that state limit count to be specified in a percentage in addition to specify it in exact count and listed as one of Major features simply not implemented yet in recent wiki page [1].

I implemented it by executing the query plan twice. One without limit filter to find the total number of row that will be feed to limit node so the exact limit count can be calculated and the query plan execute again with limit filter with newly calculated exact count .


[1].https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard


Regards

Surafel



fetch-wth-percent-v1.patch (31K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Andres Freund
Hi,

On 2018-08-16 17:27:45 +0300, Surafel Temesgen wrote:
> FETCH FIRST with PERCENT option is SQL standard that state limit count to
> be specified in a percentage in addition to specify it in exact count and
> listed as one of Major features simply not implemented yet in recent wiki
> page [1].
>
> I implemented it by executing the query plan twice. One without limit
> filter to find the total number of row that will be feed to limit node so
> the exact limit count can be calculated and the query plan execute again
> with limit filter with newly calculated exact count .

Won't that have rather massive issues with multiple evaluations of
clauses in the query?  Besides being really expensive?

I think you'd have to instead spill the query results into a tuplestore
(like we do for FOR HOLD queries at end of xact), and then do the
computations based on that.

- Andres

Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Andrew Gierth
>>>>> "Andres" == Andres Freund <[hidden email]> writes:

 Andres> I think you'd have to instead spill the query results into a
 Andres> tuplestore (like we do for FOR HOLD queries at end of xact),
 Andres> and then do the computations based on that.

The approach I've suggested to some people who wanted to look into this
was to inject a window function call into the query, and teach the Limit
node to be able to stop on a boolean condition (which would be based on
that function result). This method should also work for WITH TIES by
using a different window function.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Surafel Temesgen
In reply to this post by Andres Freund


On Thu, Aug 16, 2018 at 5:34 PM, Andres Freund <[hidden email]> wrote:

Won't that have rather massive issues with multiple evaluations of
clauses in the query?  Besides being really expensive?

The plan re-scane after first execution I can’t see issue for multiple execution of a clause in this case


I think you'd have to instead spill the query results into a tuplestore

The downside of it is all the result have to be stored even if needed tuple is a fraction of it and also store it for longer so the choice became memory or cpu utilization

 

- Andres
Regards
Surafel
Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Andres Freund
On 2018-08-21 15:47:07 +0300, Surafel Temesgen wrote:
> On Thu, Aug 16, 2018 at 5:34 PM, Andres Freund <[hidden email]> wrote:
>
> >
> > Won't that have rather massive issues with multiple evaluations of
> > clauses in the query?  Besides being really expensive?
> >
> The plan re-scane after first execution I can’t see issue for multiple
> execution of a clause in this case

Imagine volatile functions being used. That can be problematic because
of repeated side-effects, but also will lead to plainly wrong
results. Consider what happens with your approach where somebody does
something like WHERE value < random().

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Surafel Temesgen


On Tue, Aug 21, 2018 at 3:50 PM Andres Freund <[hidden email]> wrote:

Imagine volatile functions being used. That can be problematic because
of repeated side-effects, but also will lead to plainly wrong
results. Consider what happens with your approach where somebody does
something like WHERE value < random().
Thanks for explaining . The attach patch use tuplestore instead
Regards
Surafel

fetch-wth-percent-v2.patch (38K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Erik Rijkers
On 2018-08-28 14:14, Surafel Temesgen wrote:

> On Tue, Aug 21, 2018 at 3:50 PM Andres Freund <[hidden email]>
> wrote:
>
>>
>> Imagine volatile functions being used. That can be problematic because
>> of repeated side-effects, but also will lead to plainly wrong
>> results. Consider what happens with your approach where somebody does
>> something like WHERE value < random().
>>
> Thanks for explaining . The attach patch use tuplestore instead

> [fetch-wth-percent-v2.patch]

I had a quick look at this.  Apply, compile, make check are ok.

In straightforward usage seems to work ok.


But I also ran across this crash:

create table if not exists t as
    select i from generate_series(1, 100) as f(i);

select * from t
   offset 60 rows
   fetch next 3 percent rows only
   FOR UPDATE
;

TRAP: FailedAssertion("!(slot != ((void *)0))", File: "execTuples.c",
Line: 428)




Erik Rijkers



















Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Surafel Temesgen


On Tue, Aug 28, 2018 at 7:33 PM Erik Rijkers <[hidden email]> wrote:
;

TRAP: FailedAssertion("!(slot != ((void *)0))", File: "execTuples.c",
Line: 42
The attache patch include a fix for the crash .can you check it again?
Regards
Surafel

fetch-wth-percent-v3.patch (39K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Thomas Munro-3
On Fri, Aug 31, 2018 at 11:42 PM Surafel Temesgen <[hidden email]> wrote:
> On Tue, Aug 28, 2018 at 7:33 PM Erik Rijkers <[hidden email]> wrote:
>>
>> ;
>>
>> TRAP: FailedAssertion("!(slot != ((void *)0))", File: "execTuples.c",
>> Line: 42
>
> The attache patch include a fix for the crash .can you check it again?

FYI this fails[1] in src/test/modules/test_ddl_deparse's create_table
test because of the keyword:

  CREATE TABLE stud_emp (
  percent int4
  ) INHERITS (emp, student);
! ERROR:  syntax error at or near "percent"

[1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/430777123

--
Thomas Munro
http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Mark Dilger-4
In reply to this post by Andres Freund


> On Aug 16, 2018, at 7:34 AM, Andres Freund <[hidden email]> wrote:
>
> Hi,
>
> On 2018-08-16 17:27:45 +0300, Surafel Temesgen wrote:
>> FETCH FIRST with PERCENT option is SQL standard that state limit count to
>> be specified in a percentage in addition to specify it in exact count and
>> listed as one of Major features simply not implemented yet in recent wiki
>> page [1].
>>
>> I implemented it by executing the query plan twice. One without limit
>> filter to find the total number of row that will be feed to limit node so
>> the exact limit count can be calculated and the query plan execute again
>> with limit filter with newly calculated exact count .

Surafel, there are no regression tests that I can see in your patch.  It
would help if you added some, as then I could precisely what behavior you
are expecting.  As it is, I'm just guessing here, but here goes....


> Won't that have rather massive issues with multiple evaluations of
> clauses in the query?  Besides being really expensive?
>
> I think you'd have to instead spill the query results into a tuplestore
> (like we do for FOR HOLD queries at end of xact), and then do the
> computations based on that.


I should think that spilling anything to a tuplestore would only be needed
if the query contains an ORDER BY expression.  If you query

        FETCH FIRST 50 PERCENT * FROM foo;

you should just return every other row, discarding the rest, right?  It's
only when an explicit ordering is given that the need to store the results
arises.  Even with

        FETCH FIRST 50 PERCENT name FROM foo ORDER BY name;

you can return one row for every two rows that you get back from the
sort node, reducing the maximum number you need to store at any time to
no more than 25% of all rows.

Or am I missing something?

mark

Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Andres Freund
Hi,

On 2018-09-20 17:06:36 -0700, Mark Dilger wrote:

> I should think that spilling anything to a tuplestore would only be needed
> if the query contains an ORDER BY expression.  If you query
>
> FETCH FIRST 50 PERCENT * FROM foo;
>
> you should just return every other row, discarding the rest, right?  It's
> only when an explicit ordering is given that the need to store the results
> arises.  Even with
>
> FETCH FIRST 50 PERCENT name FROM foo ORDER BY name;
>
> you can return one row for every two rows that you get back from the
> sort node, reducing the maximum number you need to store at any time to
> no more than 25% of all rows.

I'm doubtful about the validity of these optimizations, particularly
around being surprising. But I think more importantly, we should focus
on the basic implementation that's needed anyway.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Mark Dilger-4


> On Sep 20, 2018, at 5:29 PM, Andres Freund <[hidden email]> wrote:
>
> Hi,
>
> On 2018-09-20 17:06:36 -0700, Mark Dilger wrote:
>> I should think that spilling anything to a tuplestore would only be needed
>> if the query contains an ORDER BY expression.  If you query
>>
>> FETCH FIRST 50 PERCENT * FROM foo;
>>
>> you should just return every other row, discarding the rest, right?  It's
>> only when an explicit ordering is given that the need to store the results
>> arises.  Even with
>>
>> FETCH FIRST 50 PERCENT name FROM foo ORDER BY name;
>>
>> you can return one row for every two rows that you get back from the
>> sort node, reducing the maximum number you need to store at any time to
>> no more than 25% of all rows.
>
> I'm doubtful about the validity of these optimizations, particularly
> around being surprising. But I think more importantly, we should focus
> on the basic implementation that's needed anyway.

You may be right that getting the basic implementation finished first
is better than optimizing at this stage.  So the rest of what I'm going
to say is just in defense of the optimization, and not an argument for
needing to optimize right away.

As for reducing the surprise factor, I think that it would be surprising
if I ask for a smallish percentage of rows and it takes significantly longer
and significantly more memory or disk than asking for all the rows takes.
If I'm including an explicit ORDER BY, then that explains it, but otherwise,
I'd be surprised.  Note that I'm not saying I'd be surprised by it taking
roughly the same length of time / memory / disk.  I'd only be surprised if
it took a lot more.

There are plenty of SQL generation engines that people put in their software.
I'd expect something like

        sprintf("FETCH FIRST %d PERCENT %s FROM %s", percentage, columns, tablename)

to show up in such engines, and percentage to sometimes be 100.  At least
in that case you should just return all rows rather than dumping them into
a tuplestore.  Likewise, if the percentage is 0, you'll want to finish quickly.
Actually, I don't know if the SQL spec would require side effects to still
happen, in which case you'd still have to generate all rows for their side
effects to happen, and then just not return them.  But still, no tuplestore.
So the implementation of FETCH FIRST would at least need to think about what
percentage is being requested, rather than just mindlessly adding a node to
the tree for storing everything, then computing the LIMIT based on the number
of rows stored, and then returning that number of rows.

mark
Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Surafel Temesgen
In reply to this post by Mark Dilger-4
hey

On 9/21/18, Mark Dilger <[hidden email]> wrote:

> Surafel, there are no regression tests that I can see in your patch.  It
> would help if you added some, as then I could precisely what behavior you
> are expecting.
thank you for looking at it .the attach patch add regression tests

fetch-wth-percent-v4.patch (47K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Mark Dilger-4


> On Sep 25, 2018, at 5:07 AM, Surafel Temesgen <[hidden email]> wrote:
>
> hey
>
> On 9/21/18, Mark Dilger <[hidden email]> wrote:
>
>> Surafel, there are no regression tests that I can see in your patch.  It
>> would help if you added some, as then I could precisely what behavior you
>> are expecting.
> thank you for looking at it .the attach patch add regression tests
> <fetch-wth-percent-v4.patch>

Surafel,

I messed around with your changes to the grammar and it seems you don't
need to add PERCENT as a reserved keyword.  Moving this to the unreserved
keyword section does not cause any shift/reduce errors, and the regression
tests still pass.  Relative to your patch v4, these changes help:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a97ee30924..9872cf7257 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -15186,6 +15186,7 @@ unreserved_keyword:
                        | PARTITION
                        | PASSING
                        | PASSWORD
+                       | PERCENT
                        | PLANS
                        | POLICY
                        | PRECEDING
@@ -15465,7 +15466,6 @@ reserved_keyword:
                        | ONLY
                        | OR
                        | ORDER
-                       | PERCENT
                        | PLACING
                        | PRIMARY
                        | REFERENCES
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 150d51df8f..f23fee0653 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -300,7 +300,7 @@ PG_KEYWORD("partial", PARTIAL, UNRESERVED_KEYWORD)
 PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD)
 PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD)
 PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD)
-PG_KEYWORD("percent", PERCENT, RESERVED_KEYWORD)
+PG_KEYWORD("percent", PERCENT, UNRESERVED_KEYWORD)
 PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD)
 PG_KEYWORD("plans", PLANS, UNRESERVED_KEYWORD)
 PG_KEYWORD("policy", POLICY, UNRESERVED_KEYWORD)


Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Mark Dilger-4


> On Sep 25, 2018, at 8:08 AM, Mark Dilger <[hidden email]> wrote:
>
>
>
>> On Sep 25, 2018, at 5:07 AM, Surafel Temesgen <[hidden email]> wrote:
>>
>> hey
>>
>> On 9/21/18, Mark Dilger <[hidden email]> wrote:
>>
>>> Surafel, there are no regression tests that I can see in your patch.  It
>>> would help if you added some, as then I could precisely what behavior you
>>> are expecting.
>> thank you for looking at it .the attach patch add regression tests
>> <fetch-wth-percent-v4.patch>
>
> Surafel,
>
> I messed around with your changes to the grammar and it seems you don't
> need to add PERCENT as a reserved keyword.  Moving this to the unreserved
> keyword section does not cause any shift/reduce errors, and the regression
> tests still pass.  Relative to your patch v4, these changes help:

I spoke too soon.  The main regression tests pass, but your change to
src/test/modules/test_ddl_deparse/sql/create_table.sql per Thomas's
suggestion is no longer needed, since PERCENT no longer needs to be
quoted.

I recommend you also apply the following to your v4 patch, which just
rolls back that one change you made, and at least for me, is enough
to get `make check-world` to pass:

diff --git a/src/test/modules/test_ddl_deparse/sql/create_table.sql b/src/test/modules/test_ddl_deparse/sql/create_table.sql
index 4325de2d04..5e78452729 100644
--- a/src/test/modules/test_ddl_deparse/sql/create_table.sql
+++ b/src/test/modules/test_ddl_deparse/sql/create_table.sql
@@ -94,7 +94,7 @@ CREATE TABLE student (
 ) INHERITS (person);
 
 CREATE TABLE stud_emp (
-       "percent"       int4
+       percent         int4
 ) INHERITS (emp, student);



Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Surafel Temesgen
In reply to this post by Mark Dilger-4




I messed around with your changes to the grammar and it seems you don't
need to add PERCENT as a reserved keyword.  Moving this to the unreserved
keyword section does not cause any shift/reduce errors, and the regression
tests still pass.  Relative to your patch v4, these changes help:


In sql standard PERCENT list as reserved world so i don't think it is a thing that can change by me.
I think the main reason create_table fail in  test_ddl_deparse  is because i don't surround PERCENT key word in
/src/test/regress/expected/create_table.out too

regards
Surafel

fetch-wth-percent-v5.patch (48K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Vik Fearing-4
On 25/11/2018 10:00, Surafel Temesgen wrote:

>
>
>
>
>     I messed around with your changes to the grammar and it seems you don't
>     need to add PERCENT as a reserved keyword.  Moving this to the
>     unreserved
>     keyword section does not cause any shift/reduce errors, and the
>     regression
>     tests still pass.  Relative to your patch v4, these changes help:
>
>
> In sql standard PERCENT list as reserved world so i don't think it is a
> thing that can change by me.

Yes it absolutely is.  In PostgreSQL we only make words as reserved as
they need to be, and PERCENT doesn't need to be reserved at all.

Also, this query returns 210 rows instead of the expected 208:

    select *
    from generate_series(1, 1000)
    fetch first 20.8 percent rows only

As for the design, I think you should put some serious consideration
into Andrew Gierth's approach.  I would rather see something like that.
--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Surafel Temesgen


On Sun, Nov 25, 2018 at 1:24 PM Vik Fearing <[hidden email]> wrote:

Also, this query returns 210 rows instead of the expected 208:

    select *
    from generate_series(1, 1000)
    fetch first 20.8 percent rows only

 

this is because fetch first values work with integer and it change fractional number to nearest integer number like select * from generate_series(1, 1000) fetch first 20.3 rows only; is not an error rather it return 20 rows.

 
As for the design, I think you should put some serious consideration
into Andrew Gierth's approach.  I would rather see something like that.

 

I didn't go in that direction because I don’t understand why this approach is inferior to using window function and I am not fully understand on how to implement it.

regards

Surafel

 
Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Vik Fearing-4
On 25/11/2018 12:49, Surafel Temesgen wrote:

>
>
> On Sun, Nov 25, 2018 at 1:24 PM Vik Fearing <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>     Also, this query returns 210 rows instead of the expected 208:
>
>         select *
>         from generate_series(1, 1000)
>         fetch first 20.8 percent rows only
>
> this is because fetch first values work with integer and it change
> fractional number to nearest integer number like select * from
> generate_series(1, 1000) fetch first 20.3 rows only; is not an error
> rather it return 20 rows.

I don't see how this behavior is justified by reading the SQL standard.
 Obviously only an integer number of rows is going to be returned, but
the percentage should be calculated correctly.

I assume you meant 200 rows there, but the correct number of rows to
return is 203 for that query.  Please fix it.
--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Reply | Threaded
Open this post in threaded view
|

Re: FETCH FIRST clause PERCENT option

Andrew Gierth
In reply to this post by Surafel Temesgen
>>>>> "Surafel" == Surafel Temesgen <[hidden email]> writes:

 Surafel> this is because fetch first values work with integer and it
 Surafel> change fractional number to nearest integer number like select
 Surafel> * from generate_series(1, 1000) fetch first 20.3 rows only; is
 Surafel> not an error rather it return 20 rows.

 31) The declared type of the <simple value specification> simply
     contained in <fetch first percentage> shall be numeric.

Note that unlike <fetch first row count> there is no requirement for the
type to be _exact_ numeric or to have scale 0.

later:

    ii) If <fetch first percentage> is specified, then let FFP be the
        <simple value specification> simply contained in <fetch first
        percentage>, and let LOCT be a <literal> whose value is OCT. Let
        FFRC be the value of

          CEILING ( FFP * LOCT / 100.0E0 )

                NOTE 333 -- The percentage is computed using the number
                of rows before removing the rows specified by <offset
                row count>.

ceil(20.3 * 1000 / 100.0E0) is definitely 203, not 200.

--
Andrew.

123