A rather hackish POC for alternative implementation of WITH TIES

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

A rather hackish POC for alternative implementation of WITH TIES

Andrew Gierth
This patch is a rather hacky implementation of the basic idea for
implementing FETCH ... WITH TIES, and potentially also PERCENT, by using
a window function expression to compute a stopping point.

Large chunks of this (the parser/ruleutils changes, docs, tests) are
taken from Surafel Temesgen's patch. The difference is that the executor
change in my version is minimal: Limit allows a boolean column in the
input to signal the point at which to stop. The planner inserts a
WindowAgg node to compute the necessary condition using the rank()
function.

The way this is done in the planner isn't (IMO) the best and should
probably be improved; in particular it currently misses some possible
optimizations (most notably constant-folding of the offset+limit
subexpression). I also haven't tested it properly to see whether I broke
anything, though it does pass regression.

--
Andrew (irc:RhodiumToad)


with_ties_window.patch (34K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: A rather hackish POC for alternative implementation of WITH TIES

Surafel Temesgen


On Fri, Nov 29, 2019 at 8:40 AM Andrew Gierth <[hidden email]> wrote:
This patch is a rather hacky implementation of the basic idea for
implementing FETCH ... WITH TIES, and potentially also PERCENT, by using
a window function expression to compute a stopping point.

Large chunks of this (the parser/ruleutils changes, docs, tests) are
taken from Surafel Temesgen's patch. The difference is that the executor
change in my version is minimal: Limit allows a boolean column in the
input to signal the point at which to stop. The planner inserts a
WindowAgg node to compute the necessary condition using the rank()
function.

The way this is done in the planner isn't (IMO) the best and should
probably be improved; in particular it currently misses some possible
optimizations (most notably constant-folding of the offset+limit
subexpression). I also haven't tested it properly to see whether I broke
anything, though it does pass regression.



Unlike most other executor node limit node has implementation for handling backward scan that support cursor operation but your approach didn't do this inherently because it outsource limitNode functionality to window function and window function didn't do this

eg.

postgres=# begin;

BEGIN

postgres=# declare c cursor for select i from generate_series(1,1000000) s(i) order by i fetch first 2 rows with ties;

DECLARE CURSOR

postgres=# fetch all in c;

i

---

1

2

(2 rows)


postgres=# fetch backward all in c;

ERROR: cursor can only scan forward

HINT: Declare it with SCROLL option to enable backward scan.


Even with SCROLL option it is not working as limitNode does. It store the result and return in backward scan that use more space than current limit and limit with ties implementation.


If am not mistaken the patch also reevaluate limit every time returning row beside its not good for performance its will return incorrect result with limit involving volatile function


regards

Surafel

 
Reply | Threaded
Open this post in threaded view
|

Re: A rather hackish POC for alternative implementation of WITH TIES

Andrew Gierth
>>>>> "Surafel" == Surafel Temesgen <[hidden email]> writes:

 Surafel> Unlike most other executor node limit node has implementation
 Surafel> for handling backward scan that support cursor operation but
 Surafel> your approach didn't do this inherently because it outsource
 Surafel> limitNode functionality to window function and window function
 Surafel> didn't do this

Correct. But this is a non-issue: if you want to be able to do backward
scan you are supposed to declare the cursor as SCROLL; if it happens to
work without it, that is pure coincidence. (Cursors declared with neither
SCROLL nor NO SCROLL support backwards scan only if the underlying plan
supports backward scan with no additional overhead, which is something
you can't predict from the query.)

The Limit node declares that it supports backwards scan if, and only if,
its immediate child node supports it. It happens that WindowAgg does
not, so in this implementation, LIMIT ... WITH TIES will not support
backward scan without a tuplestore. I don't consider this an especially
big deal; backward scans are extremely rare (as shown by the fact that
bugs in backward scan have tended to go unnoticed for decades, e.g. bug
#15336), and therefore we should not optimize for them.

 Surafel> If am not mistaken the patch also reevaluate limit every time

The (offset+limit) expression is, yes. I noted in the original post that
this needs work - probably it should be pushed out to an InitPlan if it
doesn't fold to a constant. i.e. using the expression

  rank() over (...) > (select offset+limit)

where it currently has

  rank() over (...) > (offset+limit)

(Generating the limit expression so late in planning is the main thing
that needs changing to get this from a hack POC to usable code)

The main point here is that the same rather minimal executor changes
allow support for not only WITH TIES but also PERCENT and possibly
arbitrary stop conditions as well. (I know I've often wanted LIMIT WHEN
to stop a query at a data-dependent point without having to resort to
recursion - this patch doesn't quite get there, because of the scope
issues involved in analyzing the WHEN condition, but it at least sets up
the concept.)

--
Andrew (irc:RhodiumToad)


Reply | Threaded
Open this post in threaded view
|

Re: A rather hackish POC for alternative implementation of WITH TIES

Alvaro Herrera-9
Hello

As this is a valuable feature, it would be good to have something happen
here.  I wouldn't like to have pg13 ship with no implementation of WITH
TIES at all.

My own inclination is that Andrew's implementation, being more general
in nature, would be the better one to have in the codebase; but we don't
have a complete patch yet.  Can we reach some compromise such as if
Andrew's patch is not completed then we push Surafel's?

Thanks

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: A rather hackish POC for alternative implementation of WITH TIES

Andrew Gierth
>>>>> "Alvaro" == Alvaro Herrera <[hidden email]> writes:

 Alvaro> My own inclination is that Andrew's implementation, being more
 Alvaro> general in nature, would be the better one to have in the
 Alvaro> codebase; but we don't have a complete patch yet. Can we reach
 Alvaro> some compromise such as if Andrew's patch is not completed then
 Alvaro> we push Surafel's?

Mine needs some attention to where exactly in planning the necessary
transformation work should be done; right now the planner part is a
hack, intended to demonstrate the idea (and to let the executor changes
work) rather than actually be the final version. As I mentioned before,
some stuff does need to be pushed out to an InitPlan to make it work
without multiple-evaluation problems.

(A second opinion from another planner expert would be welcome on that
part)

I was largely holding off on doing further work hoping for some
discussion of which way we should go. If you think my approach is worth
pursuing (I haven't seriously tested the performance, but I'd expect it
to be slower than Surafel's - the price you pay for flexibility) then I
can look at it further, but figuring out the planner stuff will take
some time.

--
Andrew.


Reply | Threaded
Open this post in threaded view
|

Re: A rather hackish POC for alternative implementation of WITH TIES

ryanlambert
On Wed, Jan 22, 2020 at 3:06 PM Alvaro Herrera <[hidden email]> wrote:
> My own inclination is that Andrew's implementation, being more general
> in nature, would be the better one to have in the codebase; but we don't
> have a complete patch yet.  Can we reach some compromise such as if
> Andrew's patch is not completed then we push Surafel's?

+1

On Wed, Jan 22, 2020 at 4:35 PM Andrew Gierth <[hidden email]> wrote:
> I was largely holding off on doing further work hoping for some
> discussion of which way we should go. If you think my approach is worth
> pursuing (I haven't seriously tested the performance, but I'd expect it
> to be slower than Surafel's - the price you pay for flexibility) then I
> can look at it further, but figuring out the planner stuff will take
> some time.

Flexibility with more generalized code is good, though if performance is significantly slower I would be concerned.  I quickly reviewed the patch but haven't tested it yet.

Is it realistic to add PERCENT into this patch or would that be a future enhancement?  

Thanks,

Ryan Lambert
Ryan Lambert
RustProof  Labs
Reply | Threaded
Open this post in threaded view
|

Re: A rather hackish POC for alternative implementation of WITH TIES

Surafel Temesgen
In reply to this post by Andrew Gierth


On Wed, Jan 22, 2020 at 3:35 PM Andrew Gierth <[hidden email]> wrote:
>>>>> "Alvaro" == Alvaro Herrera <[hidden email]> writes:


I was largely holding off on doing further work hoping for some
discussion of which way we should go. If you think my approach is worth
pursuing (I haven't seriously tested the performance, but I'd expect it
to be slower than Surafel's - the price you pay for flexibility) then I
can look at it further, but figuring out the planner stuff will take
some time.


Other alternative can be pushing the existing implementation
which will be open to change in case of better-finished
implementation.

regards 
Surafel 
  
Reply | Threaded
Open this post in threaded view
|

Re: A rather hackish POC for alternative implementation of WITH TIES

Alvaro Herrera-9
On 2020-Mar-26, Surafel Temesgen wrote:

> On Wed, Jan 22, 2020 at 3:35 PM Andrew Gierth <[hidden email]>
> wrote:
>
> > >>>>> "Alvaro" == Alvaro Herrera <[hidden email]> writes:
> >
> > I was largely holding off on doing further work hoping for some
> > discussion of which way we should go. If you think my approach is worth
> > pursuing (I haven't seriously tested the performance, but I'd expect it
> > to be slower than Surafel's - the price you pay for flexibility) then I
> > can look at it further, but figuring out the planner stuff will take
> > some time.
>
> Other alternative can be pushing the existing implementation
> which will be open to change in case of better-finished
> implementation.

At this point, I think that's what we should do.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services