Pulling up direct-correlated ANY_SUBLINK

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Pulling up direct-correlated ANY_SUBLINK

Richard Guo-2
Hi,

Currently we do not try to pull up sub-select of type ANY_SUBLINK if it
refers to any Vars of the parent query, as indicated in the code snippet
below:

JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
                            Relids available_rels)
{
    ...

    if (contain_vars_of_level((Node *) subselect, 1))
        return NULL;


Why do we have this check?

Can we try to pull up direct-correlated ANY SubLink with the help of
LATERAL? That is, do the pull up in the same way as uncorrelated ANY
SubLink, by adding the SubLink's subselect to the query's rangetable,
but explicitly set LATERAL for its RangeTblEntry, like:

--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1226,13 +1226,6 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
        Assert(sublink->subLinkType == ANY_SUBLINK);

        /*
-        * The sub-select must not refer to any Vars of the parent query. (Vars of
-        * higher levels should be okay, though.)
-        */
-       if (contain_vars_of_level((Node *) subselect, 1))
-               return NULL;
-
-       /*
         * The test expression must contain some Vars of the parent query, else
         * it's not gonna be a join.  (Note that it won't have Vars referring to
         * the subquery, rather Params.)
@@ -1267,7 +1260,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
        rte = addRangeTableEntryForSubquery(pstate,
                                            subselect,
                                            makeAlias("ANY_subquery", NIL),
-                                           false,
+                                          contain_vars_of_level((Node *) subselect, 1), /* lateral */
                                            false);
        parse->rtable = lappend(parse->rtable, rte);
        rtindex = list_length(parse->rtable);


By this way, we can convert the query:

select * from a where a.i = ANY(select i from b where a.j > b.j);

To:

select * from a SEMI JOIN lateral(select * from b where a.j > b.j) sub on a.i = sub.i;


Does this make sense?

Thanks
Richard
Reply | Threaded
Open this post in threaded view
|

Re: Pulling up direct-correlated ANY_SUBLINK

Antonin Houska-2
Richard Guo <[hidden email]> wrote:

> Can we try to pull up direct-correlated ANY SubLink with the help of
> LATERAL?

> By this way, we can convert the query:
>
> select * from a where a.i = ANY(select i from b where a.j > b.j);
>
> To:
>
> select * from a SEMI JOIN lateral(select * from b where a.j > b.j)
> sub on a.i = sub.i;
>

I tried this a few years ago. This is where the problems started:

https://www.postgresql.org/message-id/1386716782.5203.YahooMailNeo%40web162905.mail.bf1.yahoo.com

I'm not sure I remember enough, but the problem has something to do with one
possible strategy to plan SEMI JOIN: unique-ify the inner path and then
perform plain INNER JOIN instead.

I think the problemm was that the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation and was used as filter instead. Thus
the clause's Vars were not used in unique keys of the inner path and so the
SEMI JOIN didn't work well.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply | Threaded
Open this post in threaded view
|

Re: Pulling up direct-correlated ANY_SUBLINK

Tom Lane-2
In reply to this post by Richard Guo-2
Richard Guo <[hidden email]> writes:
> Currently we do not try to pull up sub-select of type ANY_SUBLINK if it
> refers to any Vars of the parent query, as indicated in the code snippet
> below:
>     if (contain_vars_of_level((Node *) subselect, 1))
>         return NULL;
> Why do we have this check?

Because the result would not be a join between two independent tables.

> Can we try to pull up direct-correlated ANY SubLink with the help of
> LATERAL?

Perhaps.  But what's the argument that you'd end up with a better
plan?  LATERAL pretty much constrains things to use a nestloop,
so I'm not sure there's anything fundamentally different.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Pulling up direct-correlated ANY_SUBLINK

Richard Guo-2
In reply to this post by Antonin Houska-2
Hi Antonin,

On Tue, Sep 10, 2019 at 4:31 PM Antonin Houska <[hidden email]> wrote:
Richard Guo <[hidden email]> wrote:

> Can we try to pull up direct-correlated ANY SubLink with the help of
> LATERAL?

> By this way, we can convert the query:
>
> select * from a where a.i = ANY(select i from b where a.j > b.j);
>
> To:
>
> select * from a SEMI JOIN lateral(select * from b where a.j > b.j)
> sub on a.i = sub.i;
>

I tried this a few years ago. This is where the problems started:

https://www.postgresql.org/message-id/1386716782.5203.YahooMailNeo%40web162905.mail.bf1.yahoo.com

Thank you for this link. Good to know the discussions years ago.
 
I'm not sure I remember enough, but the problem has something to do with one
possible strategy to plan SEMI JOIN: unique-ify the inner path and then
perform plain INNER JOIN instead.

I think the problemm was that the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation and was used as filter instead. Thus
the clause's Vars were not used in unique keys of the inner path and so the
SEMI JOIN didn't work well.

This used to be a problem until it was fixed by commit 043f6ff0, which
includes the postponed qual from a LATERAL subquery into the quals seen
by make_outerjoininfo().

Thanks
Richard
Reply | Threaded
Open this post in threaded view
|

Re: Pulling up direct-correlated ANY_SUBLINK

Antonin Houska-2
In reply to this post by Tom Lane-2
Tom Lane <[hidden email]> wrote:

> > Can we try to pull up direct-correlated ANY SubLink with the help of
> > LATERAL?
>
> Perhaps.  But what's the argument that you'd end up with a better
> plan?  LATERAL pretty much constrains things to use a nestloop,
> so I'm not sure there's anything fundamentally different.

I think that subquery pull-up is most beneficial when the queries (both the
subquery and the upper query) contain more than a few tables. In such a case,
if only a few tables reference the upper query (or if just a single one does),
the constraints imposed by LATERAL might be less significant.

Nevertheless, I don't know how to overcome the problems that I mentioned
upthread.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply | Threaded
Open this post in threaded view
|

Re: Pulling up direct-correlated ANY_SUBLINK

Richard Guo-2
Hi Antonin,

On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska <[hidden email]> wrote:

Nevertheless, I don't know how to overcome the problems that I mentioned
upthread.

Do you mean the problem "the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation"? Good news is it has been fixed
by commit 043f6ff0 as I mentioned upthread.

Thanks
Richard


Reply | Threaded
Open this post in threaded view
|

Re: Pulling up direct-correlated ANY_SUBLINK

Richard Guo-2
In reply to this post by Tom Lane-2
Hi Tom,

On Tue, Sep 10, 2019 at 9:48 PM Tom Lane <[hidden email]> wrote:

> Can we try to pull up direct-correlated ANY SubLink with the help of
> LATERAL?

Perhaps.  But what's the argument that you'd end up with a better
plan?  LATERAL pretty much constrains things to use a nestloop,
so I'm not sure there's anything fundamentally different.

This is a point I didn't think of. In that case if the pull-up mostly
results in a nestloop then we cannot make sure we will get a better
plan. Thank you for pointing it out.

Thanks
Richard
Reply | Threaded
Open this post in threaded view
|

Re: Pulling up direct-correlated ANY_SUBLINK

Antonin Houska-2
In reply to this post by Richard Guo-2
Richard Guo <[hidden email]> wrote:

> On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska <[hidden email]>
> wrote:
>
>    
>     Nevertheless, I don't know how to overcome the problems that I
>     mentioned
>     upthread.
>
>
> Do you mean the problem "the WHERE clause of the subquery didn't
> participate in the SEMI JOIN evaluation"? Good news is it has been
> fixed
> by commit 043f6ff0 as I mentioned upthread.

Do you say that my old patch (rebased) no longer breaks the regression tests?

(I noticed your other email in the thread which seems to indicate that you're
no lo longer interested to work on the feature, but asking out of curiosity.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply | Threaded
Open this post in threaded view
|

Re: Pulling up direct-correlated ANY_SUBLINK

Richard Guo-2

On Thu, Sep 12, 2019 at 11:35 PM Antonin Houska <[hidden email]> wrote:
Richard Guo <[hidden email]> wrote:

> On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska <[hidden email]>
> wrote:
>
>   
>     Nevertheless, I don't know how to overcome the problems that I
>     mentioned
>     upthread.
>
>
> Do you mean the problem "the WHERE clause of the subquery didn't
> participate in the SEMI JOIN evaluation"? Good news is it has been
> fixed
> by commit 043f6ff0 as I mentioned upthread.

Do you say that my old patch (rebased) no longer breaks the regression tests?

I think so.
 

(I noticed your other email in the thread which seems to indicate that you're
no lo longer interested to work on the feature, but asking out of curiosity.)

Tom pointed out that even if we pull up the subquery with the help of
LATERAL, we cannot make sure we will end up with a better plan, since
LATERAL pretty much constrains things to use a nestloop. Hmm, I think
what he said makes sense.

Thanks
Richard