postgres_fdw bug in 9.6

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
113 messages Options
1234 ... 6
Reply | Threaded
Open this post in threaded view
|

postgres_fdw bug in 9.6

Jeff Janes
I have a setup where a 9.6.1 server uses postgres_fdw to connect to a 9.4.9 hot standby server.

I have a DML statement which triggers the error:

ERROR:  XX000: outer pathkeys do not match mergeclauses
LOCATION:  create_mergejoin_plan, createplan.c:3722

The error first starts appearing with this commit (on the local side): 

commit aa09cd242fa7e3a694a31f8aed521e80d1e626a4
Author: Robert Haas <[hidden email]>
Date:   Wed Mar 9 10:51:49 2016 -0500

    postgres_fdw: Consider foreign joining and foreign sorting together.

 
The version of the remote side does not seem to matter.  I've also promoted a test instance of the remote from hot standby to master and then upgraded to 9.6.1, and neither step fixes the issue.

The statement is like this:

explain update foo_local set col3=foo_remote.col3 from foo_remote where foo_local.id=foo_remote.id and foo_local.id in ('aaa','bbb','ccc','ddd');

Where foo_remote is a pretty complicated view (defined locally) over the join of 8 foreign tables.

I am having trouble producing a self-contained, disclosable test case for this.  Small changes causes the error to go away.  On the local side, it doesn't seem to depend on the contents of the table, only the structure.  But on the remote side, truncating the central table for the query makes the error go away.

Any tips on investigating this further in situ?  Or is the best option just to work harder on a minimal and disclosable test case?

Cheers,

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Tom Lane-2
Jeff Janes <[hidden email]> writes:
> I have a DML statement which triggers the error:
> ERROR:  XX000: outer pathkeys do not match mergeclauses
> LOCATION:  create_mergejoin_plan, createplan.c:3722

Hmm.

> Any tips on investigating this further in situ?  Or is the best option just
> to work harder on a minimal and disclosable test case?

I think we need a test case --- not minimal necessarily, but something
other people can reproduce.  You might find that setting enable_hashjoin
and/or enable_nestloop to false makes it easier to provoke the error,
since evidently this requires that we (a) generate a faulty mergejoin Path
and then (b) choose it as the cheapest one, since the error occurs while
converting it to a Plan.

BTW, if you're not doing this in a debug (--enable-cassert) build, it'd
be useful to try it in one.  I'm a little suspicious that the root cause
might be a memory-stomp type of problem, ie somebody scribbling on a
pathkey data structure without accounting for it being shared with another
path.  It's possible that cassert memory checking would help catch that.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Robert Haas
On Thu, Dec 8, 2016 at 12:50 PM, Tom Lane <[hidden email]> wrote:

> Jeff Janes <[hidden email]> writes:
>> I have a DML statement which triggers the error:
>> ERROR:  XX000: outer pathkeys do not match mergeclauses
>> LOCATION:  create_mergejoin_plan, createplan.c:3722
>
> Hmm.
>
>> Any tips on investigating this further in situ?  Or is the best option just
>> to work harder on a minimal and disclosable test case?
>
> I think we need a test case --- not minimal necessarily, but something
> other people can reproduce.  You might find that setting enable_hashjoin
> and/or enable_nestloop to false makes it easier to provoke the error,
> since evidently this requires that we (a) generate a faulty mergejoin Path
> and then (b) choose it as the cheapest one, since the error occurs while
> converting it to a Plan.

Maybe it would help for Jeff to use elog_node_display() to the nodes
that are causing the problem - e.g. outerpathkeys and innerpathkeys
and best_path->path_mergeclauses, or just best_path - at the point
where the error is thrown. That might give us enough information to
see what's broken.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Tom Lane-2
Robert Haas <[hidden email]> writes:
> Maybe it would help for Jeff to use elog_node_display() to the nodes
> that are causing the problem - e.g. outerpathkeys and innerpathkeys
> and best_path->path_mergeclauses, or just best_path - at the point
> where the error is thrown. That might give us enough information to
> see what's broken.

I'll be astonished if that's sufficient evidence.  We already know that
the problem is that the input path doesn't claim to be sorted in a way
that would match the merge clauses, but that doesn't tell us how such
a path came to be generated (or, if it wasn't intentionally done, where
the data structure got clobbered later).

It's possible that setting a breakpoint at create_mergejoin_path and
capturing stack traces for all calls would yield usable insight.  But
there are likely to be lots of calls if this is an 8-way join query,
and probably only a few are wrong.

I'd much rather have a test case than try to debug this remotely.
Bandwidth too low.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Jeff Janes
On Thu, Dec 8, 2016 at 10:28 AM, Tom Lane <[hidden email]> wrote:
Robert Haas <[hidden email]> writes:
> Maybe it would help for Jeff to use elog_node_display() to the nodes
> that are causing the problem - e.g. outerpathkeys and innerpathkeys
> and best_path->path_mergeclauses, or just best_path - at the point
> where the error is thrown. That might give us enough information to
> see what's broken.

I'll be astonished if that's sufficient evidence.  We already know that
the problem is that the input path doesn't claim to be sorted in a way
that would match the merge clauses, but that doesn't tell us how such
a path came to be generated (or, if it wasn't intentionally done, where
the data structure got clobbered later).

It's possible that setting a breakpoint at create_mergejoin_path and
capturing stack traces for all calls would yield usable insight.  But
there are likely to be lots of calls if this is an 8-way join query,
and probably only a few are wrong.

I'd much rather have a test case than try to debug this remotely.
Bandwidth too low.

I didn't get an asserts on the assert-enabled build. 

I have a test case where I made the fdw connect back to itself, and stripped out all the objects that I could and still reproduce the case.  It is large, 21MB compressed, 163MB uncompressed, so I am linking it here:


When running this against and default configured 9.6.1 or 10devel-a73491e, I get the error.

psql -U postgres -f <(xzcat combined_anon.sql.xz)

...

ERROR:  outer pathkeys do not match mergeclauses
STATEMENT:  explain update local1 set local19 = jj_join.local19 from jj_join where local1.local12=jj_join.local12  and local1.local12 in ('ddd','bbb','aaa','abc');

Cheers,

Jeff


Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Tom Lane-2
Jeff Janes <[hidden email]> writes:
> I have a test case where I made the fdw connect back to itself, and
> stripped out all the objects that I could and still reproduce the case.  It
> is large, 21MB compressed, 163MB uncompressed, so I am linking it here:
> https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc

Thanks for the test case.  I believe I understand the fundamental problem,
or one of the fundamental problems --- I'm not exactly convinced there
aren't several here.

The key issue is that GetExistingLocalJoinPath believes it can return
any random join path as what to use for the fdw_outerpath of a foreign
join.  You can get away with that, perhaps, as long as you only consider
two-way joins.  But in a nest of foreign joins, this fails to consider
the interrelationships of join paths and their children.  In particular,
what we have happening in this example is that GetExistingLocalJoinPath
seizes randomly on a MergePath for an upper join relation, clones it,
sees that the outer child is a join ForeignPath, and replaces that outer
child with its fdw_outerpath ... which was chosen equally cavalierly by
some previous execution of GetExistingLocalJoinPath, and does not have
the sort ordering expected by the MergePath.  So we'd generate an invalid
EPQ plan, except that the debug cross-check in create_mergejoin_plan
notices the discrepancy.

I believe there are probably more problems here, or at least if there
aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
lack of curiosity about what's underneath the join pathnode it picks,
it seems to me that it's possible for it to return a path tree that
*isn't* all local joins.  If we're looking at, say, a hash path for
a 4-way join, whose left input is a hash path for a 3-way join, whose
left input is a 2-way foreign join, what's stopping that from being
returned as a "local" path tree?

Likewise, it seems like the code is trying to reject any custom-path join
types, or at least this barely-intelligible comment seems to imply that:

                 * Just skip anything else. We don't know if corresponding
                 * plan would build the output row from whole-row references
                 * of base relations and execute the EPQ checks.

But this coding fails to notice any such join type that's below the
level of the immediate two join inputs.

We've probably managed to not notice this so far because foreign joins
generally ought to dominate any local join method, so that there wouldn't
often be cases where the surviving paths use local joins for input
sub-joins.  But Jeff's test case proves it can happen.

I kind of wonder why this infrastructure exists at all; it's not the way
I'd have foreseen handling EPQ for remote joins.  However, while "throw
it away and start again" might be good advice going forward, I suppose
it won't be very popular for applying to 9.6.

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill.  We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply.  At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join.  So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top".  This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.

(Hm, that does sound a lot like "throw it away and start again", doesn't
it.  But what we've got here is busted enough that I'm not sure there
are good alternatives.  Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)

BTW, what's "existing" about the result of GetExistingLocalJoinPath?
And for that matter, what's "outer" about the content of fdw_outerpath?
Good luck figuring that out from the documentation of the field, all
zero words of it.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Ashutosh Bapat
On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane <[hidden email]> wrote:

> Jeff Janes <[hidden email]> writes:
>> I have a test case where I made the fdw connect back to itself, and
>> stripped out all the objects that I could and still reproduce the case.  It
>> is large, 21MB compressed, 163MB uncompressed, so I am linking it here:
>> https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc
>
> Thanks for the test case.  I believe I understand the fundamental problem,
> or one of the fundamental problems --- I'm not exactly convinced there
> aren't several here.
>
> The key issue is that GetExistingLocalJoinPath believes it can return
> any random join path as what to use for the fdw_outerpath of a foreign
> join.  You can get away with that, perhaps, as long as you only consider
> two-way joins.  But in a nest of foreign joins, this fails to consider
> the interrelationships of join paths and their children.  In particular,
> what we have happening in this example is that GetExistingLocalJoinPath
> seizes randomly on a MergePath for an upper join relation, clones it,
> sees that the outer child is a join ForeignPath, and replaces that outer
> child with its fdw_outerpath ... which was chosen equally cavalierly by
> some previous execution of GetExistingLocalJoinPath, and does not have
> the sort ordering expected by the MergePath.  So we'd generate an invalid
> EPQ plan, except that the debug cross-check in create_mergejoin_plan
> notices the discrepancy.

Thanks for the detailed explanation.

>
> I believe there are probably more problems here, or at least if there
> aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
> lack of curiosity about what's underneath the join pathnode it picks,
> it seems to me that it's possible for it to return a path tree that
> *isn't* all local joins.  If we're looking at, say, a hash path for
> a 4-way join, whose left input is a hash path for a 3-way join, whose
> left input is a 2-way foreign join, what's stopping that from being
> returned as a "local" path tree?

Right, the so called "local join tree" can contain ForeignPath
representing joins between foreign relations. AFAIU what protects us
from getting into problem is that postgresRecheckForeignScan calls
ExecProcNode() on the outer plan. So, even if there are ForeignPaths
in fdw_outerpath tree, corresponding plans would use a local join plan
to apply EPQ recheck. The code in GetExistingLocalJoinPath() avoids
the redirection at least for the inner or outer ForeignPaths. Any
comment claiming that path tree under fdw_outerpath is entirely
"local" should be removed, if it survives the fix.

>
> Likewise, it seems like the code is trying to reject any custom-path join
> types, or at least this barely-intelligible comment seems to imply that:
>
>                  * Just skip anything else. We don't know if corresponding
>                  * plan would build the output row from whole-row references
>                  * of base relations and execute the EPQ checks.
>
> But this coding fails to notice any such join type that's below the
> level of the immediate two join inputs.

Similarly, here we assume that any custom plan would tell us whether
the given row survives EPQ recheck. As long as a custom plan doing
that correctly, it shouldn't be a problem.

>
> We've probably managed to not notice this so far because foreign joins
> generally ought to dominate any local join method, so that there wouldn't
> often be cases where the surviving paths use local joins for input
> sub-joins.  But Jeff's test case proves it can happen.
>
> I kind of wonder why this infrastructure exists at all; it's not the way
> I'd have foreseen handling EPQ for remote joins.  However, while "throw
> it away and start again" might be good advice going forward, I suppose
> it won't be very popular for applying to 9.6.
>
> One way that we could make things better is to rely on the knowledge
> that EPQ isn't asked to evaluate joins for more than one row per input
> relation, and therefore using merge or hash join technology is really
> overkill.  We could make a tree of simple nestloop joins, which aren't
> going to care about sort order, if we could identify the correct join
> clauses to apply.  At least some of that could be laid off on the FDW,
> which if it's gotten this far at all, ought to know what join clauses
> need to be enforced by the foreign join.  So I'm thinking a little bit
> in terms of "just collect the foreign scans for all the base rels
> included in the join and then build a cross-product nestloop join tree,
> applying all the join clauses at the top".  This would have the signal
> value that it's guaranteed to succeed and so can be left for later,
> rather than having to expensively redo it at each level of join planning.

I am not able to understand how this strategy of applying all join
clauses on top of cross product would work if there are OUTER joins
involved. Wouldn't nullable sides change the result?

>
> (Hm, that does sound a lot like "throw it away and start again", doesn't
> it.  But what we've got here is busted enough that I'm not sure there
> are good alternatives.  Maybe for 9.6 we could just rewrite
> GetExistingLocalJoinPath, and limp along doing a lot of redundant
> computation during planning.)

A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath. If postgres_fdw was able to come up with a
ForeignPath with those pathkeys, there is high chance that there
exists a local path with those pathkeys. Since we are yet to call
add_path() with the ForeignPath being built, that local path should
exist in the path list. Stick that path in fdw_outerpath. If such a
path doesn't exist, return NULL from GetExistingLocalJoinPath().
postgres_fdw wouldn't push down the join in that case and we will
loose some optimization, but that might be better than throwing an
error.

I agree that a join tree consisting of nested loop joins would be more
efficient in this case.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Robert Haas
In reply to this post by Tom Lane-2
On Mon, Dec 12, 2016 at 5:45 PM, Tom Lane <[hidden email]> wrote:

> One way that we could make things better is to rely on the knowledge
> that EPQ isn't asked to evaluate joins for more than one row per input
> relation, and therefore using merge or hash join technology is really
> overkill.  We could make a tree of simple nestloop joins, which aren't
> going to care about sort order, if we could identify the correct join
> clauses to apply.  At least some of that could be laid off on the FDW,
> which if it's gotten this far at all, ought to know what join clauses
> need to be enforced by the foreign join.  So I'm thinking a little bit
> in terms of "just collect the foreign scans for all the base rels
> included in the join and then build a cross-product nestloop join tree,
> applying all the join clauses at the top".  This would have the signal
> value that it's guaranteed to succeed and so can be left for later,
> rather than having to expensively redo it at each level of join planning.
>
> (Hm, that does sound a lot like "throw it away and start again", doesn't
> it.  But what we've got here is busted enough that I'm not sure there
> are good alternatives.  Maybe for 9.6 we could just rewrite
> GetExistingLocalJoinPath, and limp along doing a lot of redundant
> computation during planning.)

I think there was an earlier version of the patch that actually built
up NestLoop joins as it went, but at some point (possibly at my
suggestion) it got rewritten to try to fish out existing paths
instead.  Apparently, the other idea was better.

> BTW, what's "existing" about the result of GetExistingLocalJoinPath?

It's an existing path - i.e. already added to the RelOptInfo - to
perform the join locally.

> And for that matter, what's "outer" about the content of fdw_outerpath?

It's got the same meaning here that "outer path" does in general.
Right now, a Foreign Scan only has an outer subpath if it needs one
for EPQ rechecks, but in theory I suppose it could be used for
something else; every Plan has a lefttree, whether or not we choose to
populate it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Etsuro Fujita
In reply to this post by Ashutosh Bapat
On 2016/12/13 23:13, Ashutosh Bapat wrote:
> On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane <[hidden email]> wrote:
>> I believe there are probably more problems here, or at least if there
>> aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
>> lack of curiosity about what's underneath the join pathnode it picks,
>> it seems to me that it's possible for it to return a path tree that
>> *isn't* all local joins.  If we're looking at, say, a hash path for
>> a 4-way join, whose left input is a hash path for a 3-way join, whose
>> left input is a 2-way foreign join, what's stopping that from being
>> returned as a "local" path tree?

> Right, the so called "local join tree" can contain ForeignPath
> representing joins between foreign relations. AFAIU what protects us
> from getting into problem is that postgresRecheckForeignScan calls
> ExecProcNode() on the outer plan. So, even if there are ForeignPaths
> in fdw_outerpath tree, corresponding plans would use a local join plan
> to apply EPQ recheck. The code in GetExistingLocalJoinPath() avoids
> the redirection at least for the inner or outer ForeignPaths.

I think Ashutosh is right.

> Any
> comment claiming that path tree under fdw_outerpath is entirely
> "local" should be removed, if it survives the fix.

+1

>> Likewise, it seems like the code is trying to reject any custom-path join
>> types, or at least this barely-intelligible comment seems to imply that:
>>
>>                  * Just skip anything else. We don't know if corresponding
>>                  * plan would build the output row from whole-row references
>>                  * of base relations and execute the EPQ checks.
>>
>> But this coding fails to notice any such join type that's below the
>> level of the immediate two join inputs.

I wrote the first version of GetExistingLocalJoinPath (and
postgresRecheckForeignScan), to verify that the changes to core for
pushing down foreign/custom joins in 9.5 would work well for EPQ
rechecks.  And I rejected custom paths in that version because I
intended to use that function not only for foreign joins but custom
joins.  (IIRC, for 9.6, I proposed to put GetExistingLocalJoinPath in a
more common area such as src/backend/optimizer/path/joinpath.c, but that
was ignored...)

>> I kind of wonder why this infrastructure exists at all; it's not the way
>> I'd have foreseen handling EPQ for remote joins.  However, while "throw
>> it away and start again" might be good advice going forward, I suppose
>> it won't be very popular for applying to 9.6.
>>
>> One way that we could make things better is to rely on the knowledge
>> that EPQ isn't asked to evaluate joins for more than one row per input
>> relation, and therefore using merge or hash join technology is really
>> overkill.  We could make a tree of simple nestloop joins, which aren't
>> going to care about sort order, if we could identify the correct join
>> clauses to apply.  At least some of that could be laid off on the FDW,
>> which if it's gotten this far at all, ought to know what join clauses
>> need to be enforced by the foreign join.  So I'm thinking a little bit
>> in terms of "just collect the foreign scans for all the base rels
>> included in the join and then build a cross-product nestloop join tree,
>> applying all the join clauses at the top".  This would have the signal
>> value that it's guaranteed to succeed and so can be left for later,
>> rather than having to expensively redo it at each level of join planning.

> I am not able to understand how this strategy of applying all join
> clauses on top of cross product would work if there are OUTER joins
> involved. Wouldn't nullable sides change the result?

I'm not able to understand that, either.

>> (Hm, that does sound a lot like "throw it away and start again", doesn't
>> it.  But what we've got here is busted enough that I'm not sure there
>> are good alternatives.  Maybe for 9.6 we could just rewrite
>> GetExistingLocalJoinPath, and limp along doing a lot of redundant
>> computation during planning.)

An alternative I was thinking was (1) to create an equivalent nestloop
join path for each foreign/custom join path and store it in
fdw_outerpath as-is, except when that join path implements a full join,
in which case a merge or hash join path is created, and (2) to apply
postgresRecheckForeignScan to the outer subplan created from the
fdw_outerpath when executing EPQ rechecks.  So, I was thinking to
provide a helper function that creates the equivalent local join path
for a given foreign/custom join path in #1.

> A possible short-term fix may be this: instead of picking any random
> path to stick into fdw_outerpath, we choose a path which covers the
> pathkeys of ForeignPath. If postgres_fdw was able to come up with a
> ForeignPath with those pathkeys, there is high chance that there
> exists a local path with those pathkeys. Since we are yet to call
> add_path() with the ForeignPath being built, that local path should
> exist in the path list. Stick that path in fdw_outerpath. If such a
> path doesn't exist, return NULL from GetExistingLocalJoinPath().
> postgres_fdw wouldn't push down the join in that case and we will
> loose some optimization, but that might be better than throwing an
> error.

Seems reasonable.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Tom Lane-2
Etsuro Fujita <[hidden email]> writes:
> On 2016/12/13 23:13, Ashutosh Bapat wrote:
>> On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane <[hidden email]> wrote:
>>> One way that we could make things better is to rely on the knowledge
>>> that EPQ isn't asked to evaluate joins for more than one row per input
>>> relation, and therefore using merge or hash join technology is really
>>> overkill.  We could make a tree of simple nestloop joins, which aren't
>>> going to care about sort order, if we could identify the correct join
>>> clauses to apply.

>> I am not able to understand how this strategy of applying all join
>> clauses on top of cross product would work if there are OUTER joins
>> involved. Wouldn't nullable sides change the result?

> I'm not able to understand that, either.

Yeah, you'd have to reproduce the outer join structure if any.

>> A possible short-term fix may be this: instead of picking any random
>> path to stick into fdw_outerpath, we choose a path which covers the
>> pathkeys of ForeignPath.

> Seems reasonable.

No, because GetExistingLocalJoinPath is called once per relation not once
per path.  Even if we were willing to eat the overhead of calling it once
per path, we'd have to give up considering foreign paths with sort orders
that there wasn't any cheap way to produce locally.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Etsuro Fujita
On 2016/12/16 1:39, Tom Lane wrote:
> Etsuro Fujita <[hidden email]> writes:
>> On 2016/12/13 23:13, Ashutosh Bapat wrote:
>>> A possible short-term fix may be this: instead of picking any random
>>> path to stick into fdw_outerpath, we choose a path which covers the
>>> pathkeys of ForeignPath.

>> Seems reasonable.

> No, because GetExistingLocalJoinPath is called once per relation not once
> per path.  Even if we were willing to eat the overhead of calling it once
> per path, we'd have to give up considering foreign paths with sort orders
> that there wasn't any cheap way to produce locally.

Hmm, I agree on that point that giving it up might result in a bad plan.

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is.  What do you think about that?

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Etsuro Fujita
On 2016/12/16 11:25, Etsuro Fujita wrote:
> As I said upthread, an alternative I am thinking is (1) to create an
> equivalent nestloop join path using inner/outer paths of a foreign join
> path, except when that join path implements a full join, in which case a
> merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
> (3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
> subplan created from the fdw_outerpath as-is.  What do you think about
> that?

Let me explain about #1 and #2 a bit more.  What I have in mind is:

* modify postgresGetForeignPaths so that a simple foreign table scan
path is stored into the base relation's PgFdwRelationInfo.
* modify postgresGetForeignJoinPaths so that
     (a) a local join path for a 2-way foreign join is created using
simple foreign table scan paths stored in the base relations'
PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
     (b) a local join path for a 3-way foreign join, whose left input is
a 2-way foreign join, is created using a local join path stored in the
left input join relation's PgFdwRelationInfo and a simple foreign table
scan path stored into the right input base relation's PgFdwRelationInfo.
     (c) Likewise for higher level foreign joins.
     (d) local join paths created are passed to create_foreignscan_path
and stored into the fdw_outerpaths of the resulting ForeignPahts.

I think that that would need special handling for foreign joins with
sort orders; add an explicit sort to the local join paths, if needed.

I am thinking to provide a helper function that creates a local join
path for (a), (b), and (c).

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Tom Lane-2
Etsuro Fujita <[hidden email]> writes:
> On 2016/12/16 11:25, Etsuro Fujita wrote:
>> As I said upthread, an alternative I am thinking is (1) to create an
>> equivalent nestloop join path using inner/outer paths of a foreign join
>> path, except when that join path implements a full join, in which case a
>> merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
>> (3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
>> subplan created from the fdw_outerpath as-is.  What do you think about
>> that?

> Let me explain about #1 and #2 a bit more.  What I have in mind is:

> * modify postgresGetForeignPaths so that a simple foreign table scan
> path is stored into the base relation's PgFdwRelationInfo.
> * modify postgresGetForeignJoinPaths so that
>      (a) a local join path for a 2-way foreign join is created using
> simple foreign table scan paths stored in the base relations'
> PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
>      (b) a local join path for a 3-way foreign join, whose left input is
> a 2-way foreign join, is created using a local join path stored in the
> left input join relation's PgFdwRelationInfo and a simple foreign table
> scan path stored into the right input base relation's PgFdwRelationInfo.
>      (c) Likewise for higher level foreign joins.
>      (d) local join paths created are passed to create_foreignscan_path
> and stored into the fdw_outerpaths of the resulting ForeignPahts.

Hm, isn't this overcomplicated?  As you said earlier, we don't really
care all that much whether the fdw_outerpath is free of lower foreign
joins, because EPQ setup will select those lower join's substitute EPQ
plans anyway.  All that we need is that the EPQ tree be a legal
implementation of the join order with join quals applied at the right
places.  So I think the rule could be

"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair.  Use this as the fdw_outerpath for
all foreign paths made for the joinrel."

The important point here is that we avoid using a merge join because that
has assumptions about input ordering that likely won't be satisfied by
the child paths chosen through this method.  (I guess you could fall back
to it for the case of no quals in a fulljoin, because then the ordering
assumptions are vacuous anyway.)

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Etsuro Fujita
On 2016/12/17 1:13, Tom Lane wrote:
> Etsuro Fujita <[hidden email]> writes:
>> On 2016/12/16 11:25, Etsuro Fujita wrote:
>>> As I said upthread, an alternative I am thinking is (1) to create an
>>> equivalent nestloop join path using inner/outer paths of a foreign join
>>> path, except when that join path implements a full join, in which case a
>>> merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
>>> (3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
>>> subplan created from the fdw_outerpath as-is.  What do you think about
>>> that?

>> Let me explain about #1 and #2 a bit more.  What I have in mind is:

>> * modify postgresGetForeignPaths so that a simple foreign table scan
>> path is stored into the base relation's PgFdwRelationInfo.
>> * modify postgresGetForeignJoinPaths so that
>>      (a) a local join path for a 2-way foreign join is created using
>> simple foreign table scan paths stored in the base relations'
>> PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
>>      (b) a local join path for a 3-way foreign join, whose left input is
>> a 2-way foreign join, is created using a local join path stored in the
>> left input join relation's PgFdwRelationInfo and a simple foreign table
>> scan path stored into the right input base relation's PgFdwRelationInfo.
>>      (c) Likewise for higher level foreign joins.
>>      (d) local join paths created are passed to create_foreignscan_path
>> and stored into the fdw_outerpaths of the resulting ForeignPahts.

> Hm, isn't this overcomplicated?  As you said earlier, we don't really
> care all that much whether the fdw_outerpath is free of lower foreign
> joins, because EPQ setup will select those lower join's substitute EPQ
> plans anyway.  All that we need is that the EPQ tree be a legal
> implementation of the join order with join quals applied at the right
> places.

Exactly.  I thought the EPQ trees without lower foreign joins would be
better because that would be easier to understand.

> So I think the rule could be

> "When first asked to produce a path for a given foreign joinrel, collect
> the cheapest paths for its left and right inputs, and make a nestloop path
> (or hashjoin path, if full join) from those, using the join quals needed
> for the current input relation pair.

Seems reasonable.

> Use this as the fdw_outerpath for
> all foreign paths made for the joinrel."

I'm not sure that would work well for foreign joins with sort orders.
Consider a merge join, whose left input is a 2-way foreign join with a
sort order that implements a full join and whose right input is a sorted
local table scan.  If the EPQ subplan for the foreign join wouldn't
produce the right sort order, the merge join might break during EPQ
rechecks (note that in this case the EPQ subplan for the foreign join
might produce more than a single row during an EPQ recheck).  So, I
think we would need to add an explicit sort to the fdw_outerpath for the
foreign join.

> The important point here is that we avoid using a merge join because that
> has assumptions about input ordering that likely won't be satisfied by
> the child paths chosen through this method.  (I guess you could fall back
> to it for the case of no quals in a fulljoin, because then the ordering
> assumptions are vacuous anyway.)

I agree on that point.  I'll create a patch.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Ashutosh Bapat
In reply to this post by Tom Lane-2
On Fri, Dec 16, 2016 at 9:43 PM, Tom Lane <[hidden email]> wrote:

> Etsuro Fujita <[hidden email]> writes:
>> On 2016/12/16 11:25, Etsuro Fujita wrote:
>>> As I said upthread, an alternative I am thinking is (1) to create an
>>> equivalent nestloop join path using inner/outer paths of a foreign join
>>> path, except when that join path implements a full join, in which case a
>>> merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
>>> (3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
>>> subplan created from the fdw_outerpath as-is.  What do you think about
>>> that?
>
>> Let me explain about #1 and #2 a bit more.  What I have in mind is:
>
>> * modify postgresGetForeignPaths so that a simple foreign table scan
>> path is stored into the base relation's PgFdwRelationInfo.
>> * modify postgresGetForeignJoinPaths so that
>>      (a) a local join path for a 2-way foreign join is created using
>> simple foreign table scan paths stored in the base relations'
>> PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
>>      (b) a local join path for a 3-way foreign join, whose left input is
>> a 2-way foreign join, is created using a local join path stored in the
>> left input join relation's PgFdwRelationInfo and a simple foreign table
>> scan path stored into the right input base relation's PgFdwRelationInfo.
>>      (c) Likewise for higher level foreign joins.
>>      (d) local join paths created are passed to create_foreignscan_path
>> and stored into the fdw_outerpaths of the resulting ForeignPahts.
>
> Hm, isn't this overcomplicated?  As you said earlier, we don't really
> care all that much whether the fdw_outerpath is free of lower foreign
> joins, because EPQ setup will select those lower join's substitute EPQ
> plans anyway.  All that we need is that the EPQ tree be a legal
> implementation of the join order with join quals applied at the right
> places.  So I think the rule could be
>
> "When first asked to produce a path for a given foreign joinrel, collect
> the cheapest paths for its left and right inputs, and make a nestloop path
> (or hashjoin path, if full join) from those, using the join quals needed
> for the current input relation pair.  Use this as the fdw_outerpath for
> all foreign paths made for the joinrel."

We could use fdw_outerpath of the left and right inputs instead of
looking for the cheapest paths. For base relations as left or right
relations, use the unparameterized cheapest foreign path. This will
ensure that all joins in fdw_outerpath are local joins, making EPQ
checks slightly efficient by avoiding redirection at every foreign
path.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Etsuro Fujita
On 2016/12/19 13:59, Ashutosh Bapat wrote:

> On Fri, Dec 16, 2016 at 9:43 PM, Tom Lane <[hidden email]> wrote:
>> Etsuro Fujita <[hidden email]> writes:
>>> On 2016/12/16 11:25, Etsuro Fujita wrote:
>>>> As I said upthread, an alternative I am thinking is (1) to create an
>>>> equivalent nestloop join path using inner/outer paths of a foreign join
>>>> path, except when that join path implements a full join, in which case a
>>>> merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
>>>> (3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
>>>> subplan created from the fdw_outerpath as-is.  What do you think about
>>>> that?

>>> Let me explain about #1 and #2 a bit more.  What I have in mind is:

>>> * modify postgresGetForeignPaths so that a simple foreign table scan
>>> path is stored into the base relation's PgFdwRelationInfo.
>>> * modify postgresGetForeignJoinPaths so that
>>>      (a) a local join path for a 2-way foreign join is created using
>>> simple foreign table scan paths stored in the base relations'
>>> PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
>>>      (b) a local join path for a 3-way foreign join, whose left input is
>>> a 2-way foreign join, is created using a local join path stored in the
>>> left input join relation's PgFdwRelationInfo and a simple foreign table
>>> scan path stored into the right input base relation's PgFdwRelationInfo.
>>>      (c) Likewise for higher level foreign joins.
>>>      (d) local join paths created are passed to create_foreignscan_path
>>> and stored into the fdw_outerpaths of the resulting ForeignPahts.

>> Hm, isn't this overcomplicated?  As you said earlier, we don't really
>> care all that much whether the fdw_outerpath is free of lower foreign
>> joins, because EPQ setup will select those lower join's substitute EPQ
>> plans anyway.  All that we need is that the EPQ tree be a legal
>> implementation of the join order with join quals applied at the right
>> places.  So I think the rule could be
>>
>> "When first asked to produce a path for a given foreign joinrel, collect
>> the cheapest paths for its left and right inputs, and make a nestloop path
>> (or hashjoin path, if full join) from those, using the join quals needed
>> for the current input relation pair.  Use this as the fdw_outerpath for
>> all foreign paths made for the joinrel."

> We could use fdw_outerpath of the left and right inputs instead of
> looking for the cheapest paths. For base relations as left or right
> relations, use the unparameterized cheapest foreign path. This will
> ensure that all joins in fdw_outerpath are local joins, making EPQ
> checks slightly efficient by avoiding redirection at every foreign
> path.

That seems close to what I had in mind described above.  One additional
work required for that would to store the fdw_outerpath into the
RelOptInfo's fdw_private such as PgFdwRelationInfo before add_path for
the foreign-join path containing the fdw_outerpath, because add_path
might reject the foreign-join path.  I think the additional work would
make that rather complicated.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Tom Lane-2
In reply to this post by Etsuro Fujita
Etsuro Fujita <[hidden email]> writes:
> On 2016/12/17 1:13, Tom Lane wrote:
>> So I think the rule could be

>> "When first asked to produce a path for a given foreign joinrel, collect
>> the cheapest paths for its left and right inputs, and make a nestloop path
>> (or hashjoin path, if full join) from those, using the join quals needed
>> for the current input relation pair.

> Seems reasonable.

>> Use this as the fdw_outerpath for
>> all foreign paths made for the joinrel."

> I'm not sure that would work well for foreign joins with sort orders.
> Consider a merge join, whose left input is a 2-way foreign join with a
> sort order that implements a full join and whose right input is a sorted
> local table scan.  If the EPQ subplan for the foreign join wouldn't
> produce the right sort order, the merge join might break during EPQ
> rechecks (note that in this case the EPQ subplan for the foreign join
> might produce more than a single row during an EPQ recheck).

How so?  We only recheck one row at a time, therefore it can be claimed to
have any sort order you care about.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Etsuro Fujita
On 2016/12/20 0:37, Tom Lane wrote:
> Etsuro Fujita <[hidden email]> writes:
>> On 2016/12/17 1:13, Tom Lane wrote:
>>> So I think the rule could be

>>> "When first asked to produce a path for a given foreign joinrel, collect
>>> the cheapest paths for its left and right inputs, and make a nestloop path
>>> (or hashjoin path, if full join) from those, using the join quals needed
>>> for the current input relation pair.

>> Seems reasonable.

>>> Use this as the fdw_outerpath for
>>> all foreign paths made for the joinrel."

>> I'm not sure that would work well for foreign joins with sort orders.
>> Consider a merge join, whose left input is a 2-way foreign join with a
>> sort order that implements a full join and whose right input is a sorted
>> local table scan.  If the EPQ subplan for the foreign join wouldn't
>> produce the right sort order, the merge join might break during EPQ
>> rechecks (note that in this case the EPQ subplan for the foreign join
>> might produce more than a single row during an EPQ recheck).

> How so?  We only recheck one row at a time, therefore it can be claimed to
> have any sort order you care about.

I'll have second thoughts about that.  I agree with you except for that,
so I've created a patch; I removed GetExistingLocalJoinPath and added a
helper function, CreateLocalJoinPath, that generates a local join path
for a given foreign join, as described above.  Please find attached a patch.

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

epqpath-for-foreignjoin.patch (36K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Ashutosh Bapat
Some review comments

1. postgres_fdw doesn't push down semi and anti joins so you may want to
discount these two too.
+           jointype == JOIN_SEMI ||
+           jointype == JOIN_ANTI);

2. We should try to look for other not-so-cheap paths if the cheapest one is
paramterized. You might want to use get_cheapest_path_for_pathkeys() to find a
suitable unparameterized path by passing NULL for required_outer and NIL for
pathkeys, that's a very strange usage, but I think it will serve the purpose.
On the thread we discussed that we should save the epq_path create for lower
join and use it here. That may be better than searching for a path.
+    /* Give up if the cheapest-total-cost paths are parameterized. */
+    if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
+        !bms_is_empty(PATH_REQ_OUTER(inner_path)))
+        return NULL;

3. Adding new members to JoinPathExtraData may save some work for postgres_fdw
and other FDWs which would use CreateLocalJoinPath(), but it will add few bytes
to the structure even when there is no "FULL foreign join which requires EPQ"
involved in the query. That may not be so much of memory overhead since the
structure is used locally to add_paths_to_joinrel(), but it might be something
to think about. Instead, what if we call select_mergejoin_clauses() within
CreateLocalJoinPath() to get that information?

3. Talking about saving some CPU cycles - if a clauseless full join can be
implemented only using merge join, probably that's the only path available in
the list of paths for the given relation. Instead of building the same path
anew, should we use the existing path like GetExistingLocalJoinPath() for that
case? In fact, I am wondering whether we should look for an existing nestloop
path for all joins except full, in which case we should look for merge or hash
join path. We go on building a new path, if only there isn't an existing one.
That will certainly save some CPU cycles spent in costing the path.

4. Following comment mentions only hash join, but the code creates a merge join
or hash join path.
     * If the jointype is JOIN_FULL, try to create a hashjoin join path from

5. Per comment below a clauseless full join can be implemented using only merge
join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true here?
        /*
         * Special corner case: for "x FULL JOIN y ON true", there will be no
         * join clauses at all.  Note that mergejoin is our only join type
         * that supports FULL JOIN without any join clauses, and in that case
         * it doesn't require the input paths to be well ordered, so generate
         * a clauseless mergejoin path from the cheapest-total-cost paths.
         */
        if (extra->mergejoin_allowed && !extra->mergeclause_list)

Rethinking about the problem, the error comes because the inner or outer plan
of a merge join plan doesn't have pathkeys required by the merge join. This
happens because the merge join path had foreign path with pathkeys as inner or
outer path and corresponding fdw_outerpath didn't have those pathkeys. I am
wondering whether the easy and possibly correct solution here is to not replace
a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we don't do
that, there won't be error building merge join plan and
postgresRecheckForeignScan() would correctly route the EPQ checks to the local
plan available as outer plan. Attached patch with that code removed.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

epq_plan_failure.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: postgres_fdw bug in 9.6

Robert Haas
On Wed, Dec 21, 2016 at 11:04 AM, Ashutosh Bapat
<[hidden email]> wrote:
> Some review comments
>
> 1. postgres_fdw doesn't push down semi and anti joins so you may want to
> discount these two too.
> +           jointype == JOIN_SEMI ||
> +           jointype == JOIN_ANTI);

But in the future, it might.  We shouldn't randomly leave foot-guns
lying around if there's an easy alternative.

> 3. Adding new members to JoinPathExtraData may save some work for postgres_fdw
> and other FDWs which would use CreateLocalJoinPath(), but it will add few bytes
> to the structure even when there is no "FULL foreign join which requires EPQ"
> involved in the query. That may not be so much of memory overhead since the
> structure is used locally to add_paths_to_joinrel(), but it might be something
> to think about. Instead, what if we call select_mergejoin_clauses() within
> CreateLocalJoinPath() to get that information?

I think that's exactly backwards.  The few bytes of storage don't
matter, but extra CPU cycles might.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
1234 ... 6
Previous Thread Next Thread