How to retain lesser paths at add_path()?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

How to retain lesser paths at add_path()?

Kohei KaiGai-4
Hello,

When we add a new path using add_path(), it checks estimated cost and path-keys,
then it also removes dominated paths, if any.
Do we have a reasonable way to retain these "dominated" paths? Once it
is considered
lesser paths at a level, however, it may have a combined cheaper cost
with upper pathnode.

In my case, PG-Strom adds CustomPath to support JOIN/GROUP BY
workloads that utilizes
GPU and NVME storage. If GpuPreAgg and GpuJoin are executed
continuously, output buffer
of GpuJoin simultaneously performs as input buffer of GpuPreAgg on GPU
device memory.
So, it allows to avoid senseless DMA transfer between GPU and CPU/RAM.
This behavior
affects cost estimation during path construction steps - GpuPreAgg
discount DMA cost if its
input path is GpuJoin.
On the other hands, it looks to me add_path() does not consider upper
level optimization
other than sorting path-keys. As long as we can keep these "lesser"
pathnodes that has
further optimization chance, it will help making more reasonable query plan.

Do we have any reasonable way to retain these paths at add_path() even
if it is dominated
by other paths? Any idea welcome.

Best regards,

[*] GpuJoin + GpuPreAgg combined GPU kernel
https://www.slideshare.net/kaigai/20181016pgconfeussd2gpumulti/13
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Tom Lane-2
Kohei KaiGai <[hidden email]> writes:
> When we add a new path using add_path(), it checks estimated cost and path-keys,
> then it also removes dominated paths, if any.
> Do we have a reasonable way to retain these "dominated" paths? Once it
> is considered
> lesser paths at a level, however, it may have a combined cheaper cost
> with upper pathnode.

You do *not* want to have add_path fail to remove dominated paths in
general.  Don't even think about it, because otherwise you will have
plenty of time to regret your folly while you wait for the planner
to chew through an exponential number of possible join plans.

What you'd want to do for something like the above, I think, is to
have some kind of figure of merit or other special marking for paths
that will have some possible special advantage in later planning
steps.  Then you can teach add_path that that's another dimension it
should consider, in the same way that paths with different sort orders
or parallizability attributes don't dominate each other.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Robert Haas
On Wed, Jul 31, 2019 at 11:07 AM Tom Lane <[hidden email]> wrote:
> What you'd want to do for something like the above, I think, is to
> have some kind of figure of merit or other special marking for paths
> that will have some possible special advantage in later planning
> steps.  Then you can teach add_path that that's another dimension it
> should consider, in the same way that paths with different sort orders
> or parallizability attributes don't dominate each other.

Yeah, but I have to admit that this whole design makes me kinda
uncomfortable.  Every time somebody comes up with a new figure of
merit, it increases not only the number of paths retained but also the
cost of comparing two paths to possibly reject one of them. A few
years ago, you came up with the (good) idea of rejecting some join
paths before actually creating the paths, and I wonder if we ought to
try to go further with that somehow. Or maybe, as Peter Geoghegan, has
been saying, we ought to think about planning top-down with
memoization instead of bottom up (yeah, I know that's a huge change).
It just feels like the whole idea of a list of paths ordered by cost
breaks down when there are so many ways that a not-cheapest path can
still be worth keeping. Not sure exactly what would be better, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Tom Lane-2
Robert Haas <[hidden email]> writes:

> Yeah, but I have to admit that this whole design makes me kinda
> uncomfortable.  Every time somebody comes up with a new figure of
> merit, it increases not only the number of paths retained but also the
> cost of comparing two paths to possibly reject one of them. A few
> years ago, you came up with the (good) idea of rejecting some join
> paths before actually creating the paths, and I wonder if we ought to
> try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> been saying, we ought to think about planning top-down with
> memoization instead of bottom up (yeah, I know that's a huge change).
> It just feels like the whole idea of a list of paths ordered by cost
> breaks down when there are so many ways that a not-cheapest path can
> still be worth keeping. Not sure exactly what would be better, though.

Yeah, I agree that add_path is starting to feel creaky.  I don't
know what to do instead though.  Changing to a top-down design
sounds like it would solve some problems while introducing others
(not to mention the amount of work and breakage involved).

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Kohei KaiGai-4
2019年8月1日(木) 1:41 Tom Lane <[hidden email]>:

>
> Robert Haas <[hidden email]> writes:
> > Yeah, but I have to admit that this whole design makes me kinda
> > uncomfortable.  Every time somebody comes up with a new figure of
> > merit, it increases not only the number of paths retained but also the
> > cost of comparing two paths to possibly reject one of them. A few
> > years ago, you came up with the (good) idea of rejecting some join
> > paths before actually creating the paths, and I wonder if we ought to
> > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> > been saying, we ought to think about planning top-down with
> > memoization instead of bottom up (yeah, I know that's a huge change).
> > It just feels like the whole idea of a list of paths ordered by cost
> > breaks down when there are so many ways that a not-cheapest path can
> > still be worth keeping. Not sure exactly what would be better, though.
>
> Yeah, I agree that add_path is starting to feel creaky.  I don't
> know what to do instead though.  Changing to a top-down design
> sounds like it would solve some problems while introducing others
> (not to mention the amount of work and breakage involved).
>
Hmm... It looks the problem we ought to revise about path construction
is much larger than my expectation, and uncertain for me how much works
are needed.

Although it might be a workaround until fundamental reconstruction,
how about to have a margin of estimated cost to reject paths?
Current add_path() immediately rejects lesser paths if its cost is
even a little more expensive than the compared one. One the other hands,
a little cost difference may turn back final optimizer decision in some cases.
For example, if we retain lesser paths that have less then 10% expensive
cost than the current cheapest path in the same level, we may be able to
keep the number of lesser paths retained and still use simple cost comparison
for path survive decision.

I understand it is not an essential re-design of path-construction logic, and
may have limitation. However, amount of works are reasonable and no side-
effect. (current behavior = 0% threshold).
How about your opinions?

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Richard Guo-2
On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <[hidden email]> wrote:
2019年8月1日(木) 1:41 Tom Lane <[hidden email]>:
>
> Robert Haas <[hidden email]> writes:
> > Yeah, but I have to admit that this whole design makes me kinda
> > uncomfortable.  Every time somebody comes up with a new figure of
> > merit, it increases not only the number of paths retained but also the
> > cost of comparing two paths to possibly reject one of them. A few
> > years ago, you came up with the (good) idea of rejecting some join
> > paths before actually creating the paths, and I wonder if we ought to
> > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> > been saying, we ought to think about planning top-down with
> > memoization instead of bottom up (yeah, I know that's a huge change).
> > It just feels like the whole idea of a list of paths ordered by cost
> > breaks down when there are so many ways that a not-cheapest path can
> > still be worth keeping. Not sure exactly what would be better, though.
>
> Yeah, I agree that add_path is starting to feel creaky.  I don't
> know what to do instead though.  Changing to a top-down design
> sounds like it would solve some problems while introducing others
> (not to mention the amount of work and breakage involved).
>
Hmm... It looks the problem we ought to revise about path construction
is much larger than my expectation, and uncertain for me how much works
are needed.

Although it might be a workaround until fundamental reconstruction,
how about to have a margin of estimated cost to reject paths?
Current add_path() immediately rejects lesser paths if its cost is
even a little more expensive than the compared one. One the other hands,

Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
costs of two paths, although the fuzz factor (1%) is hard coded and not
user-controllable.


I understand it is not an essential re-design of path-construction logic, and
may have limitation. However, amount of works are reasonable and no side-
effect. (current behavior = 0% threshold).
How about your opinions?


How's about Tom's suggestion on adding another dimension in add_path()
to be considered, just like how it considers paths of better sort order
or parallel-safe?

Thanks
Richard
Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Kohei KaiGai-4
2019年8月1日(木) 16:19 Richard Guo <[hidden email]>:

>
> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <[hidden email]> wrote:
>>
>> 2019年8月1日(木) 1:41 Tom Lane <[hidden email]>:
>> >
>> > Robert Haas <[hidden email]> writes:
>> > > Yeah, but I have to admit that this whole design makes me kinda
>> > > uncomfortable.  Every time somebody comes up with a new figure of
>> > > merit, it increases not only the number of paths retained but also the
>> > > cost of comparing two paths to possibly reject one of them. A few
>> > > years ago, you came up with the (good) idea of rejecting some join
>> > > paths before actually creating the paths, and I wonder if we ought to
>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
>> > > been saying, we ought to think about planning top-down with
>> > > memoization instead of bottom up (yeah, I know that's a huge change).
>> > > It just feels like the whole idea of a list of paths ordered by cost
>> > > breaks down when there are so many ways that a not-cheapest path can
>> > > still be worth keeping. Not sure exactly what would be better, though.
>> >
>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
>> > know what to do instead though.  Changing to a top-down design
>> > sounds like it would solve some problems while introducing others
>> > (not to mention the amount of work and breakage involved).
>> >
>> Hmm... It looks the problem we ought to revise about path construction
>> is much larger than my expectation, and uncertain for me how much works
>> are needed.
>>
>> Although it might be a workaround until fundamental reconstruction,
>> how about to have a margin of estimated cost to reject paths?
>> Current add_path() immediately rejects lesser paths if its cost is
>> even a little more expensive than the compared one. One the other hands,
>
>
> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> costs of two paths, although the fuzz factor (1%) is hard coded and not
> user-controllable.
>
Ah, sorry, I oversight this logic...

>> I understand it is not an essential re-design of path-construction logic, and
>> may have limitation. However, amount of works are reasonable and no side-
>> effect. (current behavior = 0% threshold).
>> How about your opinions?
>>
>
> How's about Tom's suggestion on adding another dimension in add_path()
> to be considered, just like how it considers paths of better sort order
> or parallel-safe?
>
Robert also mentioned it makes comparison operation more complicated.
If we try to have another dimension here, a callback function in Path node
may be able to tell the core optimizer whether "dominated path" shall be
dropped or not, without further complexity. It is just an idea.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Tomas Vondra-4
On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:

>2019年8月1日(木) 16:19 Richard Guo <[hidden email]>:
>>
>> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <[hidden email]> wrote:
>>>
>>> 2019年8月1日(木) 1:41 Tom Lane <[hidden email]>:
>>> >
>>> > Robert Haas <[hidden email]> writes:
>>> > > Yeah, but I have to admit that this whole design makes me kinda
>>> > > uncomfortable.  Every time somebody comes up with a new figure of
>>> > > merit, it increases not only the number of paths retained but also the
>>> > > cost of comparing two paths to possibly reject one of them. A few
>>> > > years ago, you came up with the (good) idea of rejecting some join
>>> > > paths before actually creating the paths, and I wonder if we ought to
>>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
>>> > > been saying, we ought to think about planning top-down with
>>> > > memoization instead of bottom up (yeah, I know that's a huge change).
>>> > > It just feels like the whole idea of a list of paths ordered by cost
>>> > > breaks down when there are so many ways that a not-cheapest path can
>>> > > still be worth keeping. Not sure exactly what would be better, though.
>>> >
>>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
>>> > know what to do instead though.  Changing to a top-down design
>>> > sounds like it would solve some problems while introducing others
>>> > (not to mention the amount of work and breakage involved).
>>> >
>>> Hmm... It looks the problem we ought to revise about path construction
>>> is much larger than my expectation, and uncertain for me how much works
>>> are needed.
>>>
>>> Although it might be a workaround until fundamental reconstruction,
>>> how about to have a margin of estimated cost to reject paths?
>>> Current add_path() immediately rejects lesser paths if its cost is
>>> even a little more expensive than the compared one. One the other hands,
>>
>>
>> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
>> costs of two paths, although the fuzz factor (1%) is hard coded and not
>> user-controllable.
>>
>Ah, sorry, I oversight this logic...
>

FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
solution, because how would you know what value is the right one? Why ould
10% be the right threshold, for example? In my experience these these
hard-coded coefficients imply behavior that's difficult to predict and
explain to users.

>>> I understand it is not an essential re-design of path-construction logic, and
>>> may have limitation. However, amount of works are reasonable and no side-
>>> effect. (current behavior = 0% threshold).
>>> How about your opinions?
>>>
>>
>> How's about Tom's suggestion on adding another dimension in add_path()
>> to be considered, just like how it considers paths of better sort order
>> or parallel-safe?
>>
>Robert also mentioned it makes comparison operation more complicated.
>If we try to have another dimension here, a callback function in Path node
>may be able to tell the core optimizer whether "dominated path" shall be
>dropped or not, without further complexity. It is just an idea.
>

I think adding a hook to add_path() allowing to override the decidion
should be OK. The chance of getting that committed in the near future
seems much higher than for a patch that completely reworks add_path().

There's one caveat, though - AFAICS various places in the planner use
things like cheapest_total_path, cheapest_startup_path and even
get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
considers startup/total cost. It might happen that even after keeping
additional paths, the planner still won't use them :-(

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Kohei KaiGai-4
2019年8月1日(木) 19:24 Tomas Vondra <[hidden email]>:

>
> On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
> >2019年8月1日(木) 16:19 Richard Guo <[hidden email]>:
> >>
> >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <[hidden email]> wrote:
> >>>
> >>> 2019年8月1日(木) 1:41 Tom Lane <[hidden email]>:
> >>> >
> >>> > Robert Haas <[hidden email]> writes:
> >>> > > Yeah, but I have to admit that this whole design makes me kinda
> >>> > > uncomfortable.  Every time somebody comes up with a new figure of
> >>> > > merit, it increases not only the number of paths retained but also the
> >>> > > cost of comparing two paths to possibly reject one of them. A few
> >>> > > years ago, you came up with the (good) idea of rejecting some join
> >>> > > paths before actually creating the paths, and I wonder if we ought to
> >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> >>> > > been saying, we ought to think about planning top-down with
> >>> > > memoization instead of bottom up (yeah, I know that's a huge change).
> >>> > > It just feels like the whole idea of a list of paths ordered by cost
> >>> > > breaks down when there are so many ways that a not-cheapest path can
> >>> > > still be worth keeping. Not sure exactly what would be better, though.
> >>> >
> >>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
> >>> > know what to do instead though.  Changing to a top-down design
> >>> > sounds like it would solve some problems while introducing others
> >>> > (not to mention the amount of work and breakage involved).
> >>> >
> >>> Hmm... It looks the problem we ought to revise about path construction
> >>> is much larger than my expectation, and uncertain for me how much works
> >>> are needed.
> >>>
> >>> Although it might be a workaround until fundamental reconstruction,
> >>> how about to have a margin of estimated cost to reject paths?
> >>> Current add_path() immediately rejects lesser paths if its cost is
> >>> even a little more expensive than the compared one. One the other hands,
> >>
> >>
> >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> >> costs of two paths, although the fuzz factor (1%) is hard coded and not
> >> user-controllable.
> >>
> >Ah, sorry, I oversight this logic...
> >
>
> FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
> solution, because how would you know what value is the right one? Why ould
> 10% be the right threshold, for example? In my experience these these
> hard-coded coefficients imply behavior that's difficult to predict and
> explain to users.
>
Ah... That's exactly hard task to explain to users.

> >>> I understand it is not an essential re-design of path-construction logic, and
> >>> may have limitation. However, amount of works are reasonable and no side-
> >>> effect. (current behavior = 0% threshold).
> >>> How about your opinions?
> >>>
> >>
> >> How's about Tom's suggestion on adding another dimension in add_path()
> >> to be considered, just like how it considers paths of better sort order
> >> or parallel-safe?
> >>
> >Robert also mentioned it makes comparison operation more complicated.
> >If we try to have another dimension here, a callback function in Path node
> >may be able to tell the core optimizer whether "dominated path" shall be
> >dropped or not, without further complexity. It is just an idea.
> >
>
> I think adding a hook to add_path() allowing to override the decidion
> should be OK. The chance of getting that committed in the near future
> seems much higher than for a patch that completely reworks add_path().
>
> There's one caveat, though - AFAICS various places in the planner use
> things like cheapest_total_path, cheapest_startup_path and even
> get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
> considers startup/total cost. It might happen that even after keeping
> additional paths, the planner still won't use them :-(
>
Even if existing code looks at only cheapest_xxx_path, I don't think it is
a problematic behavior because these paths are exactly cheapest at a level,
but they may use more expensive paths in the deeper level.
If a hook can prevent dropping a path, not cheapest, in a particular level,
this path shall not appear on the cheapest_xxx_path, however, upper level
path construction logic can pick up these paths as a candidate of input.
If it has special discount factor here, the startup/total cost of the
upper level
path may have cheaper cost in spite of expensive input cost.

In this scenario, this hook gives a decision whether dominated path-node
shall be dropped or not. So, core optimizer still compares path-node using
estimated cost value.

In my scenario, even if GpuJoin is expensive at top level of JOIN/SCAN path
construction, GpuPreAgg + GpuJoin may be cheaper than others because of
data exchange on GPU device memory.
As long as GpuJoin is not removed from the pathlist, extension can build its
custom-path with cheaper cost.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Kohei KaiGai-4
Hello,

For implementation of the concept, this patch puts a hook on add_path
/ add_partial_path
to override the path removal decision by extensions, according to its
own viewpoint.
We don't add any metrics other than path's cost and path keys, so
set_cheapest() still picks
up paths based on its cost for each depth.
As we are currently doing, extensions (FDW / CSP) are responsible to
construct and add
paths with reasonable cost values, then PostgreSQL optimizer chooses
the "best" path
according to the (self-reported) cost. On the other hands, an
expensive path at a particular
level is not always expensive from upper viewpoint, if combination of
path-A and path-B
has special optimization, like a reduction of DMA transfer between
host and device, or omit
of network transfer between local and remote.
In these cases, extension can get a control to override a decision
whether old_path that
is dominated by new_path shall be removed, or not. If old_path
survived, extension can
re-use the path at the upper level to construct a special path.

Best regards,

2019年8月1日(木) 21:14 Kohei KaiGai <[hidden email]>:

>
> 2019年8月1日(木) 19:24 Tomas Vondra <[hidden email]>:
> >
> > On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
> > >2019年8月1日(木) 16:19 Richard Guo <[hidden email]>:
> > >>
> > >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <[hidden email]> wrote:
> > >>>
> > >>> 2019年8月1日(木) 1:41 Tom Lane <[hidden email]>:
> > >>> >
> > >>> > Robert Haas <[hidden email]> writes:
> > >>> > > Yeah, but I have to admit that this whole design makes me kinda
> > >>> > > uncomfortable.  Every time somebody comes up with a new figure of
> > >>> > > merit, it increases not only the number of paths retained but also the
> > >>> > > cost of comparing two paths to possibly reject one of them. A few
> > >>> > > years ago, you came up with the (good) idea of rejecting some join
> > >>> > > paths before actually creating the paths, and I wonder if we ought to
> > >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> > >>> > > been saying, we ought to think about planning top-down with
> > >>> > > memoization instead of bottom up (yeah, I know that's a huge change).
> > >>> > > It just feels like the whole idea of a list of paths ordered by cost
> > >>> > > breaks down when there are so many ways that a not-cheapest path can
> > >>> > > still be worth keeping. Not sure exactly what would be better, though.
> > >>> >
> > >>> > Yeah, I agree that add_path is starting to feel creaky.  I don't
> > >>> > know what to do instead though.  Changing to a top-down design
> > >>> > sounds like it would solve some problems while introducing others
> > >>> > (not to mention the amount of work and breakage involved).
> > >>> >
> > >>> Hmm... It looks the problem we ought to revise about path construction
> > >>> is much larger than my expectation, and uncertain for me how much works
> > >>> are needed.
> > >>>
> > >>> Although it might be a workaround until fundamental reconstruction,
> > >>> how about to have a margin of estimated cost to reject paths?
> > >>> Current add_path() immediately rejects lesser paths if its cost is
> > >>> even a little more expensive than the compared one. One the other hands,
> > >>
> > >>
> > >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> > >> costs of two paths, although the fuzz factor (1%) is hard coded and not
> > >> user-controllable.
> > >>
> > >Ah, sorry, I oversight this logic...
> > >
> >
> > FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
> > solution, because how would you know what value is the right one? Why ould
> > 10% be the right threshold, for example? In my experience these these
> > hard-coded coefficients imply behavior that's difficult to predict and
> > explain to users.
> >
> Ah... That's exactly hard task to explain to users.
>
> > >>> I understand it is not an essential re-design of path-construction logic, and
> > >>> may have limitation. However, amount of works are reasonable and no side-
> > >>> effect. (current behavior = 0% threshold).
> > >>> How about your opinions?
> > >>>
> > >>
> > >> How's about Tom's suggestion on adding another dimension in add_path()
> > >> to be considered, just like how it considers paths of better sort order
> > >> or parallel-safe?
> > >>
> > >Robert also mentioned it makes comparison operation more complicated.
> > >If we try to have another dimension here, a callback function in Path node
> > >may be able to tell the core optimizer whether "dominated path" shall be
> > >dropped or not, without further complexity. It is just an idea.
> > >
> >
> > I think adding a hook to add_path() allowing to override the decidion
> > should be OK. The chance of getting that committed in the near future
> > seems much higher than for a patch that completely reworks add_path().
> >
> > There's one caveat, though - AFAICS various places in the planner use
> > things like cheapest_total_path, cheapest_startup_path and even
> > get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
> > considers startup/total cost. It might happen that even after keeping
> > additional paths, the planner still won't use them :-(
> >
> Even if existing code looks at only cheapest_xxx_path, I don't think it is
> a problematic behavior because these paths are exactly cheapest at a level,
> but they may use more expensive paths in the deeper level.
> If a hook can prevent dropping a path, not cheapest, in a particular level,
> this path shall not appear on the cheapest_xxx_path, however, upper level
> path construction logic can pick up these paths as a candidate of input.
> If it has special discount factor here, the startup/total cost of the
> upper level
> path may have cheaper cost in spite of expensive input cost.
>
> In this scenario, this hook gives a decision whether dominated path-node
> shall be dropped or not. So, core optimizer still compares path-node using
> estimated cost value.
>
> In my scenario, even if GpuJoin is expensive at top level of JOIN/SCAN path
> construction, GpuPreAgg + GpuJoin may be cheaper than others because of
> data exchange on GPU device memory.
> As long as GpuJoin is not removed from the pathlist, extension can build its
> custom-path with cheaper cost.
>
> Best regards,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <[hidden email]>
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

pgsql13-path_removal_decision_hook.v1.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Robert Haas
On Mon, Aug 12, 2019 at 12:28 AM Kohei KaiGai <[hidden email]> wrote:
> For implementation of the concept, this patch puts a hook on add_path
> / add_partial_path
> to override the path removal decision by extensions, according to its
> own viewpoint.

I don't think this hook is a very useful approach to this problem, and
I'm concerned that it might have a measurable performance cost.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Tomas Vondra-4
On Fri, Oct 04, 2019 at 12:19:06PM -0400, Robert Haas wrote:
>On Mon, Aug 12, 2019 at 12:28 AM Kohei KaiGai <[hidden email]> wrote:
>> For implementation of the concept, this patch puts a hook on add_path
>> / add_partial_path
>> to override the path removal decision by extensions, according to its
>> own viewpoint.
>
>I don't think this hook is a very useful approach to this problem, and
>I'm concerned that it might have a measurable performance cost.
>

Can you be more specific why you don't think this approach is not
useful? I'm not sure whether you consider all hooks to have this issue
or just this proposed one.

As for the performance impact, I think that's not difficult to measure.
I'd be surprised if it has measurable impact on cases with no hook
installed (there's plenty more expensive stuff going on). Of course, it
may have some impact for cases when the hook retains many more paths
and/or does something expensive, but that's kinda up to whoever writes
that particular hook. I think the assumption is that the savings from
building better plans far outweight that extra cost.

That does not necessarily mean the proposed hook is correct - I only
briefly looked at it, and it's not clear to me why would it be OK to
call the hook for remove_old=true but not also for accept_new=false? How
do we know whether the "better" path arrives first?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Robert Haas
On Sun, Oct 6, 2019 at 3:23 PM Tomas Vondra
<[hidden email]> wrote:
> >I don't think this hook is a very useful approach to this problem, and
> >I'm concerned that it might have a measurable performance cost.
>
> Can you be more specific why you don't think this approach is not
> useful? I'm not sure whether you consider all hooks to have this issue
> or just this proposed one.

I'll start by admitting that that remark was rather off-the-cuff.  On
further reflection, add_path() is not really a crazy place to try to
add a new dimension of merit, which is really what KaiGai wants to do
here.  On the other hand, as Tom and I noted upthread, that system is
creaking under its weight as it is, and making it extensible seems
like it might therefore be a bad idea - specifically, because it might
slow down planning quite a bit on large join problems, either because
of the additional cost testing the additional dimension of merit or
because of the additional cost of dealing with the extra paths that
get kept.

It is maybe worth noting that join/aggregate pushdown for FDWs has a
somewhat similar problem, and we didn't solve it this way. Should we
have? Maybe it would have worked better and been less buggy. But
slowing down all planning for the benefit of that feature also sounds
bad. I think any changes in this area need careful though.

> As for the performance impact, I think that's not difficult to measure.
> I'd be surprised if it has measurable impact on cases with no hook
> installed (there's plenty more expensive stuff going on). Of course, it
> may have some impact for cases when the hook retains many more paths
> and/or does something expensive, but that's kinda up to whoever writes
> that particular hook. I think the assumption is that the savings from
> building better plans far outweight that extra cost.

You might be right, but add_path() is a pretty darn hot path in the
planner.  You probably wouldn't see a significant overhead on a
single-table query, but on a query with many tables I would not be
surprised if even the overhead of an empty function call was
measurable. But I could be wrong, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Sun, Oct 6, 2019 at 3:23 PM Tomas Vondra
> <[hidden email]> wrote:
>> Can you be more specific why you don't think this approach is not
>> useful? I'm not sure whether you consider all hooks to have this issue
>> or just this proposed one.

> I'll start by admitting that that remark was rather off-the-cuff.  On
> further reflection, add_path() is not really a crazy place to try to
> add a new dimension of merit, which is really what KaiGai wants to do
> here.  On the other hand, as Tom and I noted upthread, that system is
> creaking under its weight as it is, and making it extensible seems
> like it might therefore be a bad idea - specifically, because it might
> slow down planning quite a bit on large join problems, either because
> of the additional cost testing the additional dimension of merit or
> because of the additional cost of dealing with the extra paths that
> get kept.

FWIW, I think that the patch as proposed would most certainly have
nasty performance problems.  To make intelligent decisions, the
hook function would basically have to re-do a large fraction of
the calculations that add_path itself does.  It's also fairly
unclear (or at least undocumented) how things would work if multiple
path providers wanted to make use of the hook; except that that
performance issue would get worse, as each of them redoes that work.

Also, as Robert says, the real goal here is to allow some additional
comparison dimension to be considered.  Simply preventing pre-existing
paths from being removed isn't sufficient for that --- you have to be
able to change the accept_new decisions as well, as Tomas already
worried about.  But if we phrase that as an additional hook that
only concerns itself with accept_new, then the duplicate-calculation
problem gets really substantially worse: I think actual use of a hook
like that would require reconsidering the entire existing path list.

I'm not very sure what a good design for adding new comparison dimensions
would look like, but I think this isn't it.

We could imagine, maybe, that a hook for the purpose of allowing an
additional dimension to be considered would be essentially a path
comparison function, returning -1, +1, or 0 depending on whether
path A is dominated by path B (on this new dimension), dominates
path B, or neither.  However, I do not see how multiple extensions
could usefully share use of such a hook.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Robert Haas
On Mon, Oct 7, 2019 at 9:56 AM Tom Lane <[hidden email]> wrote:
> We could imagine, maybe, that a hook for the purpose of allowing an
> additional dimension to be considered would be essentially a path
> comparison function, returning -1, +1, or 0 depending on whether
> path A is dominated by path B (on this new dimension), dominates
> path B, or neither.  However, I do not see how multiple extensions
> could usefully share use of such a hook.

Typically, we support hook-sharing mostly by ad-hoc methods: when
installing a hook, you remember the previous value of the function
pointer, and arrange to call that function yourself.  That's not a
great method.  One problem with it is that you can't reasonably
uninstall a hook function, which would be a nice thing to be able to
do. We could do better by reusing the technique from on_shmem_exit or
RegisterXactCallbacks: keep an array of pointers, and call them in
order. I wish we'd retrofit all of our hooks to work more like that;
being able to unload shared libraries would be a good feature.

But if we want to stick with the ad-hoc method, we could also just
have four possible return values: dominates, dominated-by, both, or
neither.

Still, this doesn't feel like very scalable paradigm, because this
code gets called a lot.  Unless both calling the hook functions and
the hook functions themselves are dirt-cheap, it's going to hurt, and
TBH, I wonder if even the cost of detecting that the hook is unused
might be material.

I wonder whether we might get a nicer solution to this problem if our
method of managing paths looked less something invented by a LISP
hacker, but I don't have a specific proposal in mind.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Mon, Oct 7, 2019 at 9:56 AM Tom Lane <[hidden email]> wrote:
>> We could imagine, maybe, that a hook for the purpose of allowing an
>> additional dimension to be considered would be essentially a path
>> comparison function, returning -1, +1, or 0 depending on whether
>> path A is dominated by path B (on this new dimension), dominates
>> path B, or neither.  However, I do not see how multiple extensions
>> could usefully share use of such a hook.

> ... if we want to stick with the ad-hoc method, we could also just
> have four possible return values: dominates, dominated-by, both, or
> neither.

Right, and then *each* user of the hook would have to be prepared
to merge its result with the result from the previous user(s),
which is a complicated bit of logic that somebody would surely
get wrong, especially if (a) there's no prototype to copy from
and (b) testing only their own extension would not exercise it.

[ thinks a bit... ]  Maybe that could be improved if we can express
the result as a bitmask, defined in such a way that OR'ing (or maybe
AND'ing? haven't worked it out) the results from different comparisons
does the right thing.

> Still, this doesn't feel like very scalable paradigm, because this
> code gets called a lot.  Unless both calling the hook functions and
> the hook functions themselves are dirt-cheap, it's going to hurt, and
> TBH, I wonder if even the cost of detecting that the hook is unused
> might be material.

Yeah, I'm worried about that too.  This is quite a hot code path,
and so I don't think we can just assume that changes are free.
Still, if we could come up with a cleaner paradigm, maybe we could
buy back a few cycles in the core-code comparison logic, and thus
not come out behind from adding a hook.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Kohei KaiGai-4
In reply to this post by Robert Haas
2019年10月7日(月) 23:44 Robert Haas <[hidden email]>:

>
> On Mon, Oct 7, 2019 at 9:56 AM Tom Lane <[hidden email]> wrote:
> > We could imagine, maybe, that a hook for the purpose of allowing an
> > additional dimension to be considered would be essentially a path
> > comparison function, returning -1, +1, or 0 depending on whether
> > path A is dominated by path B (on this new dimension), dominates
> > path B, or neither.  However, I do not see how multiple extensions
> > could usefully share use of such a hook.
>
> Typically, we support hook-sharing mostly by ad-hoc methods: when
> installing a hook, you remember the previous value of the function
> pointer, and arrange to call that function yourself.  That's not a
> great method.  One problem with it is that you can't reasonably
> uninstall a hook function, which would be a nice thing to be able to
> do. We could do better by reusing the technique from on_shmem_exit or
> RegisterXactCallbacks: keep an array of pointers, and call them in
> order. I wish we'd retrofit all of our hooks to work more like that;
> being able to unload shared libraries would be a good feature.
>
> But if we want to stick with the ad-hoc method, we could also just
> have four possible return values: dominates, dominated-by, both, or
> neither.
>
It seems to me this is a bit different from the purpose of this hook.
I never intend to overwrite existing cost-based decision by this hook.
The cheapest path at a particular level is the cheapest one regardless
of the result of this hook. However, this hook enables to prevent
immediate elimination of a particular path that we (extension) want to
use it later and may have potentially cheaper cost (e.g; a pair of
custom GpuAgg + GpuJoin by reduction of DMA cost).

So, I think expected behavior when multiple extensions would use
this hook is clear. If any of call-chain on the hook wanted to preserve
the path, it should be kept on the pathlist. (Of couse, it is not a cheapest
one)

> Still, this doesn't feel like very scalable paradigm, because this
> code gets called a lot.  Unless both calling the hook functions and
> the hook functions themselves are dirt-cheap, it's going to hurt, and
> TBH, I wonder if even the cost of detecting that the hook is unused
> might be material.
>
> I wonder whether we might get a nicer solution to this problem if our
> method of managing paths looked less something invented by a LISP
> hacker, but I don't have a specific proposal in mind.
>
One other design in my mind is, add a callback function pointer on Path
structure. Only if Path structure has valid pointer (not-NULL), add_path()
calls extension's own logic to determine whether the Path can be
eliminated now.
This design may minimize the number of callback invocation.

One potential downside of this approach is, function pointer makes
hard to implement readfuncs of Path nodes, even though we have
no read handler of them, right now.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Tom Lane-2
Kohei KaiGai <[hidden email]> writes:
> 2019年10月7日(月) 23:44 Robert Haas <[hidden email]>:
>> But if we want to stick with the ad-hoc method, we could also just
>> have four possible return values: dominates, dominated-by, both, or
>> neither.

> It seems to me this is a bit different from the purpose of this hook.
> I never intend to overwrite existing cost-based decision by this hook.
> The cheapest path at a particular level is the cheapest one regardless
> of the result of this hook. However, this hook enables to prevent
> immediate elimination of a particular path that we (extension) want to
> use it later and may have potentially cheaper cost (e.g; a pair of
> custom GpuAgg + GpuJoin by reduction of DMA cost).

I do not think this will work for the purpose you wish, for the reason
Tomas already pointed out: if you don't also modify the accept_new
decision, there's no guarantee that the path you want will get into
the relation's path list in the first place.

Another problem with trying to manage this only by preventing removals
is that it is likely to lead to keeping extra paths and thereby wasting
planning effort.  What if there's more than one path having the property
you want to keep?

Given the way that add_path works, you really have to think about the
problem as adding an additional figure-of-merit or comparison dimension.
Anything else is going to have some unpleasant behaviors.

> One other design in my mind is, add a callback function pointer on Path
> structure. Only if Path structure has valid pointer (not-NULL), add_path()
> calls extension's own logic to determine whether the Path can be
> eliminated now.

While I'm not necessarily against having a per-path callback, I don't
see how it helps us solve this problem, especially not in the presence
of multiple extensions trying to add different paths.  I do not wish
to see things ending up with extensions saying "don't delete my path
no matter what", because that's going to be costly in terms of later
planning effort.  But I'm not seeing how this wouldn't degenerate to
pretty much that behavior.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: How to retain lesser paths at add_path()?

Kohei KaiGai-4
2019年10月8日(火) 1:56 Tom Lane <[hidden email]>:

>
> Kohei KaiGai <[hidden email]> writes:
> > 2019年10月7日(月) 23:44 Robert Haas <[hidden email]>:
> >> But if we want to stick with the ad-hoc method, we could also just
> >> have four possible return values: dominates, dominated-by, both, or
> >> neither.
>
> > It seems to me this is a bit different from the purpose of this hook.
> > I never intend to overwrite existing cost-based decision by this hook.
> > The cheapest path at a particular level is the cheapest one regardless
> > of the result of this hook. However, this hook enables to prevent
> > immediate elimination of a particular path that we (extension) want to
> > use it later and may have potentially cheaper cost (e.g; a pair of
> > custom GpuAgg + GpuJoin by reduction of DMA cost).
>
> I do not think this will work for the purpose you wish, for the reason
> Tomas already pointed out: if you don't also modify the accept_new
> decision, there's no guarantee that the path you want will get into
> the relation's path list in the first place.
>
Ah, it is right, indeed. We may need to have a variation of add_path()
that guarantee to preserve a path newly added.
We may be utilize the callback to ask extension whether it allows the
new path to be dominated by the existing cheapest path also.

> Another problem with trying to manage this only by preventing removals
> is that it is likely to lead to keeping extra paths and thereby wasting
> planning effort.  What if there's more than one path having the property
> you want to keep?
>
My assumption is that upper path tries to walk on the path-list, not only
cheapest one, to construct upper paths with lesser paths if they are capable
for special optimization.
Of course, it is not a cheap way, however, majority of path-nodes are not
interested in the lesser paths, as its sub-path. Only limited number of
extension will walk on the lesser path also?
A separated list is probably an idea, to keep the lesser paths. It is not
referenced at the usual path, however, extension can walk on the list
to lookup another opportunity more than the cheapest path.
In this case, length of the path_list is not changed.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>