add_partial_path() may remove dominated path but still in use

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

add_partial_path() may remove dominated path but still in use

Kohei KaiGai-4
Hello,

I've investigated a crash report of PG-Strom for a few days, then I doubt
add_partial_path() can unexpectedly release dominated old partial path
but still referenced by other Gather node, and it leads unexpected system
crash.

Please check at the gpuscan.c:373
https://github.com/heterodb/pg-strom/blob/master/src/gpuscan.c#L373

The create_gpuscan_path() constructs a partial custom-path, then it is
added to the partial_pathlist of the baserel.
If both of old and new partial paths have no pathkeys and old-path has
larger cost, add_partial_path detaches the old path from the list, then
calls pfree() to release old_path itself.

On the other hands, the baserel may have GatherPath which references
the partial-path on its pathlist. Here is no check whether the old partial-
paths are referenced by others, or not.

To ensure my assumption, I tried to inject elog() before/after the call of
add_partial_path() and just before the pfree(old_path) in add_partial_path().

----------------------------------------------------------------
dbt3=# explain select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < cast(date '1994-01-01' + interval '1 year' as date)
    and l_discount between 0.08 - 0.01 and 0.08 + 0.01
    and l_quantity < 24 limit 1;
INFO:  GpuScan:389  GATHER(0x28f3c30), SUBPATH(0x28f3f88): {GATHERPATH
:pathtype 44 :parent_relids (b 1) :required_outer (b) :parallel_aware
false :parallel_safe false :parallel_workers 0 :rows 151810
:startup_cost 1000.00 :total_cost 341760.73 :pathkeys <> :subpath
{PATH :pathtype 18 :parent_relids (b 1) :required_outer (b)
:parallel_aware true :parallel_safe true :parallel_workers 2 :rows
63254 :startup_cost 0.00 :total_cost 325579.73 :pathkeys <>}
:single_copy false :num_workers 2}
INFO:  add_partial_path:830 old_path(0x28f3f88) is removed
WARNING:  could not dump unrecognized node type: 2139062143
INFO:  GpuScan:401  GATHER(0x28f3c30), SUBPATH(0x28f3f88): {GATHERPATH
:pathtype 44 :parent_relids (b 1) :required_outer (b) :parallel_aware
false :parallel_safe false :parallel_workers 0 :rows 151810
:startup_cost 1000.00 :total_cost 341760.73 :pathkeys <> :subpath
{(HOGE)} :single_copy false :num_workers 2}
----------------------------------------------------------------

At the L389, GatherPath in the baresel->pathlist is healthy. Its
subpath (0x28f3f88) is
a valid T_Scan path node.
Then, gpuscan.c adds a cheaper path-node so add_partial_path()
considers the above
subpath (0x28f3f88) is dominated by the new custom-path, and release it.
So, elog() at L401 says subpath has unrecognized node type: 2139062143
== 0x7f7f7f7f
that implies the memory region was already released by pfree().

Reference counter or other mechanism to tack referenced paths may be an idea
to avoid unintentional release of path-node.
On the other hands, it seems to me the pfree() at add_path /
add_partial_path is not
a serious memory management because other objects referenced by the path-node
are not released here.
It is sufficient if we detach dominated path-node from the pathlist /
partial_pathlist.

How about your opinions?

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Tom Lane-2
Kohei KaiGai <[hidden email]> writes:
> I've investigated a crash report of PG-Strom for a few days, then I doubt
> add_partial_path() can unexpectedly release dominated old partial path
> but still referenced by other Gather node, and it leads unexpected system
> crash.

Hm.  This seems comparable to the special case in plain add_path, where it
doesn't attempt to free IndexPaths because of the risk that they're still
referenced.  So maybe we should just drop the pfree here.

However, first I'd like to know why this situation is arising in the first
place.  To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for.  Why is that a good thing to do?  It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Kohei KaiGai-4
2018年12月29日(土) 1:44 Tom Lane <[hidden email]>:

>
> Kohei KaiGai <[hidden email]> writes:
> > I've investigated a crash report of PG-Strom for a few days, then I doubt
> > add_partial_path() can unexpectedly release dominated old partial path
> > but still referenced by other Gather node, and it leads unexpected system
> > crash.
>
> Hm.  This seems comparable to the special case in plain add_path, where it
> doesn't attempt to free IndexPaths because of the risk that they're still
> referenced.  So maybe we should just drop the pfree here.
>
> However, first I'd like to know why this situation is arising in the first
> place.  To have the situation you're describing, we'd have to have
> attempted to make some Gather paths before we have all the partial paths
> for the relation they're for.  Why is that a good thing to do?  It seems
> like such Gathers are necessarily being made with incomplete information,
> and we'd be better off to fix things so that none are made till later.
>
Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths(). Even if extension adds some partial paths later,
the first generate_gather_paths() has to generate Gather node based on
incomplete information.
If we could ensure Gather node shall be made after all the partial nodes
are added, it may be a solution for the problem.

Of course, relocation of the hook may have a side-effect. Anyone may
expect the pathlist contains all the path-node including Gather node, for
editorialization of the pathlist.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Tom Lane-2
Kohei KaiGai <[hidden email]> writes:
> 2018年12月29日(土) 1:44 Tom Lane <[hidden email]>:
>> However, first I'd like to know why this situation is arising in the first
>> place.  To have the situation you're describing, we'd have to have
>> attempted to make some Gather paths before we have all the partial paths
>> for the relation they're for.  Why is that a good thing to do?  It seems
>> like such Gathers are necessarily being made with incomplete information,
>> and we'd be better off to fix things so that none are made till later.

> Because of the hook location, Gather-node shall be constructed with built-in
> and foreign partial scan node first, then extension gets a chance to add its
> custom paths (partial and full).
> At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> generate_gather_paths().

Hmm.  I'm inclined to think that we should have a separate hook
in which extensions are allowed to add partial paths, and that
set_rel_pathlist_hook should only be allowed to add regular paths.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Kohei KaiGai-4
2018年12月30日(日) 4:12 Tom Lane <[hidden email]>:

>
> Kohei KaiGai <[hidden email]> writes:
> > 2018年12月29日(土) 1:44 Tom Lane <[hidden email]>:
> >> However, first I'd like to know why this situation is arising in the first
> >> place.  To have the situation you're describing, we'd have to have
> >> attempted to make some Gather paths before we have all the partial paths
> >> for the relation they're for.  Why is that a good thing to do?  It seems
> >> like such Gathers are necessarily being made with incomplete information,
> >> and we'd be better off to fix things so that none are made till later.
>
> > Because of the hook location, Gather-node shall be constructed with built-in
> > and foreign partial scan node first, then extension gets a chance to add its
> > custom paths (partial and full).
> > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > generate_gather_paths().
>
> Hmm.  I'm inclined to think that we should have a separate hook
> in which extensions are allowed to add partial paths, and that
> set_rel_pathlist_hook should only be allowed to add regular paths.
>
I have almost same opinion, but the first hook does not need to be
dedicated for partial paths. As like set_foreign_pathlist() doing, we can
add both of partial and regular paths here, then generate_gather_paths()
may generate a Gather-path on top of the best partial-path.

On the other hands, the later hook must be dedicated to add regular paths,
and also provides a chance for extensions to manipulate pre-built path-list
including Gather-path.
As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
a particular path-node, including Gather-node, by manipulation of the cost
value. Horiguchi-san, is it right?
Likely, this kind of extension needs to use the later hook.

I expect these hooks are located as follows:

set_rel_pathlist(...)
{
        :
    <snip>
        :
    /* for partial / regular paths */
    if (set_rel_pathlist_hook)
      (*set_rel_pathlist_hook) (root, rel, rti, rte);
    /* generate Gather-node */
    if (rel->reloptkind == RELOPT_BASEREL)
        generate_gather_paths(root, rel);
    /* for regular paths and manipulation */
    if (post_rel_pathlist_hook)
      (*post_rel_pathlist_hook) (root, rel, rti, rte);

    set_cheapest();
}

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

akapila
On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <[hidden email]> wrote:

> 2018年12月30日(日) 4:12 Tom Lane <[hidden email]>:
> >
> > Kohei KaiGai <[hidden email]> writes:
> > > 2018年12月29日(土) 1:44 Tom Lane <[hidden email]>:
> > >> However, first I'd like to know why this situation is arising in the first
> > >> place.  To have the situation you're describing, we'd have to have
> > >> attempted to make some Gather paths before we have all the partial paths
> > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > >> like such Gathers are necessarily being made with incomplete information,
> > >> and we'd be better off to fix things so that none are made till later.
> >
> > > Because of the hook location, Gather-node shall be constructed with built-in
> > > and foreign partial scan node first, then extension gets a chance to add its
> > > custom paths (partial and full).
> > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > generate_gather_paths().
> >
> > Hmm.  I'm inclined to think that we should have a separate hook
> > in which extensions are allowed to add partial paths, and that
> > set_rel_pathlist_hook should only be allowed to add regular paths.

+1.  This idea sounds sensible to me.

> >
> I have almost same opinion, but the first hook does not need to be
> dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> add both of partial and regular paths here, then generate_gather_paths()
> may generate a Gather-path on top of the best partial-path.
>

Won't it be confusing for users if we allow both partial and full
paths in first hook and only full paths in the second hook?
Basically, in many cases, the second hook won't be of much use.  What
advantage you are seeing in allowing both partial and full paths in
the first hook?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Kohei KaiGai-4
2018年12月31日(月) 13:10 Amit Kapila <[hidden email]>:

>
> On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <[hidden email]> wrote:
> > 2018年12月30日(日) 4:12 Tom Lane <[hidden email]>:
> > >
> > > Kohei KaiGai <[hidden email]> writes:
> > > > 2018年12月29日(土) 1:44 Tom Lane <[hidden email]>:
> > > >> However, first I'd like to know why this situation is arising in the first
> > > >> place.  To have the situation you're describing, we'd have to have
> > > >> attempted to make some Gather paths before we have all the partial paths
> > > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > > >> like such Gathers are necessarily being made with incomplete information,
> > > >> and we'd be better off to fix things so that none are made till later.
> > >
> > > > Because of the hook location, Gather-node shall be constructed with built-in
> > > > and foreign partial scan node first, then extension gets a chance to add its
> > > > custom paths (partial and full).
> > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > > generate_gather_paths().
> > >
> > > Hmm.  I'm inclined to think that we should have a separate hook
> > > in which extensions are allowed to add partial paths, and that
> > > set_rel_pathlist_hook should only be allowed to add regular paths.
>
> +1.  This idea sounds sensible to me.
>
> > >
> > I have almost same opinion, but the first hook does not need to be
> > dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> > add both of partial and regular paths here, then generate_gather_paths()
> > may generate a Gather-path on top of the best partial-path.
> >
>
> Won't it be confusing for users if we allow both partial and full
> paths in first hook and only full paths in the second hook?
> Basically, in many cases, the second hook won't be of much use.  What
> advantage you are seeing in allowing both partial and full paths in
> the first hook?
>
Two advantages. The first one is, it follows same manner of
set_foreign_pathlist()
which allows to add both of full and partial path if FDW supports parallel-scan.
The second one is practical. During the path construction, extension needs to
check availability to run (e.g, whether operators in WHERE-clause is supported
on GPU device...), calculate its estimated cost and so on. Not a small
portion of
them are common for both of full and partial path. So, if the first
hook accepts to
add both kind of paths at once, extension can share the common properties.

Probably, the second hook is only used for a few corner case where an extension
wants to manipulate path-list already built, like pg_hint_plan.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

akapila
On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <[hidden email]> wrote:

>
> 2018年12月31日(月) 13:10 Amit Kapila <[hidden email]>:
> >
> > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <[hidden email]> wrote:
> > > 2018年12月30日(日) 4:12 Tom Lane <[hidden email]>:
> > > >
> > > > Kohei KaiGai <[hidden email]> writes:
> > > > > 2018年12月29日(土) 1:44 Tom Lane <[hidden email]>:
> > > > >> However, first I'd like to know why this situation is arising in the first
> > > > >> place.  To have the situation you're describing, we'd have to have
> > > > >> attempted to make some Gather paths before we have all the partial paths
> > > > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > > > >> like such Gathers are necessarily being made with incomplete information,
> > > > >> and we'd be better off to fix things so that none are made till later.
> > > >
> > > > > Because of the hook location, Gather-node shall be constructed with built-in
> > > > > and foreign partial scan node first, then extension gets a chance to add its
> > > > > custom paths (partial and full).
> > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > > > generate_gather_paths().
> > > >
> > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > in which extensions are allowed to add partial paths, and that
> > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> >
> > +1.  This idea sounds sensible to me.
> >
> > > >
> > > I have almost same opinion, but the first hook does not need to be
> > > dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> > > add both of partial and regular paths here, then generate_gather_paths()
> > > may generate a Gather-path on top of the best partial-path.
> > >
> >
> > Won't it be confusing for users if we allow both partial and full
> > paths in first hook and only full paths in the second hook?
> > Basically, in many cases, the second hook won't be of much use.  What
> > advantage you are seeing in allowing both partial and full paths in
> > the first hook?
> >
> Two advantages. The first one is, it follows same manner of
> set_foreign_pathlist()
> which allows to add both of full and partial path if FDW supports parallel-scan.
> The second one is practical. During the path construction, extension needs to
> check availability to run (e.g, whether operators in WHERE-clause is supported
> on GPU device...), calculate its estimated cost and so on. Not a small
> portion of
> them are common for both of full and partial path. So, if the first
> hook accepts to
> add both kind of paths at once, extension can share the common properties.
>

You have a point, though I am not sure how much difference it can
create for cost computation as ideally, both will have different
costing model.  I understand there are some savings by avoiding some
common work, is there any way to cache the required information?

> Probably, the second hook is only used for a few corner case where an extension
> wants to manipulate path-list already built, like pg_hint_plan.
>

Okay, but it could be some work for extension authors who are using
the current hook, not sure they would like to divide the work between
first and second hook.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Kohei KaiGai-4
2018年12月31日(月) 22:25 Amit Kapila <[hidden email]>:

>
> On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <[hidden email]> wrote:
> >
> > 2018年12月31日(月) 13:10 Amit Kapila <[hidden email]>:
> > >
> > > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <[hidden email]> wrote:
> > > > 2018年12月30日(日) 4:12 Tom Lane <[hidden email]>:
> > > > >
> > > > > Kohei KaiGai <[hidden email]> writes:
> > > > > > 2018年12月29日(土) 1:44 Tom Lane <[hidden email]>:
> > > > > >> However, first I'd like to know why this situation is arising in the first
> > > > > >> place.  To have the situation you're describing, we'd have to have
> > > > > >> attempted to make some Gather paths before we have all the partial paths
> > > > > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > > > > >> like such Gathers are necessarily being made with incomplete information,
> > > > > >> and we'd be better off to fix things so that none are made till later.
> > > > >
> > > > > > Because of the hook location, Gather-node shall be constructed with built-in
> > > > > > and foreign partial scan node first, then extension gets a chance to add its
> > > > > > custom paths (partial and full).
> > > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > > > > generate_gather_paths().
> > > > >
> > > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > > in which extensions are allowed to add partial paths, and that
> > > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> > >
> > > +1.  This idea sounds sensible to me.
> > >
> > > > >
> > > > I have almost same opinion, but the first hook does not need to be
> > > > dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> > > > add both of partial and regular paths here, then generate_gather_paths()
> > > > may generate a Gather-path on top of the best partial-path.
> > > >
> > >
> > > Won't it be confusing for users if we allow both partial and full
> > > paths in first hook and only full paths in the second hook?
> > > Basically, in many cases, the second hook won't be of much use.  What
> > > advantage you are seeing in allowing both partial and full paths in
> > > the first hook?
> > >
> > Two advantages. The first one is, it follows same manner of
> > set_foreign_pathlist()
> > which allows to add both of full and partial path if FDW supports parallel-scan.
> > The second one is practical. During the path construction, extension needs to
> > check availability to run (e.g, whether operators in WHERE-clause is supported
> > on GPU device...), calculate its estimated cost and so on. Not a small
> > portion of
> > them are common for both of full and partial path. So, if the first
> > hook accepts to
> > add both kind of paths at once, extension can share the common properties.
> >
>
> You have a point, though I am not sure how much difference it can
> create for cost computation as ideally, both will have different
> costing model.  I understand there are some savings by avoiding some
> common work, is there any way to cache the required information?
>
I have no idea for the clean way.
We may be able to have an opaque pointer for extension usage, however,
it may be problematic if multiple extension uses the hook.

> > Probably, the second hook is only used for a few corner case where an extension
> > wants to manipulate path-list already built, like pg_hint_plan.
> >
>
> Okay, but it could be some work for extension authors who are using
> the current hook, not sure they would like to divide the work between
> first and second hook.
>
I guess they don't divide their code, but choose either of them.
In case of PG-Strom, even if there are two hooks around the point, it will use
the first hook only, unless it does not prohibit to call add_path() here.
However, some adjustments are required. Its current implementation makes
GatherPath node with partial CustomScanPath because set_rel_pathlist_hook()
is called after the generate_gather_paths().
Once we could choose the first hook, no need to make a GatherPath by itself,
because PostgreSQL-core will make the path if partial custom-path is enough
reasonable cost. Likely, this adjustment is more preferable one.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Kohei KaiGai-4
I tried to make a patch to have dual hooks at set_rel_pathlist(), and
adjusted PG-Strom for the new design. It stopped to create GatherPath
by itself, just added a partial path for the base relation.
It successfully made a plan using parallel custom-scan node, without
system crash.

As I mentioned above, it does not use the new "post_rel_pathlist_hook"
because we can add both of partial/regular path-node at the first hook
with no particular problems.

Thanks,

dbt3=# explain select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < cast(date '1994-01-01' + interval '1 year' as date)
    and l_discount between 0.08 - 0.01 and 0.08 + 0.01
    and l_quantity < 24 limit 1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=144332.62..144332.63 rows=1 width=4)
   ->  Aggregate  (cost=144332.62..144332.63 rows=1 width=4)
         ->  Gather  (cost=144285.70..144329.56 rows=408 width=4)
               Workers Planned: 2
               ->  Parallel Custom Scan (GpuPreAgg) on lineitem
(cost=143285.70..143288.76 rows=204 width=4)
                     Reduction: NoGroup
                     Outer Scan: lineitem  (cost=1666.67..143246.16
rows=63254 width=8)
                     Outer Scan Filter: ((l_discount >= '0.07'::double
precision) AND
                                                   (l_discount <=
'0.09'::double precision) AND
                                                   (l_quantity <
'24'::double precision) AND
                                                   (l_shipdate <
'1995-01-01'::date) AND
                                                   (l_shipdate >=
'1994-01-01'::date))
(8 rows)

Thanks,

2019年1月2日(水) 22:34 Kohei KaiGai <[hidden email]>:

>
> 2018年12月31日(月) 22:25 Amit Kapila <[hidden email]>:
> >
> > On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <[hidden email]> wrote:
> > >
> > > 2018年12月31日(月) 13:10 Amit Kapila <[hidden email]>:
> > > >
> > > > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <[hidden email]> wrote:
> > > > > 2018年12月30日(日) 4:12 Tom Lane <[hidden email]>:
> > > > > >
> > > > > > Kohei KaiGai <[hidden email]> writes:
> > > > > > > 2018年12月29日(土) 1:44 Tom Lane <[hidden email]>:
> > > > > > >> However, first I'd like to know why this situation is arising in the first
> > > > > > >> place.  To have the situation you're describing, we'd have to have
> > > > > > >> attempted to make some Gather paths before we have all the partial paths
> > > > > > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > > > > > >> like such Gathers are necessarily being made with incomplete information,
> > > > > > >> and we'd be better off to fix things so that none are made till later.
> > > > > >
> > > > > > > Because of the hook location, Gather-node shall be constructed with built-in
> > > > > > > and foreign partial scan node first, then extension gets a chance to add its
> > > > > > > custom paths (partial and full).
> > > > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > > > > > generate_gather_paths().
> > > > > >
> > > > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > > > in which extensions are allowed to add partial paths, and that
> > > > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> > > >
> > > > +1.  This idea sounds sensible to me.
> > > >
> > > > > >
> > > > > I have almost same opinion, but the first hook does not need to be
> > > > > dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> > > > > add both of partial and regular paths here, then generate_gather_paths()
> > > > > may generate a Gather-path on top of the best partial-path.
> > > > >
> > > >
> > > > Won't it be confusing for users if we allow both partial and full
> > > > paths in first hook and only full paths in the second hook?
> > > > Basically, in many cases, the second hook won't be of much use.  What
> > > > advantage you are seeing in allowing both partial and full paths in
> > > > the first hook?
> > > >
> > > Two advantages. The first one is, it follows same manner of
> > > set_foreign_pathlist()
> > > which allows to add both of full and partial path if FDW supports parallel-scan.
> > > The second one is practical. During the path construction, extension needs to
> > > check availability to run (e.g, whether operators in WHERE-clause is supported
> > > on GPU device...), calculate its estimated cost and so on. Not a small
> > > portion of
> > > them are common for both of full and partial path. So, if the first
> > > hook accepts to
> > > add both kind of paths at once, extension can share the common properties.
> > >
> >
> > You have a point, though I am not sure how much difference it can
> > create for cost computation as ideally, both will have different
> > costing model.  I understand there are some savings by avoiding some
> > common work, is there any way to cache the required information?
> >
> I have no idea for the clean way.
> We may be able to have an opaque pointer for extension usage, however,
> it may be problematic if multiple extension uses the hook.
>
> > > Probably, the second hook is only used for a few corner case where an extension
> > > wants to manipulate path-list already built, like pg_hint_plan.
> > >
> >
> > Okay, but it could be some work for extension authors who are using
> > the current hook, not sure they would like to divide the work between
> > first and second hook.
> >
> I guess they don't divide their code, but choose either of them.
> In case of PG-Strom, even if there are two hooks around the point, it will use
> the first hook only, unless it does not prohibit to call add_path() here.
> However, some adjustments are required. Its current implementation makes
> GatherPath node with partial CustomScanPath because set_rel_pathlist_hook()
> is called after the generate_gather_paths().
> Once we could choose the first hook, no need to make a GatherPath by itself,
> because PostgreSQL-core will make the path if partial custom-path is enough
> reasonable cost. Likely, this adjustment is more preferable one.
>
> Thanks,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <[hidden email]>


--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patch (3K) Download Attachment
pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Kyotaro HORIGUCHI-2
In reply to this post by Kohei KaiGai-4
At Sun, 30 Dec 2018 12:31:22 +0900, Kohei KaiGai <[hidden email]> wrote in <CAOP8fzY1Oqf-LGdrZT+TAu+JajwPGn1uYnpWWUPL=[hidden email]>
> 2018年12月30日(日) 4:12 Tom Lane <[hidden email]>:
> On the other hands, the later hook must be dedicated to add regular paths,
> and also provides a chance for extensions to manipulate pre-built path-list
> including Gather-path.
> As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
> a particular path-node, including Gather-node, by manipulation of the cost
> value. Horiguchi-san, is it right?

Mmm. I haven't expected that it is mentioned here.

Actually in the hook, it changes enable_* planner variables, or
directory manipuraltes path costs or even can clear and
regenerate the path list and gather paths for the parallel
case. It will be happy if we had a chance to manpurate partial
paths before genrating gahter paths.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Kohei KaiGai-4
2019年1月9日(水) 13:18 Kyotaro HORIGUCHI <[hidden email]>:

>
> At Sun, 30 Dec 2018 12:31:22 +0900, Kohei KaiGai <[hidden email]> wrote in <CAOP8fzY1Oqf-LGdrZT+TAu+JajwPGn1uYnpWWUPL=[hidden email]>
> > 2018年12月30日(日) 4:12 Tom Lane <[hidden email]>:
> > On the other hands, the later hook must be dedicated to add regular paths,
> > and also provides a chance for extensions to manipulate pre-built path-list
> > including Gather-path.
> > As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
> > a particular path-node, including Gather-node, by manipulation of the cost
> > value. Horiguchi-san, is it right?
>
> Mmm. I haven't expected that it is mentioned here.
>
> Actually in the hook, it changes enable_* planner variables, or
> directory manipuraltes path costs or even can clear and
> regenerate the path list and gather paths for the parallel
> case. It will be happy if we had a chance to manpurate partial
> paths before genrating gahter paths.
>
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of the generate_gather_paths?
If we have no use case for the second hook, here is little necessity
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Robert Haas
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <[hidden email]> wrote:
> So, is it sufficient if set_rel_pathlist_hook is just relocated in
> front of the generate_gather_paths?
> If we have no use case for the second hook, here is little necessity
> to have the post_rel_pathlist_hook() here.
> (At least, PG-Strom will use the first hook only.)

+1.  That seems like the best way to be consistent with the principle
that we need to have all the partial paths before generating any
Gather paths.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Kohei KaiGai-4
2019年1月11日(金) 5:52 Robert Haas <[hidden email]>:

>
> On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <[hidden email]> wrote:
> > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > front of the generate_gather_paths?
> > If we have no use case for the second hook, here is little necessity
> > to have the post_rel_pathlist_hook() here.
> > (At least, PG-Strom will use the first hook only.)
>
> +1.  That seems like the best way to be consistent with the principle
> that we need to have all the partial paths before generating any
> Gather paths.
>
Patch was updated, just for relocation of the set_rel_pathlist_hook.
Please check it.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patch (2K) Download Attachment
pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Robert Haas
On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <[hidden email]> wrote:

> 2019年1月11日(金) 5:52 Robert Haas <[hidden email]>:
> > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <[hidden email]> wrote:
> > > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > > front of the generate_gather_paths?
> > > If we have no use case for the second hook, here is little necessity
> > > to have the post_rel_pathlist_hook() here.
> > > (At least, PG-Strom will use the first hook only.)
> >
> > +1.  That seems like the best way to be consistent with the principle
> > that we need to have all the partial paths before generating any
> > Gather paths.
> >
> Patch was updated, just for relocation of the set_rel_pathlist_hook.
> Please check it.

Seems reasonable to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: add_partial_path() may remove dominated path but still in use

Kyotaro HORIGUCHI-2
Hello, sorry for the absence.

At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas <[hidden email]> wrote in <CA+TgmoYyxBgkfN_APBdxdutFMukb=[hidden email]>

> On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <[hidden email]> wrote:
> > 2019年1月11日(金) 5:52 Robert Haas <[hidden email]>:
> > > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <[hidden email]> wrote:
> > > > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > > > front of the generate_gather_paths?
> > > > If we have no use case for the second hook, here is little necessity
> > > > to have the post_rel_pathlist_hook() here.
> > > > (At least, PG-Strom will use the first hook only.)
> > >
> > > +1.  That seems like the best way to be consistent with the principle
> > > that we need to have all the partial paths before generating any
> > > Gather paths.
> > >
> > Patch was updated, just for relocation of the set_rel_pathlist_hook.
> > Please check it.
>
> Seems reasonable to me.

Also seems reasonable to me.  The extension can call
generate_gather_paths redundantly as is but it almost doesn't
harm, so it is acceptable even in a minor release.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center