path toward faster partition pruning

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
204 messages Options
1234 ... 11
Reply | Threaded
Open this post in threaded view
|

path toward faster partition pruning

Amit Langote-2
I've been working on implementing a way to perform plan-time
partition-pruning that is hopefully faster than the current method of
using constraint exclusion to prune each of the potentially many
partitions one-by-one.  It's not fully cooked yet though.

Meanwhile, I thought I'd share a couple of patches that implement some
restructuring of the planner code related to partitioned table inheritance
planning that I think would be helpful.  They are to be applied on top of
the patches being discussed at [1].  Note that these patches themselves
don't implement the actual code that replaces constraint exclusion as a
method of performing partition pruning.  I will share that patch after
debugging it some more.

The main design goal of the patches I'm sharing here now is to defer the
locking and  opening of leaf partitions in a given partition tree to a
point after set_append_rel_size() is called on the root partitioned table.
 Currently, AFAICS, we need to lock and open the child tables in
expand_inherited_rtentry() only to set the translated_vars field in
AppendRelInfo that we create for the child.  ISTM, we can defer the
creation of a child AppendRelInfo to a point when it (its field
translated_vars in particular) will actually be used and so lock and open
the child tables only at such a time.  Although we don't lock and open the
partition child tables in expand_inherited_rtentry(), their RT entries are
still created and added to root->parse->rtable, so that
setup_simple_rel_arrays() knows the maximum number of entries
root->simple_rel_array will need to hold and allocate the memory for that
array accordingly.   Slots in simple_rel_array[] corresponding to
partition child tables will be empty until they are created when
set_append_rel_size() is called on the root parent table and it determines
the partitions that will be scanned after all.

Patch augments the existing PartitionedChildRelInfo node, which currently
holds only the partitioned child rel RT indexes, to carry some more
information about the partition tree, which includes the information
returned by RelationGetPartitionDispatchInfo() when it is called from
expand_inherited_rtentry() (per the proposed patch in [1], we call it to
be able to add partitions to the query tree in the bound order).
Actually, since PartitionedChildRelInfo now contains more information
about the partition tree than it used to before, I thought the struct's
name is no longer relevant, so renamed it to PartitionRootInfo and renamed
root->pcinfo_list accordingly to prinfo_list.  That seems okay because we
only use that node internally.

Then during the add_base_rels_to_query() step, when build_simple_rel()
builds a RelOptInfo for the root partitioned table, it also initializes
some newly introduced fields in RelOptInfo from the information contained
in PartitionRootInfo of the table.  The aforementioned fields are only
initialized in RelOptInfos of root partitioned tables.  Note that the
add_base_rels_to_query() step won't add the partition "otherrel"
RelOptInfos yet (unlike the regular inheritance case, where they are,
after looking them up in root->append_rel_list).

When set_append_rel_size() is called on the root partitioned table, it
will call a find_partitions_for_query(), which using the partition tree
information, determines the partitions that will need to be scanned for
the query.  This processing happens recursively, that is, we first
determine the root-parent's partitions and then for each partition that's
partitioned, we will determine its partitions and so on.  As we determine
partitions in this per-partitioned-table manner, we maintain a pair
(parent_relid, list-of-partition-relids-to-scan) for each partitioned
table and also a single list of all leaf partitions determined so far.
Once all partitions have been determined, we turn to locking the leaf
partitions.  The locking happens in the order of OIDs as
find_all_inheritors would have returned in expand_inherited_rtentry(); the
list of OIDs in that original order is also stored in the table's
PartitionRootInfo node.  For each OID in that list, check if that OID is
in the set of leaf partition OIDs that was just computed, and if so, lock
it.  For all chosen partitions that are partitioned tables (including the
root), we create a PartitionAppendInfo node which stores the
aforementioned pair (parent_relid, list-of-partitions-relids-to-scan), and
append it to a list in the root table's RelOptInfo, with the root table's
PartitionAppendInfo at the head of the list.  Note that the list of
partitions in this pair contains only the immediate partitions, so that
the original parent-child relationship is reflected in the list of
PartitionAppendInfos thus collected.  The next patch that will implement
actual partition-pruning will add some more code that will run under
find_partitions_for_query().

set_append_rel_size() processing then continues for the root partitioned
table.  It is at this point that we will create the RelOptInfos and
AppendRelInfos for partitions.  First for those of the root partitioned
table and then for those of each partitioned table when
set_append_rel_size() will be recursively called for the latter.


Note that this is still largely a WIP patch and the implementation details
might change per both the feedback here and the discussion at [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/befd7ec9-8f4c-6928-d330-ab05dbf860bf%40lab.ntt.co.jp


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

0001-Teach-pg_inherits.c-a-bit-about-partitioning.patch (36K) Download Attachment
0002-Allow-locking-only-partitioned-children-in-partition.patch (19K) Download Attachment
0003-WIP-Defer-opening-and-locking-partitions-to-set_appe.patch (76K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Ashutosh Bapat
On Mon, Aug 21, 2017 at 12:07 PM, Amit Langote
<[hidden email]> wrote:

> I've been working on implementing a way to perform plan-time
> partition-pruning that is hopefully faster than the current method of
> using constraint exclusion to prune each of the potentially many
> partitions one-by-one.  It's not fully cooked yet though.
>
> Meanwhile, I thought I'd share a couple of patches that implement some
> restructuring of the planner code related to partitioned table inheritance
> planning that I think would be helpful.  They are to be applied on top of
> the patches being discussed at [1].  Note that these patches themselves
> don't implement the actual code that replaces constraint exclusion as a
> method of performing partition pruning.  I will share that patch after
> debugging it some more.
>
> The main design goal of the patches I'm sharing here now is to defer the
> locking and  opening of leaf partitions in a given partition tree to a
> point after set_append_rel_size() is called on the root partitioned table.
>  Currently, AFAICS, we need to lock and open the child tables in
> expand_inherited_rtentry() only to set the translated_vars field in
> AppendRelInfo that we create for the child.  ISTM, we can defer the
> creation of a child AppendRelInfo to a point when it (its field
> translated_vars in particular) will actually be used and so lock and open
> the child tables only at such a time.  Although we don't lock and open the
> partition child tables in expand_inherited_rtentry(), their RT entries are
> still created and added to root->parse->rtable, so that
> setup_simple_rel_arrays() knows the maximum number of entries
> root->simple_rel_array will need to hold and allocate the memory for that
> array accordingly.   Slots in simple_rel_array[] corresponding to
> partition child tables will be empty until they are created when
> set_append_rel_size() is called on the root parent table and it determines
> the partitions that will be scanned after all.

The partition pruning can happen only after the quals have been
distributed to Rels i.e. after deconstruct_jointree(),
reconsider_outer_join_clauses() and generate_base_implied_equalities()
have been called. If the goal is to not heap_open() the partitions
which are pruned, we can't do that in expand_inherited_rtentry(). One
reason why I think we don't want to heap_open() partition relations is
to avoid relcache bloat because of opened partition relations, which
are ultimately pruned. But please note that according to your patches,
we still need to populate catalog caches to get relkind and reltype
etc.

There are many functions that traverse simple_rel_array[] after it's
created. Most of them assume that the empty entries in that array
correspond to non-simple range entries like join RTEs. But now we are
breaking that assumption. Most of these functions also skip "other"
relations, so that may be OK now, but I am not sure if it's really
going to be fine if we keep empty slots in place of partition
relations. There may be three options here 1. add placeholder
RelOptInfos for partition relations (may be mark those specially) and
mark the ones which get pruned as dummy later. 2. Prune the partitions
before any functions scans simple_rel_array[] or postpone creating
simple_rel_array till pruning. 3. Examine all the current scanners
esp. the ones which will be called before pruning to make sure that
skipping "other" relations is going to be kosher.

>
> Patch augments the existing PartitionedChildRelInfo node, which currently
> holds only the partitioned child rel RT indexes, to carry some more
> information about the partition tree, which includes the information
> returned by RelationGetPartitionDispatchInfo() when it is called from
> expand_inherited_rtentry() (per the proposed patch in [1], we call it to
> be able to add partitions to the query tree in the bound order).
> Actually, since PartitionedChildRelInfo now contains more information
> about the partition tree than it used to before, I thought the struct's
> name is no longer relevant, so renamed it to PartitionRootInfo and renamed
> root->pcinfo_list accordingly to prinfo_list.  That seems okay because we
> only use that node internally.
>
> Then during the add_base_rels_to_query() step, when build_simple_rel()
> builds a RelOptInfo for the root partitioned table, it also initializes
> some newly introduced fields in RelOptInfo from the information contained
> in PartitionRootInfo of the table.  The aforementioned fields are only
> initialized in RelOptInfos of root partitioned tables.  Note that the
> add_base_rels_to_query() step won't add the partition "otherrel"
> RelOptInfos yet (unlike the regular inheritance case, where they are,
> after looking them up in root->append_rel_list).

Partition-wise join requires the partition hierarchy to be expanded
level-by-level keeping in-tact the parent-child relationship between
partitioned table and its partitions. Your patch doesn't do that and
adds all the partitioning information in the root partitioned table's
RelOptInfo. OTOH, partition-wise join patch adds partition bounds, key
expressions, OID and RelOptInfos of the immediate partitions
(including partitioned partitions) to RelOptInfo of a partitioned
table (see patch 0002 in the latest set of patches at [1]). I don't
see much point in having conflicting changes in both of our patches.
May be you should review that patch from my set  and we can find a set
of members which help both partition pruning and partition-wise join.

>
> When set_append_rel_size() is called on the root partitioned table, it
> will call a find_partitions_for_query(), which using the partition tree
> information, determines the partitions that will need to be scanned for
> the query.  This processing happens recursively, that is, we first
> determine the root-parent's partitions and then for each partition that's
> partitioned, we will determine its partitions and so on.  As we determine
> partitions in this per-partitioned-table manner, we maintain a pair
> (parent_relid, list-of-partition-relids-to-scan) for each partitioned
> table and also a single list of all leaf partitions determined so far.
> Once all partitions have been determined, we turn to locking the leaf
> partitions.  The locking happens in the order of OIDs as
> find_all_inheritors would have returned in expand_inherited_rtentry(); the
> list of OIDs in that original order is also stored in the table's
> PartitionRootInfo node.  For each OID in that list, check if that OID is
> in the set of leaf partition OIDs that was just computed, and if so, lock
> it.  For all chosen partitions that are partitioned tables (including the
> root), we create a PartitionAppendInfo node which stores the
> aforementioned pair (parent_relid, list-of-partitions-relids-to-scan), and
> append it to a list in the root table's RelOptInfo, with the root table's
> PartitionAppendInfo at the head of the list.  Note that the list of
> partitions in this pair contains only the immediate partitions, so that
> the original parent-child relationship is reflected in the list of
> PartitionAppendInfos thus collected.  The next patch that will implement
> actual partition-pruning will add some more code that will run under
> find_partitions_for_query().
>
> set_append_rel_size() processing then continues for the root partitioned
> table.  It is at this point that we will create the RelOptInfos and
> AppendRelInfos for partitions.  First for those of the root partitioned
> table and then for those of each partitioned table when
> set_append_rel_size() will be recursively called for the latter.

set_append_rel_size(), set_append_rel_pathlist() are already
recursive, so if we process expansion and pruning for one level in
those functions, the recursion will automatically take care of doing
so for every level.

>
>
> Note that this is still largely a WIP patch and the implementation details
> might change per both the feedback here and the discussion at [1].

The changes to code which handle expansion in this patch set should
really be part of expansion in bound order thread so that it's easy to
review all changes together. And this thread can then only concentrate
on partition pruning.

[1] http://postgr.es/m/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@...

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
Hi Ashutosh,

Thanks for the comments and sorry that it took me a while to reply here.

On 2017/08/23 20:16, Ashutosh Bapat wrote:

> On Mon, Aug 21, 2017 at 12:07 PM, Amit Langote
> <[hidden email]> wrote:
>> I've been working on implementing a way to perform plan-time
>> partition-pruning that is hopefully faster than the current method of
>> using constraint exclusion to prune each of the potentially many
>> partitions one-by-one.  It's not fully cooked yet though.
>>
>> Meanwhile, I thought I'd share a couple of patches that implement some
>> restructuring of the planner code related to partitioned table inheritance
>> planning that I think would be helpful.  They are to be applied on top of
>> the patches being discussed at [1].  Note that these patches themselves
>> don't implement the actual code that replaces constraint exclusion as a
>> method of performing partition pruning.  I will share that patch after
>> debugging it some more.
>>
>> The main design goal of the patches I'm sharing here now is to defer the
>> locking and  opening of leaf partitions in a given partition tree to a
>> point after set_append_rel_size() is called on the root partitioned table.
>>  Currently, AFAICS, we need to lock and open the child tables in
>> expand_inherited_rtentry() only to set the translated_vars field in
>> AppendRelInfo that we create for the child.  ISTM, we can defer the
>> creation of a child AppendRelInfo to a point when it (its field
>> translated_vars in particular) will actually be used and so lock and open
>> the child tables only at such a time.  Although we don't lock and open the
>> partition child tables in expand_inherited_rtentry(), their RT entries are
>> still created and added to root->parse->rtable, so that
>> setup_simple_rel_arrays() knows the maximum number of entries
>> root->simple_rel_array will need to hold and allocate the memory for that
>> array accordingly.   Slots in simple_rel_array[] corresponding to
>> partition child tables will be empty until they are created when
>> set_append_rel_size() is called on the root parent table and it determines
>> the partitions that will be scanned after all.
>
> The partition pruning can happen only after the quals have been
> distributed to Rels i.e. after deconstruct_jointree(),
> reconsider_outer_join_clauses() and generate_base_implied_equalities()
> have been called. If the goal is to not heap_open() the partitions
> which are pruned, we can't do that in expand_inherited_rtentry(). One
> reason why I think we don't want to heap_open() partition relations is
> to avoid relcache bloat because of opened partition relations, which
> are ultimately pruned. But please note that according to your patches,
> we still need to populate catalog caches to get relkind and reltype
> etc.

Yes, we still hit syscache for *all* partitions.  I haven't yet thought
very hard about avoiding that altogether.

> There are many functions that traverse simple_rel_array[] after it's
> created. Most of them assume that the empty entries in that array
> correspond to non-simple range entries like join RTEs. But now we are
> breaking that assumption. Most of these functions also skip "other"
> relations, so that may be OK now, but I am not sure if it's really
> going to be fine if we keep empty slots in place of partition
> relations. There may be three options here 1. add placeholder
> RelOptInfos for partition relations (may be mark those specially) and
> mark the ones which get pruned as dummy later. 2. Prune the partitions
> before any functions scans simple_rel_array[] or postpone creating
> simple_rel_array till pruning. 3. Examine all the current scanners
> esp. the ones which will be called before pruning to make sure that
> skipping "other" relations is going to be kosher.

Between the point when slots in simple_rel_array are allocated
(setup_simple_rel_arrays) and partition RelOptInfos are actually created
after the partition-pruning step has occurred (set_append_rel_size), it
seems that most places that iterate over simple_rel_array know also to
skip slots containing NULL values.  We might need to document that NULL
means partitions in addition to its current meaning - non-baserels.

>> Patch augments the existing PartitionedChildRelInfo node, which currently
>> holds only the partitioned child rel RT indexes, to carry some more
>> information about the partition tree, which includes the information
>> returned by RelationGetPartitionDispatchInfo() when it is called from
>> expand_inherited_rtentry() (per the proposed patch in [1], we call it to
>> be able to add partitions to the query tree in the bound order).
>> Actually, since PartitionedChildRelInfo now contains more information
>> about the partition tree than it used to before, I thought the struct's
>> name is no longer relevant, so renamed it to PartitionRootInfo and renamed
>> root->pcinfo_list accordingly to prinfo_list.  That seems okay because we
>> only use that node internally.
>>
>> Then during the add_base_rels_to_query() step, when build_simple_rel()
>> builds a RelOptInfo for the root partitioned table, it also initializes
>> some newly introduced fields in RelOptInfo from the information contained
>> in PartitionRootInfo of the table.  The aforementioned fields are only
>> initialized in RelOptInfos of root partitioned tables.  Note that the
>> add_base_rels_to_query() step won't add the partition "otherrel"
>> RelOptInfos yet (unlike the regular inheritance case, where they are,
>> after looking them up in root->append_rel_list).
>
> Partition-wise join requires the partition hierarchy to be expanded
> level-by-level keeping in-tact the parent-child relationship between
> partitioned table and its partitions. Your patch doesn't do that and
> adds all the partitioning information in the root partitioned table's
> RelOptInfo. OTOH, partition-wise join patch adds partition bounds, key
> expressions, OID and RelOptInfos of the immediate partitions
> (including partitioned partitions) to RelOptInfo of a partitioned
> table (see patch 0002 in the latest set of patches at [1]). I don't
> see much point in having conflicting changes in both of our patches.
> May be you should review that patch from my set  and we can find a set
> of members which help both partition pruning and partition-wise join.

Yes, I think it would be a good idea for the partition-pruning patch to
initialize those fields in the individual parents' RelOptInfos.  I will
review relevant patches in the partitionwise-join thread to see what can
be incorporated here.

>> When set_append_rel_size() is called on the root partitioned table, it
>> will call a find_partitions_for_query(), which using the partition tree
>> information, determines the partitions that will need to be scanned for
>> the query.  This processing happens recursively, that is, we first
>> determine the root-parent's partitions and then for each partition that's
>> partitioned, we will determine its partitions and so on.  As we determine
>> partitions in this per-partitioned-table manner, we maintain a pair
>> (parent_relid, list-of-partition-relids-to-scan) for each partitioned
>> table and also a single list of all leaf partitions determined so far.
>> Once all partitions have been determined, we turn to locking the leaf
>> partitions.  The locking happens in the order of OIDs as
>> find_all_inheritors would have returned in expand_inherited_rtentry(); the
>> list of OIDs in that original order is also stored in the table's
>> PartitionRootInfo node.  For each OID in that list, check if that OID is
>> in the set of leaf partition OIDs that was just computed, and if so, lock
>> it.  For all chosen partitions that are partitioned tables (including the
>> root), we create a PartitionAppendInfo node which stores the
>> aforementioned pair (parent_relid, list-of-partitions-relids-to-scan), and
>> append it to a list in the root table's RelOptInfo, with the root table's
>> PartitionAppendInfo at the head of the list.  Note that the list of
>> partitions in this pair contains only the immediate partitions, so that
>> the original parent-child relationship is reflected in the list of
>> PartitionAppendInfos thus collected.  The next patch that will implement
>> actual partition-pruning will add some more code that will run under
>> find_partitions_for_query().
>>
>> set_append_rel_size() processing then continues for the root partitioned
>> table.  It is at this point that we will create the RelOptInfos and
>> AppendRelInfos for partitions.  First for those of the root partitioned
>> table and then for those of each partitioned table when
>> set_append_rel_size() will be recursively called for the latter.
>
> set_append_rel_size(), set_append_rel_pathlist() are already
> recursive, so if we process expansion and pruning for one level in
> those functions, the recursion will automatically take care of doing
> so for every level.

My only worry about that is locking order of leaf partitions will be
different from concurrent backends if we lock them in an order dictated by
traversing the partition tree level at a time.  Because such traversal
will presumably proceed in the partition bound order.

The patch computes *all* leaf partitions that will need to be scanned by
the query (after pruning needless ones) and lock the chosen partitions in
the original order (it keeps the original order OID list generated by
find_all_inheritors around for this purpose).  While computing the leaf
partitions, it remembers immediate parent-child relationships in the
process.  The way it does it is by processing the partition tree in a
recursive depth-first manner, and in each recursive step, creating a
PartitionAppendInfo that maps a parent table to its immediate children
(only those that will satisfy the query).  Once the PartitionAppendInfo's
for all the parents in the partition tree have been computed, we resume
the set_append_rel_size() processing, which takes the root parent's
PartitionAppendInfo and builds RelOptInfos and AppendRelInfos for its
immediate children.  Its children that are partitioned tables themselves
will recursively call set_append_rel_size() and look up its own
PartitionAppendInfo and create RelOptInfos and AppendRelInfos for its
children and so on.

>> Note that this is still largely a WIP patch and the implementation details
>> might change per both the feedback here and the discussion at [1].
>
> The changes to code which handle expansion in this patch set should
> really be part of expansion in bound order thread so that it's easy to
> review all changes together. And this thread can then only concentrate
> on partition pruning.

I think I agree.  I'm posting today the patches that actually implement
partition-pruning.  The previous patches do seem to belong on the EIBO
thread, but will post them together here today for convenience of being
able to apply them to HEAD and try out.  Now that Robert has posted a
patch to implement depth-first EIBO, I will have to find a way to rebase
the actual partition-pruning patches on this thread so that its core logic
works at all by finding the information it needs.

Thanks,
Amit



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
In reply to this post by Amit Langote-2
On 2017/08/21 15:37, Amit Langote wrote:

> Meanwhile, I thought I'd share a couple of patches that implement some
> restructuring of the planner code related to partitioned table inheritance
> planning that I think would be helpful.  They are to be applied on top of
> the patches being discussed at [1].  Note that these patches themselves
> don't implement the actual code that replaces constraint exclusion as a
> method of performing partition pruning.  I will share that patch after
> debugging it some more.
>
> The next patch that will implement
> actual partition-pruning will add some more code that will run under
> find_partitions_for_query().
Attached is now also the set of patches that implement the actual
partition-pruning logic, viz. the last 3 patches (0004, 0005, and 0006) of
the attached.

Because the patch helps avoid performing constraint exclusion on *all*
partitions for a given query, one might expect this to improve performance
for queries on partitioned tables and scale to a fairly large number of
partitions.  Here are some numbers for the partitioned table and the query
shown below:

\d+ ptab
Columns: (a date, b int, c text)
Partition key: RANGE (a, b)
Partitions:
ptab_00001 FOR VALUES FROM ('2017-08-31', 1) TO ('2017-08-31', 1000),
ptab_00002 FOR VALUES FROM ('2017-08-31', 1000) TO ('2017-08-31', 2000),
ptab_00003 FOR VALUES FROM ('2017-08-31', 2000) TO ('2017-08-31', 3000),
ptab_00004 FOR VALUES FROM ('2017-08-31', 3000) TO ('2017-08-31', 4000),
ptab_00005 FOR VALUES FROM ('2017-08-31', 4000) TO ('2017-08-31', 5000),

ptab_00006 FOR VALUES FROM ('2017-09-01', 1) TO ('2017-09-01', 1000),
ptab_00007 FOR VALUES FROM ('2017-09-01', 1000) TO ('2017-09-01', 2000),

...
ptab_NNNNN FOR VALUES FROM (..., 4000) TO (..., 5000),

A query that prunes all partitions (empty result!):

explain select * from ptab where a < '2017-08-31';

Comparison of the average response times (in milliseconds) after running
the same query 100 times using pgbench against the database:

  #: Number of partitions of ptab
c_e: Constraint exclusion
f_p: Fast pruning


    #      c_e      f_p
=====    =====     ====
   10      0.7      0.4
   50      1.8      0.6
  100      3.2      0.8
  500     16.8      2.7
 1000     36.2      5.0
 2000     79.7     10.2
 5000    214.7     27.0
10000    443.6     64.8

For each partitioned table in a given partition tree (provided it is not
pruned to begin with), the query's clauses are matched to its partition
key and from the matched clauses, a pair of bounding keys (Datum tuples
with <= key->partnatts values for possibly a prefix of a multi-column key)
is generated.  They are passed to partition.c: get_partitions_for_keys()
as Datum *minkeys and Datum *maxkeys.  A list of partitions covering that
key range is returned.  When generating that list, whether a particular
scan key is inclusive or not is considered along with the partitioning
strategy.  It should be possible to support hash partitioning with
(hopefully) minimal changes to get_partitions_for_keys().

There are still certain limitations on the planner side of things:

1. OR clauses are not counted as contributing toward bounding scan keys;
   currently only OpExprs and NullTests are recognized, so an OR clause
   would get skipped from consideration when generating the bounding keys
   to pass to partition.c

2. Redundant clauses are not appropriately pre-processed; so if a query
   contains a = 10 and a > 1, the latter clause will be matched and
   partitions holding values with a > 1 and a < 10 will not be pruned,
   even if none of their rows will pass the query's condition

Fixing these limitations, adding more regression tests and implementing
some of the things suggested by Ashutosh Bapat [1] to prevent conflicting
changes with some preparatory patches in the partitionwise-join patch
series [2] are TODOs.

Adding this to CF-201709 as "faster partition pruning in planner".

To try out the attached patches: apply the patches posted at [3] on HEAD
and then apply these

Thanks,
Amit

[1]
https://postgr.es/m/CAFjFpRdb_fkmJHFjvAbB%2BLn0t45fWjekLd5pY%3Dsv%2BeAhBAKXPQ%40mail.gmail.com

[2]
https://postgr.es/m/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@...

[3]
https://www.postgresql.org/message-id/2124e99f-9528-0f71-4e10-ac7974dd7077%40lab.ntt.co.jp


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

0001-Teach-pg_inherits.c-a-bit-about-partitioning.patch (36K) Download Attachment
0002-Allow-locking-only-partitioned-children-in-partition.patch (19K) Download Attachment
0003-WIP-Defer-opening-and-locking-partitions-to-set_appe.patch (82K) Download Attachment
0004-WIP-Interface-changes-for-partition_bound_-cmp-bsear.patch (12K) Download Attachment
0005-WIP-partition.c-interface-additions-for-partition-pr.patch (12K) Download Attachment
0006-WIP-planner-side-changes-for-partition-pruning.patch (40K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Robert Haas
On Thu, Aug 31, 2017 at 2:02 AM, Amit Langote
<[hidden email]> wrote:
> Attached is now also the set of patches that implement the actual
> partition-pruning logic, viz. the last 3 patches (0004, 0005, and 0006) of
> the attached.

It strikes me that this patch set is doing two things but maybe in the
opposite order that I would have chosen to attack them.  First,
there's getting partition pruning to use something other than
constraint exclusion.  Second, there's deferring work that is
currently done at an early stage of the process until later, so that
we waste less effort on partitions that are ultimately going to be
pruned.

The second one is certainly a worthwhile goal, but there are fairly
firm interdependencies between the first one and some other things
that are in progress.  For example, the first one probably ought to be
done before hash partitioning gets committed, because
constraint-exclusion based partitioning pruning won't work with
partitioning pruning, but some mechanism based on asking the
partitioning code which partitions might match will.  Such a mechanism
is more efficient for list and range partitions, but it's the only
thing that will work for hash partitions.  Also, Beena Emerson is
working on run-time partition pruning, and the more I think about it,
the more I think that overlaps with this first part.  Both patches
need a mechanism to identify, given a btree-indexable comparison
operator (< > <= >= =) and a set of values, which partitions might
contain matching values.  Run-time partition pruning will call that at
execution time, and this patch will call it at plan time, but it's the
same logic; it's just a question of the point at which the values are
known.  And of course we don't want to end up with two copies of the
logic.

Therefore, IMHO, it would be best to focus first on how we're going to
identify the partitions that survive pruning, and then afterwards work
on transposing that logic to happen before partitions are opened and
locked.  That way, we get some incremental benefit sooner, and also
unblock some other development work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
Thanks for the comments.

On 2017/09/02 2:52, Robert Haas wrote:

> On Thu, Aug 31, 2017 at 2:02 AM, Amit Langote
> <[hidden email]> wrote:
>> Attached is now also the set of patches that implement the actual
>> partition-pruning logic, viz. the last 3 patches (0004, 0005, and 0006) of
>> the attached.
>
> It strikes me that this patch set is doing two things but maybe in the
> opposite order that I would have chosen to attack them.  First,
> there's getting partition pruning to use something other than
> constraint exclusion.  Second, there's deferring work that is
> currently done at an early stage of the process until later, so that
> we waste less effort on partitions that are ultimately going to be
> pruned.

OK.

>
> The second one is certainly a worthwhile goal, but there are fairly
> firm interdependencies between the first one and some other things
> that are in progress.  For example, the first one probably ought to be
> done before hash partitioning gets committed, because
> constraint-exclusion based partitioning pruning won't work with
> partitioning pruning, but some mechanism based on asking the
> partitioning code which partitions might match will.

Yeah.

> Such a mechanism
> is more efficient for list and range partitions, but it's the only
> thing that will work for hash partitions.  Also, Beena Emerson is
> working on run-time partition pruning, and the more I think about it,
> the more I think that overlaps with this first part.  Both patches
> need a mechanism to identify, given a btree-indexable comparison
> operator (< > <= >= =) and a set of values, which partitions might
> contain matching values.  Run-time partition pruning will call that at
> execution time, and this patch will call it at plan time, but it's the
> same logic; it's just a question of the point at which the values are
> known.  And of course we don't want to end up with two copies of the
> logic.

Agreed here too.

I agree that spending effort on the first part (deferment of locking, etc.
within the planner) does not benefit either the hash partitioning and
run-time pruning patches much.

> Therefore, IMHO, it would be best to focus first on how we're going to
> identify the partitions that survive pruning, and then afterwards work
> on transposing that logic to happen before partitions are opened and
> locked.  That way, we get some incremental benefit sooner, and also
> unblock some other development work.

Alright, I will try to do it that way.

Thanks,
Amit



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
On 2017/09/04 10:10, Amit Langote wrote:

> On 2017/09/02 2:52, Robert Haas wrote:
>> It strikes me that this patch set is doing two things but maybe in the
>> opposite order that I would have chosen to attack them.  First,
>> there's getting partition pruning to use something other than
>> constraint exclusion.  Second, there's deferring work that is
>> currently done at an early stage of the process until later, so that
>> we waste less effort on partitions that are ultimately going to be
>> pruned.
>
> OK.
>
>> Therefore, IMHO, it would be best to focus first on how we're going to
>> identify the partitions that survive pruning, and then afterwards work
>> on transposing that logic to happen before partitions are opened and
>> locked.  That way, we get some incremental benefit sooner, and also
>> unblock some other development work.
>
> Alright, I will try to do it that way.
Attached set of patches that does things that way.  Individual patches
described below:

[PATCH 1/5] Expand partitioned inheritance in a non-flattened manner

This will allow us to perform scan and join planning in a per partition
sub-tree manner, with each sub-tree's root getting its own RelOptInfo.
Previously, only the root of the whole partition tree would get a
RelOptInfo, along with the leaf partitions, with each leaf partition's
AppendRelInfo pointing to the former as its parent.

This is essential, because we will be doing partition-pruning for every
partitioned table in the tree by matching query's scan keys with its
partition key.  We won't be able to do that if the intermediate
partitioned tables didn't have a RelOptInfo.

[PATCH 2/5] WIP: planner-side changes for partition-pruning

This patch adds a stub get_partitions_for_keys in partition.c with a
suitable interface for the caller to pass bounding keys extracted from the
query and other related information.

Importantly, it implements the logic within the planner to match query's
scan keys to a parent table's partition key and form the bounding keys
that will be passed to partition.c to compute the list of partitions that
satisfy those bounds.

Also, it adds certain fields to RelOptInfos of the partitioned tables that
reflect its partitioning properties.

[PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch}

This guy modifies the partition bound comparison function so that the
caller can pass incomplete partition key tuple that is potentially a
prefix of a multi-column partition key.  Currently, the input tuple must
contain all of key->partnatts values, but that may not be true for
queries, which may not have restrictions on all the partition key columns.

[PATCH 4/5] WIP: Implement get_partitions_for_keys()

This one fills the get_partitions_for_keys stub with the actual logic that
searches the partdesc->boundinfo for partition bounds that match the
bounding keys specified by the caller.

[PATCH 5/5] Add more tests for the new partitioning-related planning

More tests.


Some TODO items still remain to be done:

* Process OR clauses to use for partition-pruning
* Process redundant clauses (such as a = 10 and a > 1) more smartly
* Other tricks that are missing
* Fix bugs
* More tests

Thanks,
Amit


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

0001-Expand-partitioned-inheritance-in-a-non-flattened-ma.patch (19K) Download Attachment
0002-WIP-planner-side-changes-for-partition-pruning.patch (39K) Download Attachment
0003-WIP-Interface-changes-for-partition_bound_-cmp-bsear.patch (12K) Download Attachment
0004-WIP-Implement-get_partitions_for_keys.patch (9K) Download Attachment
0005-Add-more-tests-for-the-new-partitioning-related-plan.patch (28K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
Forgot to mention a couple of important points about the relation of some
of the patches here to the patches and discussion at the
partitionwise-join thread [1].

On 2017/09/06 19:38, Amit Langote wrote:

> [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner
>
> This will allow us to perform scan and join planning in a per partition
> sub-tree manner, with each sub-tree's root getting its own RelOptInfo.
> Previously, only the root of the whole partition tree would get a
> RelOptInfo, along with the leaf partitions, with each leaf partition's
> AppendRelInfo pointing to the former as its parent.
>
> This is essential, because we will be doing partition-pruning for every
> partitioned table in the tree by matching query's scan keys with its
> partition key.  We won't be able to do that if the intermediate
> partitioned tables didn't have a RelOptInfo.

There is a patch in the Ashutosh's posted series of patches, which does
more or less the same thing that this patch does.  He included it in his
series of patches, because, IIUC, the main partitionwise-join planning
logic that one of the later patch implements depends on being able to
consider applying that new planning technique individually for every
partition sub-tree, instead of just at the whole tree root.

One notable difference from his patch is that while his patch will expand
in non-flattened manner even in the case where the parent is the result
relation of a query, my patch doesn't in that case, because the new
partition-pruning technique cannot be applied to inheritance parent that
is a result relation, for example,

update partitioned_table set ...

And AFAICS, partitionwise-join cannot be applied to such a parent either.

Note however that if there are other instances of the same partitioned
table (in the FROM list of an update statement) or other partitioned
tables in the query, they will be expanded in a non-flattened manner,
because they are themselves not the result relations of the query.  So,
the new partition-pruning and (supposedly) partitionwise-join can be
applied for those other partitioned tables.

> [PATCH 2/5] WIP: planner-side changes for partition-pruni[...]
>
> Also, it adds certain fields to RelOptInfos of the partitioned tables that
> reflect its partitioning properties.

There is something called PartitionScheme, which is a notion one of the
Ashutosh's patches invented that this patch incorporates as one of the new
fields in RelOptInfo that I mentioned above (also a list of
PartitionScheme's in the planner-global PlannerInfo).  Although,
PartitionScheme is not significant for the task of partition-pruning
itself, it's still useful.  On Ashutosh's suggestion, I adopted the same
in my patch series, so that the partition-wise join patch doesn't end up
conflicting with the partition-pruning patch while trying to implement the
same and can get straight to the task of implementing partition-wise joins.

The same patch in the partition-wise join patch series that introduces
PartitionScheme, also introduces a field in the RelOptInfo called
partexprs, which records the partition key expressions.  Since,
partition-pruning has use for the same, I incorporated the same here;
also, in a format that Ashutosh's partition-wise patch can use directly,
instead of the latter having to hack it again to make it suitable to store
partition key expressions of joinrels.  Again, that's to minimize
conflicts and let his patch just find the field to use as is, instead of
implementing it first.

Lastly, this patch introduces a PartitionAppendInfo in a partitioned
table's RelOptInfo that stores AppendRelInfos of the partitions (child
relations) that survive partition-pruning, which serves to identify those
partitions' RelOptInfos.  Along with the identities of surviving
partitions, it also stores the partitioning configuration of the
partitioned table after partitions are pruned.  That includes
partdesc->boundinfo (which is simply a pointer into the table's relcache)
and a few other fields that are set by partition-pruning code, such
min_datum_index, max_datum_index, null_partition_chosen, that describe the
result after pruning.  So, for two partitioned tables being joined, if the
boundinfos match per partition_bounds_equal() and these other fields
match, they can be safely partition-wise joined.

[1]
https://www.postgresql.org/message-id/CAFjFpRfRDhWp%3DoguNjyzN%3DNMoOD%2BRCC3wS%2Bb%2BxbGKwKUk0dRKg%40mail.gmail.com



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Robert Haas
On Thu, Sep 7, 2017 at 7:16 AM, Amit Langote
<[hidden email]> wrote:

> There is a patch in the Ashutosh's posted series of patches, which does
> more or less the same thing that this patch does.  He included it in his
> series of patches, because, IIUC, the main partitionwise-join planning
> logic that one of the later patch implements depends on being able to
> consider applying that new planning technique individually for every
> partition sub-tree, instead of just at the whole tree root.
>
> One notable difference from his patch is that while his patch will expand
> in non-flattened manner even in the case where the parent is the result
> relation of a query, my patch doesn't in that case, because the new
> partition-pruning technique cannot be applied to inheritance parent that
> is a result relation, for example,
>
> update partitioned_table set ...
>
> And AFAICS, partitionwise-join cannot be applied to such a parent either.
>
> Note however that if there are other instances of the same partitioned
> table (in the FROM list of an update statement) or other partitioned
> tables in the query, they will be expanded in a non-flattened manner,
> because they are themselves not the result relations of the query.  So,
> the new partition-pruning and (supposedly) partitionwise-join can be
> applied for those other partitioned tables.

It seems to me that it would be better not to write new patches for
things that already have patches without a really clear explanation
with what's wrong with the already-existing patch; I don't see any
such explanation here.  Instead of writing your own patch for this to
duel with his his, why not review his and help him correct any
deficiencies which you can spot?  Then we have one patch with more
review instead of two patches with less review both of which I have to
read and try to decide which is better.

In this case, I think Ashutosh has the right idea.  I think that
handling the result-relation and non-result-relation differently
creates an unpleasant asymmetry.  With your patch, we have to deal
with three cases: (a) partitioned tables that were expanded
level-by-level because they are not result relations, (b) partitioned
tables that were expanded "flattened" because they are result
relations, and (c) non-partitioned tables that were expanded
"flattened".  With Ashutosh's approach, we only have two cases to
worry about in the future rather than three, and I like that better.

Your patch also appears to change things so that the table actually
referenced in the query ends up with an AppendRelInfo for the parent,
which seems pointless.  And it has no tests.

There are a couple of hunks from your patch that we might want or need
to incorporate into Ashutosh's patch.  The change to
relation_excluded_by_constraints() looks like it might be useful,
although it needs a better comment and some tests.  Also, Ashutosh's
patch has no equivalent of your change to add_paths_to_append_rel().
I'm not clear what the code you've introduced there is supposed to be
doing, and I'm suspicious that it is confusing "partition root" with
"table named in the query", which will often be the same but not
always; the user could have named an intermediate partition.  Can you
expand on what this is doing?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
On 2017/09/08 4:41, Robert Haas wrote:

> On Thu, Sep 7, 2017 at 7:16 AM, Amit Langote
> <[hidden email]> wrote:
>> There is a patch in the Ashutosh's posted series of patches, which does
>> more or less the same thing that this patch does.  He included it in his
>> series of patches, because, IIUC, the main partitionwise-join planning
>> logic that one of the later patch implements depends on being able to
>> consider applying that new planning technique individually for every
>> partition sub-tree, instead of just at the whole tree root.
>>
>> One notable difference from his patch is that while his patch will expand
>> in non-flattened manner even in the case where the parent is the result
>> relation of a query, my patch doesn't in that case, because the new
>> partition-pruning technique cannot be applied to inheritance parent that
>> is a result relation, for example,
>>
>> update partitioned_table set ...
>>
>> And AFAICS, partitionwise-join cannot be applied to such a parent either.
>>
>> Note however that if there are other instances of the same partitioned
>> table (in the FROM list of an update statement) or other partitioned
>> tables in the query, they will be expanded in a non-flattened manner,
>> because they are themselves not the result relations of the query.  So,
>> the new partition-pruning and (supposedly) partitionwise-join can be
>> applied for those other partitioned tables.
>
> It seems to me that it would be better not to write new patches for
> things that already have patches without a really clear explanation
> with what's wrong with the already-existing patch; I don't see any
> such explanation here.  Instead of writing your own patch for this to
> duel with his his, why not review his and help him correct any
> deficiencies which you can spot?  Then we have one patch with more
> review instead of two patches with less review both of which I have to
> read and try to decide which is better.

Sorry, I think I should have just used the Ashutosh's patch.

> In this case, I think Ashutosh has the right idea.  I think that
> handling the result-relation and non-result-relation differently
> creates an unpleasant asymmetry.  With your patch, we have to deal
> with three cases: (a) partitioned tables that were expanded
> level-by-level because they are not result relations, (b) partitioned
> tables that were expanded "flattened" because they are result
> relations, and (c) non-partitioned tables that were expanded
> "flattened".  With Ashutosh's approach, we only have two cases to
> worry about in the future rather than three, and I like that better.

I tend to agree with this now.

> Your patch also appears to change things so that the table actually
> referenced in the query ends up with an AppendRelInfo for the parent,
> which seems pointless.

Actually, it wouldn't, because my patch also got rid of the notion of
adding the duplicate RTE for original parent, because I thought the
duplicate RTE was pointless in the partitioning case.

> There are a couple of hunks from your patch that we might want or need
> to incorporate into Ashutosh's patch.  The change to
> relation_excluded_by_constraints() looks like it might be useful,
> although it needs a better comment and some tests.

I think we could just drop that part from this patch.  It also looks like
Ashutosh has a patch elsewhere concerning this.

https://commitfest.postgresql.org/14/1108/

Maybe, we could discuss what do about this on that thread.  Now that not
only the root partitioned table, but also other partitioned tables in the
tree get an RTE with inh = true, I think it would be interesting to
consider his patch.

> Also, Ashutosh's
> patch has no equivalent of your change to add_paths_to_append_rel().
> I'm not clear what the code you've introduced there is supposed to be
> doing, and I'm suspicious that it is confusing "partition root" with
> "table named in the query", which will often be the same but not
> always; the user could have named an intermediate partition.  Can you
> expand on what this is doing?

I've replied on the partition-wise thread explaining why changes in the
add_paths_to_append_rel() are necessary.

Anyway, I'm dropping my patch in favor of the patch on the other thread.
Sorry for the duplicated effort involved in having to look at both the
patches.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoZEUonD9dUZH1FBEyq%3DPEv_KvE3wC%3DA%3D0zm-_tRz_917A%40mail.gmail.com



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

David Rowley-3
In reply to this post by Amit Langote-2
On 21 August 2017 at 18:37, Amit Langote <[hidden email]> wrote:
> I've been working on implementing a way to perform plan-time
> partition-pruning that is hopefully faster than the current method of
> using constraint exclusion to prune each of the potentially many
> partitions one-by-one.  It's not fully cooked yet though.

I'm interested in seeing improvements in this area, so I've put my
name down to review this.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
On 2017/09/15 10:55, David Rowley wrote:
> On 21 August 2017 at 18:37, Amit Langote <[hidden email]> wrote:
>> I've been working on implementing a way to perform plan-time
>> partition-pruning that is hopefully faster than the current method of
>> using constraint exclusion to prune each of the potentially many
>> partitions one-by-one.  It's not fully cooked yet though.
>
> I'm interested in seeing improvements in this area, so I've put my
> name down to review this.

Great, thanks!

I will post rebased patches later today, although I think the overall
design of the patch on the planner side of things is not quite there yet.
Of course, your and others' feedback is greatly welcome.

Also, I must inform to all of those who're looking at this thread that I
won't be able to respond to emails from tomorrow (9/16, Sat) until 9/23,
Sat, due to some personal business.

Thanks,
Amit



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Dilip Kumar-2
In reply to this post by Amit Langote-2
On Wed, Sep 6, 2017 at 4:08 PM, Amit Langote
<[hidden email]> wrote:
> On 2017/09/04 10:10, Amit Langote wrote:
>> On 2017/09/02 2:52, Robert Haas wrote:

>
> [PATCH 2/5] WIP: planner-side changes for partition-pruning
>
> This patch adds a stub get_partitions_for_keys in partition.c with a
> suitable interface for the caller to pass bounding keys extracted from the
> query and other related information.
>
> Importantly, it implements the logic within the planner to match query's
> scan keys to a parent table's partition key and form the bounding keys
> that will be passed to partition.c to compute the list of partitions that
> satisfy those bounds.
>

+ Node   *leftop = get_leftop(clause);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))

It appears that this patch always assume that clause will be of form
"var op const", but it can also be "const op var"

That's the reason in below example where in both the queries condition
is same it can only prune in the first case but not in the second.

postgres=# explain select * from t where t.a < 2;
                       QUERY PLAN
--------------------------------------------------------
 Append  (cost=0.00..2.24 rows=1 width=8)
   ->  Seq Scan on t1  (cost=0.00..2.24 rows=1 width=8)
         Filter: (a < 2)
(3 rows)

postgres=# explain select * from t where 2>t.a;
                       QUERY PLAN
--------------------------------------------------------
 Append  (cost=0.00..4.49 rows=2 width=8)
   ->  Seq Scan on t1  (cost=0.00..2.24 rows=1 width=8)
         Filter: (2 > a)
   ->  Seq Scan on t2  (cost=0.00..2.25 rows=1 width=8)
         Filter: (2 > a)
(5 rows)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
Hi Dilip,

Thanks for looking at the patch.

On 2017/09/15 13:43, Dilip Kumar wrote:

> On Wed, Sep 6, 2017 at 4:08 PM, Amit Langote
>> [PATCH 2/5] WIP: planner-side changes for partition-pruning
>>
>> This patch adds a stub get_partitions_for_keys in partition.c with a
>> suitable interface for the caller to pass bounding keys extracted from the
>> query and other related information.
>>
>> Importantly, it implements the logic within the planner to match query's
>> scan keys to a parent table's partition key and form the bounding keys
>> that will be passed to partition.c to compute the list of partitions that
>> satisfy those bounds.
>
> + Node   *leftop = get_leftop(clause);
> +
> + if (IsA(leftop, RelabelType))
> + leftop = (Node *) ((RelabelType *) leftop)->arg;
> +
> + if (equal(leftop, partkey))
>
> It appears that this patch always assume that clause will be of form
> "var op const", but it can also be "const op var"
>
> That's the reason in below example where in both the queries condition
> is same it can only prune in the first case but not in the second.
>
> postgres=# explain select * from t where t.a < 2;
>                        QUERY PLAN
> --------------------------------------------------------
>  Append  (cost=0.00..2.24 rows=1 width=8)
>    ->  Seq Scan on t1  (cost=0.00..2.24 rows=1 width=8)
>          Filter: (a < 2)
> (3 rows)
>
> postgres=# explain select * from t where 2>t.a;
>                        QUERY PLAN
> --------------------------------------------------------
>  Append  (cost=0.00..4.49 rows=2 width=8)
>    ->  Seq Scan on t1  (cost=0.00..2.24 rows=1 width=8)
>          Filter: (2 > a)
>    ->  Seq Scan on t2  (cost=0.00..2.25 rows=1 width=8)
>          Filter: (2 > a)
> (5 rows)

Yeah, there are a bunch of smarts still missing in that patch as it is.

Thanks,
Amit



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
In reply to this post by Amit Langote-2
On 2017/09/15 11:16, Amit Langote wrote:
> I will post rebased patches later today, although I think the overall
> design of the patch on the planner side of things is not quite there yet.
> Of course, your and others' feedback is greatly welcome.

Rebased patches attached.  Because Dilip complained earlier today about
clauses of the form (const op var) not causing partition-pruning, I've
added code to commute the clause where it is required.  Some other
previously mentioned limitations remain -- no handling of OR clauses, no
elimination of redundant clauses for given partitioning column, etc.

A note about 0001: this patch overlaps with
0003-Canonical-partition-scheme.patch from the partitionwise-join patch
series that Ashutosh Bapat posted yesterday [1].  Because I implemented
the planner-portion of this patch based on what 0001 builds, I'm posting
it here.  It might actually turn out that we will review and commit
0003-Canonical-partition-scheme.patch on that thread, but meanwhile apply
0001 if you want to play with the later patches.  I would certainly like
to review  0003-Canonical-partition-scheme.patch myself, but won't be able
to immediately (see below).

> Also, I must inform to all of those who're looking at this thread that I
> won't be able to respond to emails from tomorrow (9/16, Sat) until 9/23,
> Sat, due to some personal business.

To remind.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAFiTN-skmaqeCVaoAHCBqe2DyfO3f6sgdtEjHWrUgi0kV1yPLQ%40mail.gmail.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

0001-Some-optimizer-data-structures-for-partitioned-rels.patch (19K) Download Attachment
0002-WIP-planner-side-changes-for-partition-pruning.patch (25K) Download Attachment
0003-WIP-Interface-changes-for-partition_bound_-cmp-bsear.patch (13K) Download Attachment
0004-WIP-Implement-get_partitions_for_keys.patch (10K) Download Attachment
0005-Add-more-tests-for-the-new-partitioning-related-plan.patch (28K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Robert Haas
On Fri, Sep 15, 2017 at 4:50 AM, Amit Langote
<[hidden email]> wrote:
> Rebased patches attached.  Because Dilip complained earlier today about
> clauses of the form (const op var) not causing partition-pruning, I've
> added code to commute the clause where it is required.  Some other
> previously mentioned limitations remain -- no handling of OR clauses, no
> elimination of redundant clauses for given partitioning column, etc.
>
> A note about 0001: this patch overlaps with
> 0003-Canonical-partition-scheme.patch from the partitionwise-join patch
> series that Ashutosh Bapat posted yesterday [1].

It doesn't merely overlap; it's obviously a derivative work, and the
commit message in your version should credit all the authors.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote
On Sat, Sep 16, 2017 at 4:04 AM, Robert Haas <[hidden email]> wrote:

> On Fri, Sep 15, 2017 at 4:50 AM, Amit Langote
> <[hidden email]> wrote:
>> Rebased patches attached.  Because Dilip complained earlier today about
>> clauses of the form (const op var) not causing partition-pruning, I've
>> added code to commute the clause where it is required.  Some other
>> previously mentioned limitations remain -- no handling of OR clauses, no
>> elimination of redundant clauses for given partitioning column, etc.
>>
>> A note about 0001: this patch overlaps with
>> 0003-Canonical-partition-scheme.patch from the partitionwise-join patch
>> series that Ashutosh Bapat posted yesterday [1].
>
> It doesn't merely overlap; it's obviously a derivative work,

Yes it is.  I noted that upthread [1] that most of these are derived
from Ashutosh's patch on his suggestion.  I guess I should have
repeated that in this message too, sorry.

> and the
> commit message in your version should credit all the authors.

That was a mistake on my part, too.  Will be careful hereon.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/0e829199-a43c-2a66-b966-89a0020a6cd4%40lab.ntt.co.jp


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Dilip Kumar-2
In reply to this post by Amit Langote-2
On Fri, Sep 15, 2017 at 2:20 PM, Amit Langote
<[hidden email]> wrote:
> On 2017/09/15 11:16, Amit Langote wrote:

Thanks for the updated patch.  I was going through the logic of
get_rel_partitions in 0002 as almost similar functionality will be
required by runtime partition pruning on which Beena is working.  The
only difference is that here we are processing the
"rel->baserestrictinfo" and in case of runtime pruning, we also need
to process join clauses which are pushed down to appendrel.

So can we make some generic logic which can be used for both the patches.

So basically, we need to do two changes

1. In get_rel_partitions instead of processing the
"rel->baserestrictinfo" we can take clause list as input that way we
can pass any clause list to this function.

2. Don't call "get_partitions_for_keys" routine from the
"get_rel_partitions", instead, get_rel_partitions can just prepare
minkey, maxkey and the caller of the get_rel_partitions can call
get_partitions_for_keys, because for runtime pruning we need to call
get_partitions_for_keys at runtime.

After these changes also there will be one problem that the
get_partitions_for_keys is directly fetching the "rightop->constvalue"
whereas, for runtime pruning, we need to store rightop itself and
calculate the value at runtime by param evaluation,  I haven't yet
thought how can we make this last part generic.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Dilip Kumar-2
I have done some refactoring of the code where I have moved the code
of getting the matching clause into the separate function so that it
can fetch the matching clause from any set of given restriction list.

It can be applied on top of 0002-WIP:
planner-side-changes-for-partition-pruning.patch

On Sat, Sep 16, 2017 at 3:13 PM, Dilip Kumar <[hidden email]> wrote:

> On Fri, Sep 15, 2017 at 2:20 PM, Amit Langote
> <[hidden email]> wrote:
>> On 2017/09/15 11:16, Amit Langote wrote:
>
> Thanks for the updated patch.  I was going through the logic of
> get_rel_partitions in 0002 as almost similar functionality will be
> required by runtime partition pruning on which Beena is working.  The
> only difference is that here we are processing the
> "rel->baserestrictinfo" and in case of runtime pruning, we also need
> to process join clauses which are pushed down to appendrel.
>
> So can we make some generic logic which can be used for both the patches.
>
> So basically, we need to do two changes
>
> 1. In get_rel_partitions instead of processing the
> "rel->baserestrictinfo" we can take clause list as input that way we
> can pass any clause list to this function.
>
> 2. Don't call "get_partitions_for_keys" routine from the
> "get_rel_partitions", instead, get_rel_partitions can just prepare
> minkey, maxkey and the caller of the get_rel_partitions can call
> get_partitions_for_keys, because for runtime pruning we need to call
> get_partitions_for_keys at runtime.
>
> After these changes also there will be one problem that the
> get_partitions_for_keys is directly fetching the "rightop->constvalue"
> whereas, for runtime pruning, we need to store rightop itself and
> calculate the value at runtime by param evaluation,  I haven't yet
> thought how can we make this last part generic.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

0002-refactor_get_rel_partition.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: path toward faster partition pruning

Amit Langote-2
In reply to this post by Dilip Kumar-2
Hi Dilip.

Thanks for looking at the patches and the comments.

On 2017/09/16 18:43, Dilip Kumar wrote:

> On Fri, Sep 15, 2017 at 2:20 PM, Amit Langote
> <[hidden email]> wrote:
>> On 2017/09/15 11:16, Amit Langote wrote:
>
> Thanks for the updated patch.  I was going through the logic of
> get_rel_partitions in 0002 as almost similar functionality will be
> required by runtime partition pruning on which Beena is working.  The
> only difference is that here we are processing the
> "rel->baserestrictinfo" and in case of runtime pruning, we also need
> to process join clauses which are pushed down to appendrel.

Yeah, I agree with the point you seem to be making that
get_rel_partitions() covers a lot of functionality, which it would be nice
to break down into reusable function(s) with suitable signature(s) that
the executor will also be able to use.

Your proposed refactoring patch down-thread seems to be a good step in
that direction.  Thanks for working on it.

> So can we make some generic logic which can be used for both the patches.
>
> So basically, we need to do two changes
>
> 1. In get_rel_partitions instead of processing the
> "rel->baserestrictinfo" we can take clause list as input that way we
> can pass any clause list to this function.
>
> 2. Don't call "get_partitions_for_keys" routine from the
> "get_rel_partitions", instead, get_rel_partitions can just prepare
> minkey, maxkey and the caller of the get_rel_partitions can call
> get_partitions_for_keys, because for runtime pruning we need to call
> get_partitions_for_keys at runtime.

It's not clear to me whether get_rel_partitions() itself, as it is, is
callable from outside the planner, because its signature contains
RelOptInfo.  We have the RelOptInfo in the signature, because we want to
mark certain fields in it so that latter planning steps can use them.  So,
get_rel_partitions()'s job is not just to match clauses and find
partitions, but also to perform certain planner-specific tasks of
generating information that the later planning steps will want to use.
That may turn out to be unnecessary, but until we know that, let's not try
to export get_rel_partitions() itself out of the planner.

OTOH, the function that your refactoring patch separates out to match
clauses to partition keys and extract bounding values seems reusable
outside the planner and we should export it in such a way that it can be
used in the executor.  Then, the hypothetical executor function that does
the pruning will first call the planner's clause-matching function,
followed by calling get_partitions_for_keys() in partition.c to get the
selected partitions.

We should be careful when designing the interface of the exported function
to make sure it's not bound to the planner.  Your patch still maintains
the RelOptInfo in the signature of the clause-matching function, which the
executor pruning function won't have access to.

> After these changes also there will be one problem that the
> get_partitions_for_keys is directly fetching the "rightop->constvalue"
> whereas, for runtime pruning, we need to store rightop itself and
> calculate the value at runtime by param evaluation,  I haven't yet
> thought how can we make this last part generic.

I don't think any code introduced by the patch in partition.c itself looks
inside OpExpr (or any type of clause for that matter).  That is, I don't
see where get_partitions_for_keys() is looking at rightop->constvalue.
All it receives to work with are arrays of Datums and some other relevant
information like inclusivity, nullness, etc.

By the way, I'm now rebasing these patches on top of [1] and will try to
merge your refactoring patch in some appropriate way.  Will post more
tomorrow.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf826



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
1234 ... 11
Previous Thread Next Thread