Multi-Column List Partitioning

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Multi-Column List Partitioning

Nitin Jadhav
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Jeevan Ladhe
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Nitin Jadhav
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Amit Langote
In reply to this post by Nitin Jadhav
Hello Nitin,

On Thu, May 6, 2021 at 11:03 PM Nitin Jadhav
<[hidden email]> wrote:
>
> Hi,
>
> While reviewing one of the 'Table partitioning' related patches, I found that Postgres does not support multiple column based LIST partitioning. Based on this understanding, I have started working on this feature. I also feel that 'Multi-Column List Partitioning' can be benefited to the Postgres users in future.

Yes, it would be nice to have this.  Thanks for picking this up.

> I am attaching the WIP patch for this feature here. It supports 'Multi-Column List Partitioning', however some tasks are still pending. I would like to know your thoughts about this, So that I can continue the work with improvising the current patch.
>
> Following things are handled in the patch.
> 1. Syntax
>
> CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);
>
> Earlier there was no provision to mention multiple columns as part of the 'list_of_columns' clause. Now we can mention the list of columns separated by comma.
>
> CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN list_of_values.
>
> Whereas list_of_columns can be
> a. (value [,...])
> b. (value [,...]) [,...]
>
> I would like to list a few examples here for better understanding.
> Ex-1:
> CREATE TABLE t1(a int) PARTITION BY LIST(a);
> CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);
>
> Ex-2:
> CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
> CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2, 10);

Hmm, why not have parentheses around these lists, that is: (
(list_of_values) [, ...] )

So your example would look like this:

CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN ((1, 2), (1, 5), (2,
2), (2, 10));

IMO, it is not such a bad syntax from a user's PoV.  It's not hard to
understand from this syntax that the partition constraint is something
like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
row-wise comparison.

I will now take a look at the patch itself.

--
Amit Langote
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Amit Langote
On Fri, May 21, 2021 at 1:02 PM Amit Langote <[hidden email]> wrote:
> I will now take a look at the patch itself.

Some quick observations:

* I get a lot of instances of the following 2 warnings when compiling
the patched code:

Warning #1:

partprune.c: In function ‘get_matching_list_bounds’:
partprune.c:2731:11: warning: passing argument 5 of
‘partition_list_bsearch’ makes pointer from integer without a cast
[enabled by default]
           nvalues, value, &is_equal);
           ^
In file included from partprune.c:53:0:
../../../src/include/partitioning/partbounds.h:117:12: note: expected
‘Datum *’ but argument is of type ‘Datum’
 extern int partition_list_bsearch(FmgrInfo *partsupfunc,

Warning #2:

partprune.c:2781:12: warning: incompatible integer to pointer
conversion passing 'Datum'
      (aka 'unsigned long') to parameter of type 'Datum *' (aka
'unsigned long *'); take the
      address with & [-Wint-conversion]

          value, &is_equal);

          ^~~~~

          &
../../../src/include/partitioning/partbounds.h:120:32: note: passing
argument to parameter 'value'
      here
  ...int nvalues, Datum *value, bool *is_equal);

* I think this code:

===
        /* Get the only column's name in case we need to output an error */
        if (key->partattrs[0] != 0)
            colname = get_attname(RelationGetRelid(parent),
                                  key->partattrs[0], false);
        else
            colname = deparse_expression((Node *) linitial(partexprs),

deparse_context_for(RelationGetRelationName(parent),

RelationGetRelid(parent)),
                                         false, false);
        /* Need its type data too */
        coltype = get_partition_col_typid(key, 0);
        coltypmod = get_partition_col_typmod(key, 0);
        partcollation = get_partition_col_collation(key, 0);
===

belongs in the new function transformPartitionListBounds that you
added, because without doing so, any errors having to do with
partitioning columns other than the first one will report the first
column's name in the error message:

postgres=# create table foo (a bool, b bool) partition by list (a, b);
CREATE TABLE

-- this is fine!
postgres=# create table foo_true_true partition of foo for values in (1, true);
ERROR:  specified value cannot be cast to type boolean for column "a"
LINE 1: ...able foo_true_true partition of foo for values in (1, true);

-- not this!
postgres=# create table foo_true_true partition of foo for values in (true, 1);
ERROR:  specified value cannot be cast to type boolean for column "a"
LINE 1: ...able foo_true_true partition of foo for values in (true, 1);

* The following prototype of transformPartitionListBounds() means that
all values in a given bound list are analyzed with the first
partitioning column's colname, type, typmod, etc., which is wrong:

+static List *
+transformPartitionListBounds(ParseState *pstate, PartitionBoundSpec *spec,
+                            char *colname, Oid coltype, int32 coltypmod,
+                            Oid partcollation, int partnatts)
+{

An example of wrong behavior because of that:

postgres=# create table foo (a bool, b text) partition by list (a, b);
CREATE TABLE
Time: 3.967 ms
postgres=# create table foo_true_true partition of foo for values in
(true, 'whatever');
ERROR:  invalid input syntax for type boolean: "whatever"
LINE 1: ...o_true_true partition of foo for values in (true, 'whatever'...

"whatever" should've been accepted but because it's checked with a's
type, it is wrongly flagged.

Please take a look at how transformPartitionRangeBound() handles this,
especially how it uses the correct partitioning column's info to
analyze the corresponding bound value expression.

I will continue looking next week.

--
Amit Langote
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Nitin Jadhav
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Amit Langote
On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav
<[hidden email]> wrote:
> > IMO, it is not such a bad syntax from a user's PoV.  It's not hard to
> > understand from this syntax that the partition constraint is something
> > like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
> > row-wise comparison.
>
> Thanks for suggesting to use row-wise comparison.

Actually, I was just describing how the *users* may want to visualize
the partition constraint...

> I have few queries
> with respect to handling of NULL values.
>
> 1. What should be the partition constraint for the above case. AFAIK,
> row-wise comparison wont work with NULL values as shown in [1]. I mean
> two rows are considered equal if all their corresponding members are
> non-null and equal. The rows are unequal if any corresponding members
> are non-null and unequal. Otherwise the result of the row comparison
> is unknown (null). So we should generate different types of
> constraints for NULL values.
>
> Ex:
> CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
> CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
> (NULL, 1), (NULL, NULL);
>
> As per my knowledge, we should consider creating partition constraints
> for the above example as given below.
>
> (a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b =
> 1)) OR ((a is NULL) AND (b is NULL)).

Yeah, something like that should do the trick.

Again, I was not actually suggesting that you write code to implement
the constraint using something like RowCompareExpr, only that the
users might want to view the constraint as doing row-wise comparison
of the partitioning columns and the specified value lists.

> 2. In the current code we don't put the NULL value in the 'datums'
> field of 'PartitionBoundInfoData' structure [2]. Since there can be
> only one NULL value, we directly store the corresponding index value
> in the 'null_index' field. Now we have to handle multiple NULL values
> in case of Multi-Column List Partitioning. So the question is how to
> handle this scenario. Following are the 2 approaches to handle this.
>
> Approach-1:
> Add another field 'bool  **isnull' in [2] and mark the corresponding
> element to TRUE if it has NULL value and the corresponding location in
> 'datums' contains empty/No value. For example, If a partition bound is
> (1, NULL), then
>
> datums[0][0] = 1
> datums[0][1] = Not assigned any value
> isnull[0][0] = FALSE
> is null[0][1] = TRUE
>
> So now we have an entry in the 'datums'  field for a bound containing
> NULL value, so we should handle this in all the scenarios where we are
> manipulating 'datums' in order to support NULL values and avoid crash.
>
> Approach-2:
> Don't add the bound information to 'datums' field of [2] if any of the
> value is NULL. Store this information separately in the structures
> mentioned in [3] and process accordingly.
>
> I feel approach-1 is the better solution as this requires less code
> changes and easy to implement than approach-2. Kindly share your
> thoughts about the approaches and please share if you have any better
> solution than the above 2.

Approach 1 sounds better.  It sounds like approach 1 might help us
implement support for allowing NULLs in range partition bounds in the
future, if at all.  For now, it might be better to not allocate the
isnull array except for list partitioning.

I'll wait for you to post a new patch addressing at least the comments
in my earlier email.  Also, please make sure to run `make check`
successfully before posting the patch. :)

Thanks.

--
Amit Langote
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Nitin Jadhav
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Amit Langote
Hi Nitin,

On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav
<[hidden email]> wrote:
> > I'll wait for you to post a new patch addressing at least the comments
> > in my earlier email.  Also, please make sure to run `make check`
> > successfully before posting the patch. :)
>
> I have fixed all of the review comments given by you and Jeevan in the
> attached patch and also the attached patch contains more changes
> compared to the previous patch. Following are the implementation
> details.

Thanks for the updated version.

> 1. Regarding syntax, the existing syntax will work fine for the
> single-column list partitioning. However I have used the new syntax
> for the multi-column list partitioning as we discussed earlier. I have
> used a combination of 'AND' and 'OR' logic for the partition
> constraints as given in the below example.
>
> postgres@17503=#create table t(a int, b text) partition by list(a,b);
> CREATE TABLE
> postgres@17503=#create table t1 partition of t for values in ((1,'a'),
> (NULL,'b'));
> CREATE TABLE
> postgres@17503=#\d+ t
>                                       Partitioned table "public.t"
>  Column |  Type   | Collation | Nullable | Default | Storage  |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
>  a      | integer |           |          |         | plain    |
>      |              |
>  b      | text    |           |          |         | extended |
>      |              |
> Partition key: LIST (a, b)
> Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))
>
> postgres@17503=#\d+ t1
>                                             Table "public.t1"
>  Column |  Type   | Collation | Nullable | Default | Storage  |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
>  a      | integer |           |          |         | plain    |
>      |              |
>  b      | text    |           |          |         | extended |
>      |              |
> Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
> Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
> AND (b = 'b'::text)))
> Access method: heap
The constraint expressions seem to come out correctly, though I
haven't checked your implementation closely yet.

> 2. In the existing code, NULL values were handled differently. It was
> not added to the 'datums' variable, rather used to store the partition
> index directly in the 'null_index' variable. Now there is a
> possibility of multiple NULL values, hence introducing  a new member
> 'isnulls' in the 'PartitionBoundInfoData' struct which indicates
> whether the corresponding element in the 'datums' is NULL. Now
> 'null_index' cannot be used directly to store the partition index, so
> removed it and made the necessary changes in multiple places.
>
> 3. I have added test cases for 'create table' and 'insert' statements
> related to multi-column list partitioning and these are working fine
> with 'make check'.
>
> 4. Handled the partition pruning code to accommodate these changes for
> single-column list partitioning. However it is pending for
> multi-column list partitioning.
>
> 5. I have done necessary changes in partition wise join related code
> to accommodate for single-column list partitioning. However it is
> pending for multi-column list partitioning.
>
> Kindly review the patch and let me know if any changes are required.
The new list bound binary search and related comparison support
function look a bit too verbose to me.  I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case.  The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function.  I'm talking about the code
fragment starting with this comment:

          /*
           * Once we find the matching for the first column but if it does not
           * match for the any of the other columns, then the binary search
           * will not work in all the cases. We should traverse just below
           * and above the mid index until we find the match or we reach the
           * first mismatch.
           */

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns.  But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers.  Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

    /*
     * For hash partitioning we can only perform pruning based on equality
     * clauses to the partition key or IS NULL clauses.  We also can only
     * prune if we got values for all keys.
     */
    if (nvalues + bms_num_members(nullkeys) == partnatts)
    {
        /* code to compute matching hash bound offset */
    }
    else
    {
        /* Report all valid offsets into the boundinfo->indexes array. */
        result->bound_offsets = bms_add_range(NULL, 0,
                                              boundinfo->nindexes - 1);
    }

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation?  That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted.  That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

I will look at other parts of the patch next week hopefully.   For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

--
Amit Langote
EDB: http://www.enterprisedb.com

v1_delta_amit.patch (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Amit Langote
On Fri, Jun 11, 2021 at 12:37 PM Amit Langote <[hidden email]> wrote:

> I will look at other parts of the patch next week hopefully.   For
> now, attached is a delta patch that applies on top of your v1, which
> does:
>
> * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
> * Make qsort_partition_list_value_cmp simply call
> partition_lbound_datum_cmp() instead of having its own logic to
> compare input bounds
> * Move partition_lbound_datum_cmp() into partbounds.c as a static
> function (export seems unnecessary)
> * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

One more:

* Add all columns of newly added test query in insert.sql to the order
by clause to get predictably ordered output

--
Amit Langote
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Multi-Column List Partitioning

Zhihong Yu
In reply to this post by Amit Langote
CONTENTS DELETED
The author has deleted this message.