inconsistent results querying table partitioned by date


inconsistent results querying table partitioned by date

Alan Jackson
Hi

I'm having a problem querying a table partitioned by date.

Depending on the sequence of operations on a date parameter used in a where clause, I either get the expected results, or no results.

This suggests a bug in the handling of date parameters during partition range pruning.

I've replicated this down to a minimal sequence of create table, insert data, and query.

This issue occurs for me on PostgreSQL 11.2 on a Mac, installed via Homebrew.

In this case the table is partitioned by an id and then by the date; if it is partitioned by only the date, everything works as expected.

However, I am attempting to add partitioning to a fairly large software-as-a-service platform, so making changes to the table definitions or global code is not really practical.

The SQL in question is below.

I hope there is something simple I can change in the partition definitions to work around this.

Many Thanks,
Alan Jackson
Data Architect
TVSquared


--SQL STARTS HERE

--drop table dataid;
CREATE TABLE dataid
(
id integer not null,
datadatetime timestamp without time zone NOT NULL,
CONSTRAINT dataid_pkey PRIMARY KEY (id, datadatetime)
) PARTITION BY RANGE (id, datadatetime)
;

CREATE TABLE dataid_201902 PARTITION OF dataid FOR VALUES FROM (1, '2019-02-01 00:00:00') TO (1, '2019-03-01 00:00:00');
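-- note: multi-column range bounds compare row-wise, so this partition
-- accepts exactly the rows with id = 1 and '2019-02-01' <= datadatetime < '2019-03-01'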

CREATE TABLE dataid_default PARTITION OF dataid DEFAULT;

insert into dataid values (1,'2019-02-24T00:00:00');

--- returns 1 row as expected
select * from dataid where id=1 and datadatetime <  (('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' ) at time zone 'UTC' + '2 days'::interval);

--- returns no rows
select * from dataid where id=1 and datadatetime <  (('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' + '2 days'::interval) at time zone 'UTC');

-- both date expressions evaluate to the same date.
select
(('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' ) at time zone 'UTC' + '2 days'::interval) as workingdate,
(('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' + '2 days'::interval) at time zone 'UTC') as notworkingdate;
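-- (both columns should show 2019-02-28 05:00:00 here, since no DST change falls inside the interval)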

--SQL ENDS HERE






Re: inconsistent results querying table partitioned by date

Tom Lane-2
Alan Jackson <[hidden email]> writes:
> I'm having a problem querying a table partitioned by date.
> Depending on the sequence of operations on a date parameter used in a where clause, I either get the expected results, or no results.

Yeah, this is pretty clearly broken.  It looks to me like the partition
pruning code is making insupportable assumptions about a comparison to
a stable expression.  Using your example table:

regression=# explain select * from dataid where id=1 and datadatetime < localtimestamp;                                      
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Bitmap Heap Scan on dataid_default  (cost=4.19..11.31 rows=3 width=12)
   Recheck Cond: ((id = 1) AND (datadatetime < LOCALTIMESTAMP))
   ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.19 rows=3 width=0)
         Index Cond: ((id = 1) AND (datadatetime < LOCALTIMESTAMP))
(4 rows)

It should absolutely not have pruned away the dataid_201902 partition,
but it did.  It's okay with an immutable expression:

regression=# explain select * from dataid where id=1 and datadatetime < '2019-05-09'::timestamp;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Append  (cost=4.18..22.63 rows=6 width=12)
   ->  Bitmap Heap Scan on dataid_201902  (cost=4.18..11.30 rows=3 width=12)
         Recheck Cond: ((id = 1) AND (datadatetime < '2019-05-09 00:00:00'::timestamp without time zone))
         ->  Bitmap Index Scan on dataid_201902_pkey  (cost=0.00..4.18 rows=3 width=0)
               Index Cond: ((id = 1) AND (datadatetime < '2019-05-09 00:00:00'::timestamp without time zone))
   ->  Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12)
         Recheck Cond: ((id = 1) AND (datadatetime < '2019-05-09 00:00:00'::timestamp without time zone))
         ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0)
               Index Cond: ((id = 1) AND (datadatetime < '2019-05-09 00:00:00'::timestamp without time zone))
(9 rows)

or a volatile one:

regression=# explain select * from dataid where id=1 and datadatetime < '2019-05-09'::timestamp + random()*'1 day'::interval;
                                                       QUERY PLAN                                                      
------------------------------------------------------------------------------------------------------------------------
 Append  (cost=4.23..29.80 rows=6 width=12)
   ->  Bitmap Heap Scan on dataid_201902  (cost=4.23..14.88 rows=3 width=12)
         Recheck Cond: (id = 1)
         Filter: (datadatetime < ('2019-05-09 00:00:00'::timestamp without time zone + (random() * '1 day'::interval)))
         ->  Bitmap Index Scan on dataid_201902_pkey  (cost=0.00..4.23 rows=10 width=0)
               Index Cond: (id = 1)
   ->  Bitmap Heap Scan on dataid_default  (cost=4.23..14.88 rows=3 width=12)
         Recheck Cond: (id = 1)
         Filter: (datadatetime < ('2019-05-09 00:00:00'::timestamp without time zone + (random() * '1 day'::interval)))
         ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.23 rows=10 width=0)
               Index Cond: (id = 1)
(11 rows)

but somebody's confused about what can be done with stable expressions.

While I'm on about it, this behavior is also insupportable:

regression=# explain select * from dataid where id=1 and datadatetime < '2018-05-09'::timestamptz ;
                                               QUERY PLAN                                              
--------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12)
   Recheck Cond: ((id = 1) AND (datadatetime < '2018-05-09 00:00:00-04'::timestamp with time zone))
   ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0)
         Index Cond: ((id = 1) AND (datadatetime < '2018-05-09 00:00:00-04'::timestamp with time zone))
(4 rows)

because timestamp-against-timestamptz comparison is inherently only
stable; the pruning code is way exceeding its authority by supposing
that a comparison that holds at plan time will hold at runtime,
even with a constant comparison value.
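
(To see concretely why such a comparison is only stable, here is a small
sketch; the timestamp side is converted using the session's TimeZone
setting, so the result flips between the two settings shown:)

set timezone = 'Asia/Tokyo';
select '2019-02-24 00:00:00'::timestamp < '2019-02-24 00:00:00+00'::timestamptz;  -- true
set timezone = 'America/New_York';
select '2019-02-24 00:00:00'::timestamp < '2019-02-24 00:00:00+00'::timestamptz;  -- false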

The reason for the difference in your results is that one expression
is immutable and the other is only stable:

regression=# explain verbose select
(('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' ) at time zone 'UTC' + '2 days'::interval) as workingdate,
(('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' + '2 days'::interval) at time zone 'UTC') as notworkingdate;
                                                                           QUERY PLAN                                                                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=16)
   Output: '2019-02-28 05:00:00'::timestamp without time zone, timezone('UTC'::text, ('2019-02-26 00:00:00-05'::timestamp with time zone + '2 days'::interval))
(2 rows)

the reason being that timestamptz + interval depends on the timezone
setting (for some intervals) but timestamp + interval never does.
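
(A sketch of that dependence: adding '1 day' to a timestamptz is carried out
in the session's timezone, so crossing a DST boundary gives a different
instant under different settings:)

set timezone = 'America/New_York';
select '2019-03-09 12:00:00-05'::timestamptz + '1 day'::interval;  -- 2019-03-10 12:00:00-04 (16:00 UTC)
set timezone = 'UTC';
select '2019-03-09 12:00:00-05'::timestamptz + '1 day'::interval;  -- 2019-03-10 17:00:00+00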

Seems to be equally broken in v11 and HEAD.  I didn't try v10.

> I hope there is something simple I can change in the partition definitions to work around this.

Until we fix the bug, I think the best you can do is not use stable
expressions in this context.
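
(A sketch of that workaround, using the "workingdate" form from the original
report: keeping the interval addition on the timestamp side makes the whole
expression immutable, so it folds to a constant at plan time.  As a blunter
fallback, "set enable_partition_pruning = off" also returns correct results,
at the cost of scanning every partition.)

select * from dataid where id=1 and datadatetime < (('2019-02-26T00:00:00'::timestamp at time zone 'America/New_York') at time zone 'UTC' + '2 days'::interval);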

                        regards, tom lane



Re: inconsistent results querying table partitioned by date

Amit Langote-2
On 2019/05/10 8:22, Tom Lane wrote:
> Alan Jackson <[hidden email]> writes:
>> I'm having a problem querying a table partitioned by date.
>> Depending on the sequence of operations on a date parameter used in a where clause, I either get the expected results, or no results.
>
> Yeah, this is pretty clearly broken.

Indeed it is.

> It looks to me like the partition
> pruning code is making insupportable assumptions about a comparison to
> a stable expression.  Using your example table:
>
> regression=# explain select * from dataid where id=1 and datadatetime < localtimestamp;                                      
>                                     QUERY PLAN                                    
> ----------------------------------------------------------------------------------
>  Bitmap Heap Scan on dataid_default  (cost=4.19..11.31 rows=3 width=12)
>    Recheck Cond: ((id = 1) AND (datadatetime < LOCALTIMESTAMP))
>    ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.19 rows=3 width=0)
>          Index Cond: ((id = 1) AND (datadatetime < LOCALTIMESTAMP))
> (4 rows)
>
> It should absolutely not have pruned away the dataid_201902 partition,
> but it did.
>
> While I'm on about it, this behavior is also insupportable:
>
> regression=# explain select * from dataid where id=1 and datadatetime < '2018-05-09'::timestamptz ;
(likely a typo in the condition: s/2018/2019)

>                                                QUERY PLAN                                              
> --------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12)
>    Recheck Cond: ((id = 1) AND (datadatetime < '2018-05-09 00:00:00-04'::timestamp with time zone))
>    ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0)
>          Index Cond: ((id = 1) AND (datadatetime < '2018-05-09 00:00:00-04'::timestamp with time zone))
> (4 rows)
>
> because timestamp-against-timestamptz comparison is inherently only
> stable; the pruning code is way exceeding its authority by supposing
> that a comparison that holds at plan time will hold at runtime,
> even with a constant comparison value.

I looked into it, and the problem is not really that plan-time pruning is
comparing stable expressions against partition bounds.  If that were the
problem, it wouldn't have pruned dataid_201902 anyway, because its bounding
range for datadatetime is '2019-02-01' to '2019-03-01', which is clearly
less than '2019-05-09' (Tom's localtimestamp).

The real problem seems to be with the way range partition pruning assumes
an operator strategy to perform pruning with.  Quoted query's WHERE clause
looks something like this: (firstkey = CONSTANT AND secondkey <
STABLE_EXPRESSION).  From this set of clauses, a (CONSTANT,
STABLE_EXPRESSION) tuple is formed to be compared against partition bounds
using row-comparison-like semantics.  As things stand today, the actual
comparison function (partprune.c: get_matching_range_bounds()) receives
only the strategy of the last expression, which in this case is that of a
LESS operator.  When the tuple (CONSTANT, STABLE_EXPRESSION) passes
through the last step to extract Datum values to be passed to
get_matching_range_bounds, it's correctly determined that
STABLE_EXPRESSION cannot be computed during planning and so the tuple is
truncated to just (CONSTANT-Datum), but the strategy to assume during
pruning is still that of the LESS operator, whereas now it should really
be EQUAL.  With LESS semantics, get_matching_range_bounds() concludes that
no partition bounds are smaller than 1 (extracted from id=1 in the above
query), except the default partition, so it prunes 'dataid_201902'.
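
(The row-comparison analogy can be sketched in SQL: with the full tuple and
LESS semantics the February partition's lower bound compares as expected, but
once the stable expression is dropped, asking for bounds strictly below id = 1
finds nothing:)

select row(1, '2019-02-01 00:00:00'::timestamp) < row(1, '2019-02-26 00:00:00'::timestamp);  -- true
select 1 < 1;  -- false, so no non-default partition appears to qualify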

I've attached a patch to fix that.  Actually, I've attached two patches --
the 1st one adds a test for the misbehaving case with *wrong* output
wherein a partition is incorrectly pruned, and the 2nd actually fixes the
bug and updates the output of the test added by the 1st patch.  Divided
the patch this way just to show the bug clearly.

> Seems to be equally broken in v11 and HEAD.  I didn't try v10.

v10 is fine, as it uses constraint exclusion.

Attached patches apply to both v11 and HEAD.

Thanks,
Amit

0001-Add-test.patch (3K)
0002-Bug-fix.patch (3K)

Re: inconsistent results querying table partitioned by date

Kyotaro HORIGUCHI-2
At Fri, 10 May 2019 14:37:34 +0900, Amit Langote <[hidden email]> wrote in <[hidden email]>

> > because timestamp-against-timestamptz comparison is inherently only
> > stable; the pruning code is way exceeding its authority by supposing
> > that a comparison that holds at plan time will hold at runtime,
> > even with a constant comparison value.
>
> I looked into it, and the problem is not really that plan-time pruning is
> comparing stable expressions against partition bounds.  If that were the
> problem, it wouldn't have pruned dataid_201902 anyway, because its bounding
> range for datadatetime is '2019-02-01' to '2019-03-01', which is clearly
> less than '2019-05-09' (Tom's localtimestamp).
>
> The real problem seems to be with the way range partition pruning assumes
> an operator strategy to perform pruning with.  Quoted query's WHERE clause
> looks something like this: (firstkey = CONSTANT AND secondkey <
> STABLE_EXPRESSION).  From this set of clauses, a (CONSTANT,
> STABLE_EXPRESSION) tuple is formed to be compared against partition bounds
> using row-comparison-like semantics.  As things stand today, the actual
> comparison function (partprune.c: get_matching_range_bounds()) receives
> only the strategy of the last expression, which in this case is that of a
> LESS operator.  When the tuple (CONSTANT, STABLE_EXPRESSION) passes
> through the last step to extract Datum values to be passed to
> get_matching_range_bounds, it's correctly determined that
> STABLE_EXPRESSION cannot be computed during planning and so the tuple is
> truncated to just (CONSTANT-Datum), but the strategy to assume during
> pruning is still that of the LESS operator, whereas now it should really
> be EQUAL.  With LESS semantics, get_matching_range_bounds() concludes that
> no partition bounds are smaller than 1 (extracted from id=1 in the above
> query), except the default partition, so it prunes 'dataid_201902'.

I concluded the same.

> I've attached a patch to fix that.  Actually, I've attached two patches --
> the 1st one adds a test for the misbehaving case with *wrong* output
> wherein a partition is incorrectly pruned, and the 2nd actually fixes the
> bug and updates the output of the test added by the 1st patch.  Divided
> the patch this way just to show the bug clearly.

But this seems a bit wrong.

If the two partition keys were in reverse order, pruning still
fails.

CREATE TABLE dataid2 (
       datadatetime timestamp without time zone NOT NULL,
       id integer not null,
       CONSTRAINT dataid2_pkey PRIMARY KEY (datadatetime, id)
) PARTITION BY RANGE (datadatetime, id);

CREATE TABLE dataid2_201902 PARTITION OF dataid2 FOR VALUES FROM ('2019-02-01 00:00:00', 1) TO ('2019-03-01 00:00:00', 1);

CREATE TABLE dataid2_default PARTITION OF dataid2 DEFAULT;

insert into dataid2 values ('2019-02-24T00:00:00', 1);

select * from dataid2 where id = 1 and datadatetime <  (('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' + '2 days'::interval) at time zone 'UTC');
 datadatetime | id
--------------+----
(0 rows)

This is wrong.

The condition is divided into two parts, (id = 1) and (datadatetime
< ...); the latter reduces to nothing and the former remains
unchanged.  Pruning continues using only id = 1 (I suppose), but
that is not partition_range_datum_bsearch()'s assumption.  As a
result, all partitions other than the default are gone.

In passing, I found a typo while looking into this issue.

|   case BTLessStrategyNumber:
|
|     /*
|      * Look for the greatest bound that is < or <= lookup value and
|      * set minoff to its offset.

I think the "minoff" is typo of "maxoff".


> > Seems to be equally broken in v11 and HEAD.  I didn't try v10.
>
> v10 is fine, as it uses constraint exclusion.
>
> Attached patches apply to both v11 and HEAD.

Mmm.  This doesn't apply on HEAD in my environment.

> patching file src/test/regress/expected/partition_prune.out
> Hunk #1 FAILED at 951.

git rev-parse --short HEAD
d0bbf871ca

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center




Re: inconsistent results querying table partitioned by date

Amit Langote-2
Horiguchi-san,

Thanks for checking.

On 2019/05/10 15:33, Kyotaro HORIGUCHI wrote:

> At Fri, 10 May 2019 14:37:34 +0900, Amit Langote <[hidden email]> wrote:
>> I've attached a patch to fix that.  Actually, I've attached two patches --
>> the 1st one adds a test for the misbehaving case with *wrong* output
>> wherein a partition is incorrectly pruned, and the 2nd actually fixes the
>> bug and updates the output of the test added by the 1st patch.  Divided
>> the patch this way just to show the bug clearly.
>
> But this seems a bit wrong.
>
> If the two partition keys were in reverse order, pruning still
> fails.
>
> CREATE TABLE dataid2 (
>        datadatetime timestamp without time zone NOT NULL,
>        id integer not null,
>        CONSTRAINT dataid2_pkey PRIMARY KEY (datadatetime, id)
> ) PARTITION BY RANGE (datadatetime, id);
>
> CREATE TABLE dataid2_201902 PARTITION OF dataid2 FOR VALUES FROM ('2019-02-01 00:00:00', 1) TO ('2019-03-01 00:00:00', 1);
>
> CREATE TABLE dataid2_default PARTITION OF dataid2 DEFAULT;
>
> insert into dataid2 values ('2019-02-24T00:00:00', 1);
>
> select * from dataid2 where id = 1 and datadatetime <  (('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' + '2 days'::interval) at time zone 'UTC');
>  datadatetime | id
> --------------+----
> (0 rows)
>
> This is wrong.

Ah, indeed.

> The condition is divided into two parts, (id = 1) and (datadatetime
> < ...); the latter reduces to nothing and the former remains
> unchanged.  Pruning continues using only id = 1 (I suppose), but
> that is not partition_range_datum_bsearch()'s assumption.  As a
> result, all partitions other than the default are gone.

I'd have expected this to result in no values being passed to
get_matching_range_bounds (nvalues = 0), because the value of the
expression compared against the 1st key is unavailable at planning time,
meaning the values for subsequent keys should be ignored.  That would have
meant there's nothing to prune with and hence no pruning should have
occurred.  The problem, however, does not seem to be with the new logic I
proposed, but with what turned out to be another bug in
gen_prune_steps_from_opexps().

Given that datadatetime is the first key in your example table and it's
being compared using a non-inclusive operator, clauses for subsequent key
columns (in this case id = 1) should have been ignored by pruning, which
fails to happen due to a bug in gen_prune_steps_from_opexps().  I've fixed
that in the attached updated patch.  With the patch, queries like yours
return the correct result as shown below:

truncate dataid2;

insert into dataid2 (datadatetime, id) select
(('2019-02-22T00:00:00'::timestamp::timestamp at time zone
'America/New_York' + '2 days'::interval) at time zone 'UTC'), 1;

insert into dataid2 (datadatetime, id) select
(('2019-02-22T00:00:00'::timestamp::timestamp at time zone
'America/New_York' + '1 days'::interval) at time zone 'UTC'), 1;

-- pruning steps generated for only 1st column (key1 < expr1)
-- but no pruning occurs during planning because the value for
-- datadatetime's clause is unavailable

explain select * from dataid2 where id = 1 and datadatetime <
(('2019-02-22T00:00:00'::timestamp::timestamp at time zone
'America/New_York' + '2 days'::interval) at time zone 'UTC');

                                                                          QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Append  (cost=10.96..36.21 rows=6 width=12)
   ->  Bitmap Heap Scan on dataid2_201902  (cost=10.96..18.09 rows=3 width=12)
         Recheck Cond: ((datadatetime < timezone('UTC'::text, ('2019-02-22 14:00:00+09'::timestamp with time zone + '2 days'::interval))) AND (id = 1))
         ->  Bitmap Index Scan on dataid2_201902_pkey  (cost=0.00..10.96 rows=3 width=0)
               Index Cond: ((datadatetime < timezone('UTC'::text, ('2019-02-22 14:00:00+09'::timestamp with time zone + '2 days'::interval))) AND (id = 1))
   ->  Bitmap Heap Scan on dataid2_default  (cost=10.96..18.09 rows=3 width=12)
         Recheck Cond: ((datadatetime < timezone('UTC'::text, ('2019-02-22 14:00:00+09'::timestamp with time zone + '2 days'::interval))) AND (id = 1))
         ->  Bitmap Index Scan on dataid2_default_pkey  (cost=0.00..10.96 rows=3 width=0)
               Index Cond: ((datadatetime < timezone('UTC'::text, ('2019-02-22 14:00:00+09'::timestamp with time zone + '2 days'::interval))) AND (id = 1))
(9 rows)

select * from dataid2 where id = 1 and datadatetime <
(('2019-02-22T00:00:00'::timestamp::timestamp at time zone
'America/New_York' + '2 days'::interval) at time zone 'UTC');
    datadatetime     │ id
─────────────────────┼────
 2019-02-23 05:00:00 │  1
(1 row)

-- pruning steps generated for both columns (key1 = expr1, key2 = expr2),
-- but no pruning occurs during planning because the value for
-- datadatetime's clause is unavailable

explain select * from dataid2 where id = 1 and datadatetime =
(('2019-02-22T00:00:00'::timestamp::timestamp at time zone
'America/New_York' + '2 days'::interval) at time zone 'UTC');

                                                                      QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Append  (cost=0.16..16.37 rows=2 width=12)
   Subplans Removed: 1
   ->  Index Only Scan using dataid2_201902_pkey on dataid2_201902  (cost=0.16..8.18 rows=1 width=12)
         Index Cond: ((datadatetime = timezone('UTC'::text, ('2019-02-22 14:00:00+09'::timestamp with time zone + '2 days'::interval))) AND (id = 1))
(4 rows)


select * from dataid2 where id = 1 and datadatetime =
(('2019-02-22T00:00:00'::timestamp::timestamp at time zone
'America/New_York' + '2 days'::interval) at time zone 'UTC');
    datadatetime     │ id
─────────────────────┼────
 2019-02-24 05:00:00 │  1
(1 row)

> In passing, I found a typo while looking into this issue.
>
> |   case BTLessStrategyNumber:
> |
> |     /*
> |      * Look for the greatest bound that is < or <= lookup value and
> |      * set minoff to its offset.
>
> I think the "minoff" is typo of "maxoff".

Ah, fixed in the updated patch.

>>> Seems to be equally broken in v11 and HEAD.  I didn't try v10.
>>
>> v10 is fine, as it uses constraint exclusion.
>>
>> Attached patches apply to both v11 and HEAD.
>
> Mmm.  This doesn't apply on HEAD in my environment.
>
>> patching file src/test/regress/expected/partition_prune.out
>> Hunk #1 FAILED at 951.
>
> git rev-parse --short HEAD
> d0bbf871ca

Hmm, I rebased the patch's branch over master and didn't get any
conflicts.  Please check with the new patch.

Thanks,
Amit

v2-0001-Add-test.patch (4K)
v2-0002-Bug-fix.patch (9K)

Re: inconsistent results querying table partitioned by date

Alan Jackson
Hi

Many thanks for the quick and patch-filled response to this issue. It's great to see such an able and active community!

I’m glad my bug report highlighted the problem sufficiently clearly.

Is this likely to turn into an official patch, and if so, where can I track which release it will be included in?

Thanks again,
Alan Jackson
Data Architect
TVSquared




Re: inconsistent results querying table partitioned by date

Tom Lane-2
Alan Jackson <[hidden email]> writes:
> Is this likely to turn into an official patch, and if so, where can I track which release etc it will be included in?

We're definitely going to do *something*, though whether the current
patches are the last word is too early to say.

Unfortunately, we just missed the regular quarterly minor releases [1].
Unless a bug turns up that's sufficiently urgent to justify an
off-schedule release (no, this one isn't), the next patch release
for v11 will be 11.4 in August.  I'd say the odds are near 100%
that some fix will be committed before then.

Actually, the odds are high that we'll have committed a fix within
a few days; we don't normally let this sort of discussion drag on.
If this problem is enough of a show-stopper for you, you could
consider applying the patch locally once it's committed.

                        regards, tom lane

[1] https://www.postgresql.org/developer/roadmap/



Re: inconsistent results querying table partitioned by date

Tom Lane-2
In reply to this post by Amit Langote-2
Amit Langote <[hidden email]> writes:
> [ v2 patches ]

While this does seem to be fixing real bugs, it's failing to address my
point about stable vs. immutable operators.

Taking Alan's test case again (and using the v2 patch), consider:

regression=# explain select * from dataid where id=1 and datadatetime < '2018-01-01'::timestamp;
                                               QUERY PLAN                                              
--------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12)
   Recheck Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00'::timestamp without time zone))
   ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0)
         Index Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00'::timestamp without time zone))
(4 rows)

That's fine.  The given date is older than anything in the dataid_201902
partition, so we can prune.  But change it to this:

regression=# explain select * from dataid where id=1 and datadatetime < '2018-01-01'::timestamptz;
                                               QUERY PLAN                                              
--------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12)
   Recheck Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00-05'::timestamp with time zone))
   ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0)
         Index Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00-05'::timestamp with time zone))
(4 rows)

That's not fine.  What we have here is a "timestamp < timestamptz"
operator, which is only stable, therefore it might give different
results at runtime than at plan time.  You can't make plan-time
pruning decisions with that.  What we should have gotten here was
an Append node that could do run-time pruning.

                        regards, tom lane



Re: inconsistent results querying table partitioned by date

Amit Langote-2
On 2019/05/11 6:05, Tom Lane wrote:

> regression=# explain select * from dataid where id=1 and datadatetime < '2018-01-01'::timestamptz;
>                                                QUERY PLAN                                              
> --------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12)
>    Recheck Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00-05'::timestamp with time zone))
>    ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0)
>          Index Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00-05'::timestamp with time zone))
> (4 rows)
>
> That's not fine.  What we have here is a "timestamp < timestamptz"
> operator, which is only stable, therefore it might give different
> results at runtime than at plan time.  You can't make plan-time
> pruning decisions with that.  What we should have gotten here was
> an Append node that could do run-time pruning.

You're right.  It seems that prune_append_rel_partitions() is forgetting
to filter mutable clauses from rel->baserestrictinfo, like
relation_excluded_by_constraints() does.  I fixed that in the attached
0003 patch, which also adds a test for this scenario.  I needed to also
tweak run-time pruning support code a bit so that it considers the cases
involving mutable functions as requiring (startup) run-time pruning, in
addition to the cases with mutable expressions.  Adding David if he wants
to comment.

0003 patch cannot be applied as-is to both REL_11_STABLE and HEAD
branches, so I've attached two files, one for each branch.

BTW, while looking at this, I came across this comment in parse_coerce.c:
coerce_type():

       * XXX if the typinput function is not immutable, we really ought to
       * postpone evaluation of the function call until runtime. But there
       * is no way to represent a typinput function call as an expression
       * tree, because C-string values are not Datums. (XXX This *is*
       * possible as of 7.3, do we want to do it?)

Should something be done about that?

Thanks,
Amit

pg11-v1-0003-Fix-planner-to-not-prune-partitions-with-non-immu.patch (8K)
v1-0003-Fix-planner-to-not-prune-partitions-with-non-immu.patch (7K)
v2-0001-Add-test.patch (4K)
v2-0002-Fix-bugs-in-pruning-with-composite-range-partitio.patch (10K)

Re: inconsistent results querying table partitioned by date

David Rowley-3
On Mon, 13 May 2019 at 17:40, Amit Langote
<[hidden email]> wrote:

>
> On 2019/05/11 6:05, Tom Lane wrote:
> > regression=# explain select * from dataid where id=1 and datadatetime < '2018-01-01'::timestamptz;
> >                                                QUERY PLAN
> > --------------------------------------------------------------------------------------------------------
> >  Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12)
> >    Recheck Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00-05'::timestamp with time zone))
> >    ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0)
> >          Index Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00-05'::timestamp with time zone))
> > (4 rows)
> >
> > That's not fine.  What we have here is a "timestamp < timestamptz"
> > operator, which is only stable, therefore it might give different
> > results at runtime than at plan time.  You can't make plan-time
> > pruning decisions with that.  What we should have gotten here was
> > an Append node that could do run-time pruning.
>
> You're right.  It seems that prune_append_rel_partitions() is forgetting
> to filter mutable clauses from rel->baserestrictinfo, like
> relation_excluded_by_constraints() does.  I fixed that in the attached
> 0003 patch, which also adds a test for this scenario.  I needed to also
> tweak run-time pruning support code a bit so that it considers the cases
> involving mutable functions as requiring (startup) run-time pruning, in
> addition to the cases with mutable expressions.  Adding David if he wants
> to comment.
Yeah. I don't think you're going about this the right way.  I don't
really see why we need to make any changes to the run-time pruning
code here, that part seems fine to me.  The problem seems to be that
match_clause_to_partition_key() thinks it can use a non-const
expression to compare to the partition key.  All immutable function
calls will already be folded to constants by this time, so what's
wrong with just insisting that the value being compared to the
partition key is a constant when generating steps during planning?
When we generate steps for run-time pruning we can use stable function
return values, but obviously still not volatile functions.   So isn't
the fix just to provide gen_partprune_steps with some sort of context
as to the environment that it's generating steps for?  Fixing it by
insisting the planner pruning only uses consts saves us some cycles
where we check for Vars and volatile expressions too.  That's a pretty
good saving when you consider that the expression could sometimes be
fairly complex.

I did it that way in the attached, then tried running with your tests
and got failures.  After looking at those, your tests look like they're
expecting the wrong results.

E.g.

+explain (analyze, costs off, summary off, timing off) select * from mc3p where a = 1 and abs(b) < (select 2);
+                       QUERY PLAN
+--------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Seq Scan on mc3p0 (actual rows=0 loops=1)
+         Filter: ((a = 1) AND (abs(b) < $0))
+   ->  Seq Scan on mc3p_default (actual rows=0 loops=1)
+         Filter: ((a = 1) AND (abs(b) < $0))
+(7 rows)

It does not seem very good that you pruned "mc3p1" there, since
(with my patch):

postgres=# insert into mc3p (a,b,c) values(1,1,1);
INSERT 0 1
postgres=# select tableoid::regclass,* from mc3p where a = 1 and abs(b) < (select 2);
 tableoid | a | b | c
----------+---+---+---
 mc3p1    | 1 | 1 | 1
(1 row)

If those tests were passing in your code then you'd have missed that
row... Oops.

I've attached how I think the fix should look and included some tests
too.  I was a bit unsure if it was worth adding the new enum or not.
Probably could be just a bool, but I went with an enum, as it makes
function calls easier on the human eye.  I also included a test
that ensures an immutable function's value is used for pruning during
planning...  Half of me thinks that test is pretty useless since the
return value is converted to a const well before pruning, but I left
it there anyway...

I've not done the PG11 version of the patch but I can do it if the
PG12 one looks okay.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

only_prune_with_const_pruning_planning.patch (13K)

Re: inconsistent results querying table partitioned by date

Amit Langote-2
On 2019/05/14 10:50, David Rowley wrote:

> On Mon, 13 May 2019 at 17:40, Amit Langote wrote:
>> On 2019/05/11 6:05, Tom Lane wrote:
>>> regression=# explain select * from dataid where id=1 and datadatetime < '2018-01-01'::timestamptz;
>>>                                                QUERY PLAN
>>> --------------------------------------------------------------------------------------------------------
>>>  Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12)
>>>    Recheck Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00-05'::timestamp with time zone))
>>>    ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0)
>>>          Index Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00-05'::timestamp with time zone))
>>> (4 rows)
>>>
>>> That's not fine.  What we have here is a "timestamp < timestamptz"
>>> operator, which is only stable, therefore it might give different
>>> results at runtime than at plan time.  You can't make plan-time
>>> pruning decisions with that.  What we should have gotten here was
>>> an Append node that could do run-time pruning.
>>
>> You're right.  It seems that prune_append_rel_partitions() is forgetting
>> to filter mutable clauses from rel->baserestrictinfo, like
>> relation_excluded_by_constraints() does.  I fixed that in the attached
>> 0003 patch, which also adds a test for this scenario.  I needed to also
>> tweak run-time pruning support code a bit so that it considers the cases
>> involving mutable functions as requiring (startup) run-time pruning, in
>> addition to the cases with mutable expressions.  Adding David if he wants
>> to comment.
>
> Yeah. I don't think you're going about this the right way.  I don't
> really see why we need to make any changes to the run-time pruning
> code here, that part seems fine to me.  The problem seems to be that
> match_clause_to_partition_key() thinks it can use a non-const
> expression to compare to the partition key.  All immutable function
> calls will already be folded to constants by this time, so what's
> wrong with just insisting that the value being compared to the
> partition key is a constant when generating steps during planning?

The problem is different.  '2018-01-01'::timestamptz in the condition
datadatetime < '2018-01-01'::timestamptz as presented to
match_clause_to_partition_key() is indeed a Const node, making it think
that it's OK to prune using it, that is, with or without your patch.
Here's the result for Tom's query quoted above, with your patch applied:

explain analyze select * from dataid where id=1 and datadatetime <
'2018-01-01'::timestamptz;
                                                             QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12) (actual time=0.050..0.058 rows=0 loops=1)
   Recheck Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00+09'::timestamp with time zone))
   ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0) (actual time=0.027..0.035 rows=0 loops=1)
         Index Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00+09'::timestamp with time zone))
 Planning Time: 33660.807 ms
 Execution Time: 0.236 ms
(6 rows)

which is the same as without the patch and is wrong, as Tom complains.  His
complaint is that planning-time pruning should not have considered this
clause, because its result is only stable, not immutable.  That is, the
operator '<' (function timestamp_lt_timestamptz() in this case) is only
stable, not immutable.  The expected plan in this case is Append node with
run-time pruning set up and initial pruning will do the pruning, which you
get with my patch:

explain select * from dataid where id=1 and datadatetime <
'2018-01-01'::timestamptz;
                                                   QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Append  (cost=4.18..22.63 rows=6 width=12)
   Subplans Removed: 1
   ->  Bitmap Heap Scan on dataid_default  (cost=4.18..11.30 rows=3 width=12)
         Recheck Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00+09'::timestamp with time zone))
         ->  Bitmap Index Scan on dataid_default_pkey  (cost=0.00..4.18 rows=3 width=0)
               Index Cond: ((id = 1) AND (datadatetime < '2018-01-01 00:00:00+09'::timestamp with time zone))
(6 rows)

The case you seem to be thinking of is where the condition is of shape
"partkey op stable-valued-function", but that case works fine today, so
needs no fixing.

Thanks,
Amit




Re: inconsistent results querying table partitioned by date

David Rowley-3
On Tue, 14 May 2019 at 16:07, Amit Langote
<[hidden email]> wrote:
>
> The problem is different.  '2018-01-01'::timestamptz in the condition
> datadatetime < '2018-01-01'::timestamptz as presented to
> match_clause_to_partition_key() is indeed a Const node, making it think
> that it's OK to prune using it, that is, with or without your patch.

Looks like I misunderstood the problem.

Is it not still better to ignore the unsupported quals in
match_clause_to_partition_key() rather than crafting a new List of
just the valid ones like you've done in your patch?

I admit that at some point during our parallel development of pruning
and run-time pruning, I thought that the pruning steps generated for
plan-time pruning could have been reused for run-time pruning, but we
can't do that, since we must also use parameterised quals, which are
not in baserestrictinfo and therefore not there when we generate the
steps for plan-time pruning.  I feel that what you've got spreads
the logic out a bit too much. match_clause_to_partition_key() has
traditionally been in charge of deciding what quals can be used and
which ones can't. You've gone and added logic in
prune_append_rel_partitions() that makes part of this decision now and
that feels a bit wrong to me.
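
(A sketch of the parameterised case, with a hypothetical lookup table named
"lookup": the clause d.id = l.id is a join qual rather than part of dataid's
baserestrictinfo, so only run-time pruning can use it; in EXPLAIN ANALYZE,
partitions that can never match should show up as "(never executed)":)

-- "lookup" is a hypothetical table, assumed to hold the ids being probed
explain analyze select * from dataid d join lookup l on d.id = l.id;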

I've attached patch of how I think it should work.  This includes the
changes you made to analyze_partkey_exprs() and your tests from
v1-0003-Fix-planner-to-not-prune-partitions-with-non-immu.patch

Do you think it's a bad idea to do it this way?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

dont_prune_with_nonimmutable_opfuncs_during_planning.patch (12K)

Re: inconsistent results querying table partitioned by date

Amit Langote-2
On 2019/05/14 15:09, David Rowley wrote:

> On Tue, 14 May 2019 at 16:07, Amit Langote
> <[hidden email]> wrote:
>>
>> The problem is different.  '2018-01-01'::timestamptz in the condition
>> datadatetime < '2018-01-01'::timestamptz as presented to
>> match_clause_to_partition_key() is indeed a Const node, making it think
>> that it's OK to prune using it, that is, with or without your patch.
>
> Looks like I misunderstood the problem.
>
> Is it not still better to ignore the unsupported quals in
> match_clause_to_partition_key() rather than crafting a new List of
> just the valid ones like you've done in your patch?

That's another way...

> I feel that what you've got spreads
> the logic out a bit too much. match_clause_to_partition_key() has
> traditionally been in charge of deciding what quals can be used and
> which ones can't.  You've gone and added logic in
> prune_append_rel_partitions() that makes part of this decision now and
> that feels a bit wrong to me.

Actually, I had considered a solution like yours of distinguishing
prune_append_rel_partitions()'s invocations of gen_partprune_steps() from
make_partition_pruneinfo()'s, but thought it would be too much churn.

Also, another thing that pushed me toward the approach I took is what I
saw in predtest.c.  I mentioned upthread that constraint exclusion doesn't
have this problem, that is, it knows to skip stable-valued clauses, so I
went into predicate_refuted_by() and underlings to see how they do that,
but the logic wasn't down there.  It was in the following code in
relation_excluded_by_constraints():

    /*
     * ... We dare not make
     * deductions with non-immutable functions
     * ...
     */
    safe_restrictions = NIL;
    foreach(lc, rel->baserestrictinfo)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

        if (!contain_mutable_functions((Node *) rinfo->clause))
            safe_restrictions = lappend(safe_restrictions, rinfo->clause);
    }

I think prune_append_rel_partitions() is playing a similar role as
relation_excluded_by_constraints(), that is, of dispatching the
appropriate clauses to the main pruning logic.  So, I thought enforcing
this policy in prune_append_rel_partitions() wouldn't be such a bad idea.

> I've attached patch of how I think it should work.  This includes the
> changes you made to analyze_partkey_exprs() and your tests from
> v1-0003-Fix-planner-to-not-prune-partitions-with-non-immu.patch

Thanks for updating the patch.  It works now.

> Do you think it's a bad idea to do it this way?

As I mentioned above, I think this may be quite a bit of code churn for
enforcing a policy that's elsewhere enforced in a much simpler manner.
How about deferring this to committer?

Thanks,
Amit




Re: inconsistent results querying table partitioned by date

Tom Lane-2
Amit Langote <[hidden email]> writes:
> On 2019/05/14 15:09, David Rowley wrote:
>> Do you think it's a bad idea to do it this way?

> As I mentioned above, I think this may be quite a bit of code churn for
> enforcing a policy that's elsewhere enforced in a much simpler manner.
> How about deferring this to committer?

Not sure I'll be the one to commit this, but ...

I don't much like the approach in
v1-0003-Fix-planner-to-not-prune-partitions-with-non-immu.patch
because (a) it appears to me to add duplicative expression mutability
testing, which is kind of expensive since that adds syscache lookups,
plus it adds such tests even for clauses that aren't going to match any
partition keys; and (b) it's extremely nonobvious that this restricts
plan-time pruning and not run-time pruning.  I do see that
prune_append_rel_partitions is used only for the former; but you sure
as heck can't deduce that from any of its comments, so somebody might
try to use it for run-time pruning.

So I think David's got the right idea that match_clause_to_partition_key
is where to handle this, and I like the fact that his patch explicitly
represents whether we're trying to do run-time or plan-time pruning.
I agree it's kind of invasive, and I wonder why.  Shouldn't the
additional flag just be a field in the "context" struct, instead of
being a separate parameter that has to be laboriously passed through
many call levels?

(For myself, I'd have made it just a bool not an enum, given that
I don't foresee a need for additional values.  But that's definitely
a matter of preference.)

Also, I'm a bit confused by the fact that we seem to have a bunch
of spare-parts patches in this thread.  What combination is actually
being proposed for commit?

                        regards, tom lane



Re: inconsistent results querying table partitioned by date

David Rowley-3
On Wed, 15 May 2019 at 08:58, Tom Lane <[hidden email]> wrote:
> So I think David's got the right idea that match_clause_to_partition_key
> is where to handle this, and I like the fact that his patch explicitly
> represents whether we're trying to do run-time or plan-time pruning.
> I agree it's kind of invasive, and I wonder why.  Shouldn't the
> additional flag just be a field in the "context" struct, instead of
> being a separate parameter that has to be laboriously passed through
> many call levels?

Thanks for having a look.  I originally stayed clear of "context"
since it looks more like how we output the pruning steps, rather than
a context as to how they should be generated. But it's true, using it
saves having to pass the extra argument around, so I made that change.

> (For myself, I'd have made it just a bool not an enum, given that
> I don't foresee a need for additional values.  But that's definitely
> a matter of preference.)

I've changed to bool, mostly because it makes the patch a bit smaller.

> Also, I'm a bit confused by the fact that we seem to have a bunch
> of spare-parts patches in this thread.  What combination is actually
> being proposed for commit?

I was also confused by the extra two patches Amit posted.  The extra
one for the tests didn't look very valid to me, as I mentioned
yesterday.

I propose the attached for master, and I'll post something for v11 shortly.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

dont_prune_with_nonimmutable_opfuncs_during_planning_v2.patch (13K)

Re: inconsistent results querying table partitioned by date

David Rowley-3
On Wed, 15 May 2019 at 11:52, David Rowley <[hidden email]> wrote:
> I propose the attached for master, and I'll post something for v11 shortly.

Here's the v11 version, as promised.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: pg11_dont_prune_with_nonimmutable_opfuncs_during_planning_v2.patch (13K)

Re: inconsistent results querying table partitioned by date

Amit Langote-2
In reply to this post by Tom Lane-2
On 2019/05/15 5:58, Tom Lane wrote:
> Amit Langote <[hidden email]> writes:
>> On 2019/05/14 15:09, David Rowley wrote:
>>> Do you think it's a bad idea to do it this way?
>
>> As I mentioned above, I think this may be quite a bit of code churn for
>> enforcing a policy that's elsewhere enforced in a much simpler manner.
>> How about deferring this to committer?
>
> Not sure I'll be the one to commit this, but ...

Thanks for the feedback.

> I don't much like the approach in
> v1-0003-Fix-planner-to-not-prune-partitions-with-non-immu.patch
> because (a) it appears to me to add duplicative expression mutability
> testing, which is kind of expensive since that adds syscache lookups,
> plus it adds such tests even for clauses that aren't going to match any
> partition keys; and (b) it's extremely nonobvious that this restricts
> plan-time pruning and not run-time pruning.  I do see that
> prune_append_rel_partitions is used only for the former; but you sure
> as heck can't deduce that from any of its comments, so somebody might
> try to use it for run-time pruning.

OK, both (a) and (b) are good arguments.

> So I think David's got the right idea that match_clause_to_partition_key
> is where to handle this, and I like the fact that his patch explicitly
> represents whether we're trying to do run-time or plan-time pruning.

Agreed that David's latest patch is good for this.

> Also, I'm a bit confused by the fact that we seem to have a bunch
> of spare-parts patches in this thread.  What combination is actually
> being proposed for commit?

Sorry for the confusion.

0001 adds tests for the other bugs but with the wrong expected output

0002 fixes those bugs and shows the corrected output

0003 addressed the bug where stable operators were being used for
plan-time pruning; it has its own test (only 0003 needed separate
versions for HEAD and v11)

Parts of 0003 are now superseded by David's patch, of which he has posted
both for-HEAD and for-v11 versions.

0001+0002 is obviously meant to be squashed, which I'll do in a bit.

Thanks,
Amit




Re: inconsistent results querying table partitioned by date

Amit Langote-2
In reply to this post by David Rowley-3
Thanks for the updated patches.

On 2019/05/15 8:52, David Rowley wrote:

> On Wed, 15 May 2019 at 08:58, Tom Lane <[hidden email]> wrote:
>> So I think David's got the right idea that match_clause_to_partition_key
>> is where to handle this, and I like the fact that his patch explicitly
>> represents whether we're trying to do run-time or plan-time pruning.
>> I agree it's kind of invasive, and I wonder why.  Shouldn't the
>> additional flag just be a field in the "context" struct, instead of
>> being a separate parameter that has to be laboriously passed through
>> many call levels?
>
> Thanks for having a look.  I originally stayed clear of "context"
> since it looks more like how we output the pruning steps, rather than
> a context as to how they should be generated. But it's true, using it
> saves having to pass the extra argument around, so I made that change.
>
>> (For myself, I'd have made it just a bool not an enum, given that
>> I don't foresee a need for additional values.  But that's definitely
>> a matter of preference.)
>
> I've changed to bool, mostly because it makes the patch a bit smaller.

Looks good.  Thanks for tweaking many of the comments for clarity,
though this one could be improved a bit more:

+ * 'forplanner' must be true when being called from the query planner.

This description of 'forplanner' might make it sound like it's redundant
(always true), because gen_partprune_steps() is only called from the
planner today.  How about making it say the same thing as the comment next
to its definition in the GeneratePruningStepsContext struct:

'forplanner' must be true if generating steps to be used during query
planning.

>> Also, I'm a bit confused by the fact that we seem to have a bunch
>> of spare-parts patches in this thread.  What combination is actually
>> being proposed for commit?
>
> I was also confused by the two extra patches Amit posted. The one
> adding tests didn't look very valid to me, as I mentioned yesterday.

As I said in my previous reply, 0001 contains wrong expected output
because of the other bugs discussed earlier.  I guess that's not very helpful.

Anyway, I'm attaching a squashed version of those two patches as
0001-Fix-bugs-in-pruning-with-composite-range-partition-k.patch.  It
applies as-is to both master and v11 branches.

> I propose the attached for master, and I'll post something for v11 shortly.

Thanks.  I'm re-attaching both of your patches here just to have all the
patches in one place.

Thanks,
Amit

Attachment: 0001-Fix-bugs-in-pruning-with-composite-range-partition-k.patch (13K)
Attachment: dont_prune_with_nonimmutable_opfuncs_during_planning_v2.patch (13K)
Attachment: pg11_dont_prune_with_nonimmutable_opfuncs_during_planning_v2.patch (13K)

Re: inconsistent results querying table partitioned by date

David Rowley-3
On Wed, 15 May 2019 at 14:10, Amit Langote
<[hidden email]> wrote:
> Anyway, I'm attaching a squashed version of those two patches as
> 0001-Fix-bugs-in-pruning-with-composite-range-partition-k.patch.  It
> applies as-is to both master and v11 branches.

I had a look at
0001-Fix-bugs-in-pruning-with-composite-range-partition-k.patch, and
since my patch no longer generates pruning steps during planning for
non-Consts, I don't think the test you have there does a very good job
of highlighting the bug. Maybe you could exploit the bug in the tests
by exercising both init-plan run-time pruning and exec-time run-time
pruning. Something like:

set plan_cache_mode = 'force_generic_plan';
prepare q1 (int) as select * from mc3p where a = $1 and abs(b) < (select 2);
explain (costs off) execute q1(1);

Additionally, I'm wondering if we should also apply the attached as
part of my dont_prune_with_nonimmutable_opfuncs_during_planning_v2.patch.
We should never get a non-Const in the pruning steps generated for
planning, so there's likely no point in a run-time check to ensure we
have a planstate.  This code can execute quite often during run-time
pruning, once each time a parameter changes, which could be for every
row.  I think I'll go do that now, and also fix up that forplanner
comment you mentioned.
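
For illustration, the simplification amounts to something like the
following; this is a sketch based on the description above, not the
patch itself.  The Const fast path is unchanged, and the non-Const
path trades the repeated planstate test for an assertion:

static void
partkey_datum_from_expr(PartitionPruneContext *context,
                        Expr *expr, int stateidx,
                        Datum *value, bool *isnull)
{
    if (IsA(expr, Const))
    {
        /* Plan-time pruning steps only ever contain Consts. */
        Const      *con = (Const *) expr;

        *value = con->constvalue;
        *isnull = con->constisnull;
    }
    else
    {
        ExprState  *exprstate;
        ExprContext *ectx;

        /*
         * Non-Const expressions appear only in steps generated for
         * run-time pruning, where an executor planstate must exist,
         * so there's no need to re-test it on every evaluation.
         */
        Assert(context->planstate != NULL);

        exprstate = context->exprstates[stateidx];
        ectx = context->planstate->ps_ExprContext;
        *value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
    }
}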

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: partkey_datum_from_expr_ensure_planstat_exists_with_non_const.patch (1K)

Re: inconsistent results querying table partitioned by date

David Rowley-3
On Wed, 15 May 2019 at 14:20, David Rowley <[hidden email]> wrote:
> Additionally, I'm wondering if we should also apply the attached as
> part of my dont_prune_with_nonimmutable_opfuncs_during_planning_v2.patch.
> We should never get a non-Const in the pruning steps generated for
> planning, so there's likely no point in a run-time check to ensure we
> have a planstate.  This code can execute quite often during run-time
> pruning, once each time a parameter changes, which could be for every
> row.  I think I'll go do that now, and also fix up that forplanner
> comment you mentioned.

I've made the change to partkey_datum_from_expr() in master's version
only. I did this because I had to document that the context argument
in get_matching_partitions() must have a valid planstate set when the
given pruning_steps were generated for the executor. It's probably
unlikely that anything else is using these functions, but I still
thought it was a bad idea to change that in v11. At this stage, I
don't see any big issues with changing it in master.

The only change in
pg11_dont_prune_with_nonimmutable_opfuncs_during_planning_v3.patch
since v2 is a fix for the inaccurate comment Amit mentioned.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: pg11_dont_prune_with_nonimmutable_opfuncs_during_planning_v3.patch (13K)
Attachment: dont_prune_with_nonimmutable_opfuncs_during_planning_v3.patch (15K)