Extended statistics is not working on Vars hidden under a RelabelType

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Extended statistics is not working on Vars hidden under a RelabelType

David Rowley-3
Basically, $subject is causing us not to properly find matching
extended stats in this case.

The attached patch fixes it.

The following test cases is an example of the misbehaviour. Note
rows=1 vs rows=98 in the Gather node.

create table ab (a varchar, b varchar);
insert into ab select (x%1000)::varchar, (x%10000)::Varchar from
generate_Series(1,1000000)x;
create statistics ab_a_b_stats (dependencies) on a,b from ab;
analyze ab;

-- Unpatched
explain analyze select * from ab where a = '1' and b = '1';
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12466.10 rows=1 width=7) (actual
time=0.441..90.515 rows=100 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on ab  (cost=0.00..11466.00 rows=1 width=7)
(actual time=1.081..74.944 rows=33 loops=3)
         Filter: (((a)::text = '1'::text) AND ((b)::text = '1'::text))
         Rows Removed by Filter: 333300
 Planning time: 0.184 ms
 Execution time: 105.878 ms
(8 rows)

-- Patched
explain analyze select * from ab where a = '1' and b = '1';
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12475.80 rows=98 width=7) (actual
time=1.076..92.595 rows=100 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on ab  (cost=0.00..11466.00 rows=41 width=7)
(actual time=0.491..77.833 rows=33 loops=3)
         Filter: (((a)::text = '1'::text) AND ((b)::text = '1'::text))
         Rows Removed by Filter: 333300
 Planning time: 2.175 ms
 Execution time: 106.326 ms
(8 rows)

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

allow_relabelled_vars_in_dependency_stats.patch (992 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

Tomas Vondra-4
On 10/10/2017 05:03 AM, David Rowley wrote:
> Basically, $subject is causing us not to properly find matching
> extended stats in this case.
>
> The attached patch fixes it.
>
> The following test cases is an example of the misbehaviour. Note
> rows=1 vs rows=98 in the Gather node.
>

Thanks for noticing this. The patch seems fine to me.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

Álvaro Herrera
Tomas Vondra wrote:

> On 10/10/2017 05:03 AM, David Rowley wrote:
> > Basically, $subject is causing us not to properly find matching
> > extended stats in this case.
> >
> > The attached patch fixes it.
> >
> > The following test cases is an example of the misbehaviour. Note
> > rows=1 vs rows=98 in the Gather node.
>
> Thanks for noticing this. The patch seems fine to me.
I propose this slightly larger change.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

0001-Infer-functional-dependency-past-RelabelType.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

David Rowley-3
On 13 October 2017 at 02:17, Alvaro Herrera <[hidden email]> wrote:
> I propose this slightly larger change.

hmm, this is not right. You're not checking that there's a Var below
the RelabelType.

I tried with:

explain select * from ab where (a||a)::varchar = '' and b = '';

and your code assumed the OpExpr was a Var.

The reason Tomas coded it the way it was coded is due to the fact that
there's already code that works exactly the same way in
clauselist_selectivity(). Personally, I don't particularly like that
code, but I'd rather not invent a new way to do the same thing.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

Álvaro Herrera
David Rowley wrote:

> The reason Tomas coded it the way it was coded is due to the fact that
> there's already code that works exactly the same way in
> clauselist_selectivity(). Personally, I don't particularly like that
> code, but I'd rather not invent a new way to do the same thing.

I pushed your original fix.  I still maintain that this should be
written differently, but I'm not going to try to change that in a hurry
and together with a bugfix.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

David Rowley-3
On 13 October 2017 at 04:56, Alvaro Herrera <[hidden email]> wrote:
> I pushed your original fix.

Thanks for committing

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

Robert Haas
In reply to this post by David Rowley-3
On Mon, Oct 9, 2017 at 11:03 PM, David Rowley
<[hidden email]> wrote:
> -- Unpatched
>  Planning time: 0.184 ms
>  Execution time: 105.878 ms
>
> -- Patched
>  Planning time: 2.175 ms
>  Execution time: 106.326 ms

This might not be the best example to show the advantages of the
patch, honestly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

Tomas Vondra-4

On 10/13/2017 10:04 PM, Robert Haas wrote:

> On Mon, Oct 9, 2017 at 11:03 PM, David Rowley
> <[hidden email]> wrote:
>> -- Unpatched
>>  Planning time: 0.184 ms
>>  Execution time: 105.878 ms
>>
>> -- Patched
>>  Planning time: 2.175 ms
>>  Execution time: 106.326 ms
>
> This might not be the best example to show the advantages of the
> patch, honestly.
>

Not sure what exactly is your point? If you're suggesting this example
is bad because the planning time increased from 0.184 to 2.175 ms, then
perhaps consider the plans were likely generated on a assert-enabled
build and on a laptop (both of which adds quite a bit of noise to
occasional timings). The patch has no impact on planning time (at least
I've been unable to measure any).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

David Rowley-3
In reply to this post by Robert Haas
On 14 October 2017 at 09:04, Robert Haas <[hidden email]> wrote:

> On Mon, Oct 9, 2017 at 11:03 PM, David Rowley
> <[hidden email]> wrote:
>> -- Unpatched
>>  Planning time: 0.184 ms
>>  Execution time: 105.878 ms
>>
>> -- Patched
>>  Planning time: 2.175 ms
>>  Execution time: 106.326 ms
>
> This might not be the best example to show the advantages of the
> patch, honestly.

The focus was on the row estimate. I try to highlight that by
mentioning "Note rows=1 vs rows=98 in the Gather node.". I can't
imagine the test I added would have made the planner about 12 times
slower, but just for the record:

create table ab (a varchar, b varchar);
insert into ab select (x%1000)::varchar, (x%10000)::Varchar from
generate_Series(1,1000000)x;
create statistics ab_a_b_stats (dependencies) on a,b from ab;
vacuum analyze ab;

$ cat a.sql
explain select * from ab where a = '1' and b = '1';

e9ef11ac8bb2acc2d2462fc17ec3291a959589e7 (Patched)

$ pgbench -f a.sql -T 60 -n
transaction type: a.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 496950
latency average = 0.121 ms
tps = 8282.481310 (including connections establishing)
tps = 8282.750821 (excluding connections establishing)

e9ef11ac8bb2acc2d2462fc17ec3291a959589e7~1 (Unpatched)

$ pgbench -f a.sql -T 60 -n
transaction type: a.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 511250
latency average = 0.117 ms
tps = 8520.822410 (including connections establishing)
tps = 8521.132784 (excluding connections establishing)

With the patch we are making use of the extended statistics, which we
do expect to be more work for the planner. Although, we didn't add
extended statistics to speed up the planner.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

Robert Haas
In reply to this post by Tomas Vondra-4
On Fri, Oct 13, 2017 at 4:44 PM, Tomas Vondra
<[hidden email]> wrote:

> On 10/13/2017 10:04 PM, Robert Haas wrote:
>> On Mon, Oct 9, 2017 at 11:03 PM, David Rowley
>> <[hidden email]> wrote:
>>> -- Unpatched
>>>  Planning time: 0.184 ms
>>>  Execution time: 105.878 ms
>>>
>>> -- Patched
>>>  Planning time: 2.175 ms
>>>  Execution time: 106.326 ms
>>
>> This might not be the best example to show the advantages of the
>> patch, honestly.
>
> Not sure what exactly is your point? If you're suggesting this example
> is bad because the planning time increased from 0.184 to 2.175 ms, then
> perhaps consider the plans were likely generated on a assert-enabled
> build and on a laptop (both of which adds quite a bit of noise to
> occasional timings). The patch has no impact on planning time (at least
> I've been unable to measure any).

I don't really think there's a problem with the patch; I just noticed
that with the patch applied both the planning and execution time went
up.  I understand that's because this is a toy example, not a real
one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

Robert Haas
In reply to this post by David Rowley-3
On Fri, Oct 13, 2017 at 4:49 PM, David Rowley
<[hidden email]> wrote:
> tps = 8282.481310 (including connections establishing)
> tps = 8282.750821 (excluding connections establishing)

vs.

> tps = 8520.822410 (including connections establishing)
> tps = 8521.132784 (excluding connections establishing)
>
> With the patch we are making use of the extended statistics, which we
> do expect to be more work for the planner. Although, we didn't add
> extended statistics to speed up the planner.

Sure, I understand.  That's actually a pretty substantial regression -
I guess that means that it's pretty important to avoid creating
extended statistics that are not needed, at least for short-running
queries.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

Tomas Vondra-4


On 10/14/2017 07:49 PM, Robert Haas wrote:

> On Fri, Oct 13, 2017 at 4:49 PM, David Rowley
> <[hidden email]> wrote:
>> tps = 8282.481310 (including connections establishing)
>> tps = 8282.750821 (excluding connections establishing)
>
> vs.
>
>> tps = 8520.822410 (including connections establishing)
>> tps = 8521.132784 (excluding connections establishing)
>>
>> With the patch we are making use of the extended statistics, which
>> we do expect to be more work for the planner. Although, we didn't
>> add extended statistics to speed up the planner.
>
> Sure, I understand. That's actually a pretty substantial regression
> - I guess that means that it's pretty important to avoid creating
> extended statistics that are not needed, at least for short-running
> queries.
>

Well, it's only about 3% difference in a single run, which may be easily
due to slightly different binary layout, random noise etc. So I wouldn't
call that "substantial regression", at least not based on this one test.

I've done more thorough testing, and what I see is 1.0-1.2% drop, but on
a test that's rather extreme (statistics on empty table). So again,
likely well within noise, and on larger tables it'll get even less
significant.

But of course - it's not free. It's a bit more work we need to do. But
if you don't need multi-column statistics, don't create them. If your
queries are already fast, you probably don't need them at all.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

David Rowley-3
In reply to this post by Robert Haas
On 15 October 2017 at 06:49, Robert Haas <[hidden email]> wrote:

> On Fri, Oct 13, 2017 at 4:49 PM, David Rowley
> <[hidden email]> wrote:
>> tps = 8282.481310 (including connections establishing)
>> tps = 8282.750821 (excluding connections establishing)
>
> vs.
>
>> tps = 8520.822410 (including connections establishing)
>> tps = 8521.132784 (excluding connections establishing)
>>
>> With the patch we are making use of the extended statistics, which we
>> do expect to be more work for the planner. Although, we didn't add
>> extended statistics to speed up the planner.
>
> Sure, I understand.  That's actually a pretty substantial regression -
> I guess that means that it's pretty important to avoid creating
> extended statistics that are not needed, at least for short-running
> queries.

To be honest, I ran that on a VM on my laptop. I was getting quite a
bit of noise. I just posted that to show that the 12x slowdown didn't
exist. I don't know what the actual slowdown is. I just know extended
stats are not free and that nobody expected that they ever would be.
The good news is that they're off by default and if the bad ever
outweighs the good then the fix for that starts with "DROP STATISTICS"

I personally think it's great we're starting to see a useful feature
materialise that can help with poor row estimates from the planner.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Extended statistics is not working on Vars hidden under a RelabelType

Robert Haas
On Mon, Oct 16, 2017 at 2:36 AM, David Rowley
<[hidden email]> wrote:
> I personally think it's great we're starting to see a useful feature
> materialise that can help with poor row estimates from the planner.

I agree.  My original post to this thread was more of a throw-away
comment than anything, and I'm not attacking the feature.  I didn't
think it was a very clear example and, TBH, I still don't.  But I
don't want to blow that up into a big debate on the virtues of this
feature, which I never intended to question, or on the correctness of
the patch, which I also did not intend to question.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Previous Thread Next Thread