Optimze usage of immutable functions as relation

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimze usage of immutable functions as relation

Aleksandr Parfenov
Hi hackers,

There is a strange behavior of the query planner in some cases if
stable/immutable was used a relation. In some cases, it affects costs of
operations and leads to a bad plan of the execution. Oleg Bartunov
noticed
such behavior in queries with a to_tsvector as a relation:

=# explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from
messages, to_tsquery('tuple&header&overhead') q where body_tsvector @@
q;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
  Nested Loop  (cost=383.37..58547.70 rows=4937 width=36)
    ->  Function Scan on to_tsquery q  (cost=0.25..0.26 rows=1 width=32)
    ->  Bitmap Heap Scan on messages  (cost=383.12..58461.04 rows=4937
width=275)
          Recheck Cond: (body_tsvector @@ q.q)
          ->  Bitmap Index Scan on message_body_idx  (cost=0.00..381.89
rows=4937 width=0)
                Index Cond: (body_tsvector @@ q.q)
(6 rows)

=# explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from
messages, to_tsquery('tuple&header&overhead') q where body_tsvector @@
q limit 10;
                                    QUERY PLAN
--------------------------------------------------------------------------------
  Limit  (cost=0.25..425.62 rows=10 width=36)
    ->  Nested Loop  (cost=0.25..210005.80 rows=4937 width=36)
          Join Filter: (messages.body_tsvector @@ q.q)
          ->  Function Scan on to_tsquery q  (cost=0.25..0.26 rows=1
width=32)
          ->  Seq Scan on messages  (cost=0.00..197625.45 rows=987445
width=275)

The idea of the fix for this situation is to check is a result of the
function constant or not during the planning of the query. Attached
patch does
this by processing Var entries at planner stage and replace them with
constant value if it is possible. Plans after applying a patch (SeqScan
query for comparison):

=# explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from
messages, to_tsquery('tuple&header&overhead') q where body_tsvector @@
q limit 10;
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
  Limit  (cost=224.66..268.11 rows=3 width=36)
    ->  Nested Loop  (cost=224.66..268.11 rows=3 width=36)
          ->  Function Scan on to_tsquery q  (cost=0.25..0.26 rows=1
width=0)
          ->  Bitmap Heap Scan on messages  (cost=224.41..267.04 rows=3
width=275)
                Recheck Cond: (body_tsvector @@
to_tsquery('tuple&header&overhead'::text))
                ->  Bitmap Index Scan on message_body_idx  
(cost=0.00..224.41 rows=3 width=0)
                      Index Cond: (body_tsvector @@
to_tsquery('tuple&header&overhead'::text))
(7 rows)

=# set enable_bitmapscan=off;
SET
=# explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from
messages, to_tsquery('tuple&header&overhead') q where body_tsvector @@
q limit 10;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
  Limit  (cost=1000.25..296754.14 rows=3 width=36)
    ->  Gather  (cost=1000.25..296754.14 rows=3 width=36)
          Workers Planned: 2
          ->  Nested Loop  (cost=0.25..295753.32 rows=1 width=36)
                ->  Parallel Seq Scan on messages
(cost=0.00..295752.80 rows=1 width=275)
                      Filter: (body_tsvector @@
to_tsquery('tuple&header&overhead'::text))
                ->  Function Scan on to_tsquery q  (cost=0.25..0.26
rows=1 width=0)
(7 rows)

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

funcscan_plan_optimizer.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Andrew Gierth
>>>>> "Aleksandr" == Aleksandr Parfenov <[hidden email]> writes:

 Aleksandr> The idea of the fix for this situation is to check is a
 Aleksandr> result of the function constant or not during the planning
 Aleksandr> of the query. Attached patch does this by processing Var
 Aleksandr> entries at planner stage and replace them with constant
 Aleksandr> value if it is possible. Plans after applying a patch
 Aleksandr> (SeqScan query for comparison):

From an implementation point of view your patch is obviously broken in
many ways (starting with not checking varattno anywhere, and not
actually checking anywhere if the expression is volatile).

But more importantly the plans that you showed seem _worse_ in that
you've created extra calls to to_tsquery even though the query has been
written to try and avoid that.

I think you need to take a step back and explain more precisely what you
think you're trying to do and what the benefit (and cost) is.

Specific coding style points (not exhaustive):

 Aleksandr>  pointedNode->functions->length == 1

list_length()

 Aleksandr> pointedNode->functions->head->data.ptr_value

linitial() or linitial_node()

 Aleksandr> if (result->type == T_FuncExpr)

IsA()

(what if the result is the inline expansion of a volatile function?)

 Aleksandr> pfree(result);

result is a whole tree of nodes, pfree is pointless here

(don't bother trying to fix the above points in this patch, that won't
address the fundamental flaws)

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Aleksandr Parfenov
Hello Andrew,

Thank you for the review of the patch.

On Fri, 04 May 2018 08:37:31 +0100
Andrew Gierth <[hidden email]> wrote:
> From an implementation point of view your patch is obviously broken in
> many ways (starting with not checking varattno anywhere, and not
> actually checking anywhere if the expression is volatile).

The actual checking if the expression volatile or not is done inside
evaluate_function(). This is called through few more function in
eval_const_experssion(). If it's volatile, the eval_const_expression()
will return FuncExpr node, Const otherwise. It also checks are arguments
immutable or not.

I agree about varattno, it should be checked. Even in case of SRF not
replaced, it is better to be sure that Var points to first (and the
only) attribute.

> But more importantly the plans that you showed seem _worse_ in that
> you've created extra calls to to_tsquery even though the query has
> been written to try and avoid that.
>
> I think you need to take a step back and explain more precisely what
> you think you're trying to do and what the benefit (and cost) is.

Sure, the first version is some kind of prototype and attempt to
improve execution of the certain type of queries. The best solution I
see is to use the result of the function where it is required and remove
the nested loop in case of immutable functions. In this case, the
planner can choose a better plan if function result is used for tuples
selecting or as a join filter. But it will increase planning time due to
the execution of immutable functions.

It looks like I did something wrong with plans in the example, because
attached patch replaces function-relation usage with result value, not
function call. But nested loop is still in the plan.

I'm open to any suggestions/notices/critics about the idea and approach.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Andrew Gierth
>>>>> "Aleksandr" == Aleksandr Parfenov <[hidden email]> writes:

 >> From an implementation point of view your patch is obviously broken
 >> in many ways (starting with not checking varattno anywhere, and not
 >> actually checking anywhere if the expression is volatile).

 Aleksandr> The actual checking if the expression volatile or not is
 Aleksandr> done inside evaluate_function(). This is called through few
 Aleksandr> more function in eval_const_experssion(). If it's volatile,
 Aleksandr> the eval_const_expression() will return FuncExpr node, Const
 Aleksandr> otherwise. It also checks are arguments immutable or not.

You're missing a ton of other possible cases, of which by far the most
notable is function inlining: eval_const_expressions will inline even a
volatile function and return a new expression tree (which could be
almost anything depending on the function body).

 Aleksandr> I agree about varattno, it should be checked. Even in case
 Aleksandr> of SRF not replaced, it is better to be sure that Var points
 Aleksandr> to first (and the only) attribute.

It's not a matter of "better", but of basic correctness. Functions can
return composite values with columns.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Aleksandr Parfenov
Hello,

I reworked a patch to make more stable in different cases. I decided to
use simplify_function instead of eval_const_expression to prevent
inlining of the function. The only possible outputs of the
simplify_function are Const node and NULL.

Also, I block pre-evaluation of functions with types other than
TYPTYPE_BASE, cause there is no special logic for compound (and others)
values yet.

There is still a problem with memory leak in case of simplified
arguments. The only way I see is a creation of temporary memory context,
but it cost some performance. Maybe we can store simplified arguments
in the pointed function itself for later use. But eval_const_expression
and friends doesn't change the content of the nodes inside the tree, it
generates new nodes and returns it as a result.

The last point to mention is a fixed plan for the query in the initial
letter of the thread. As I mentioned before, new versions of the patch
replace var not with a function call, but with a function execution
result. After the patch, the following plan is used instead of Nested
Loop with Sequence Scan:

explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from messages, to_tsquery('english', 'tuple&header&overhead') q where body_tsvector @@ q limit 10;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=224.16..266.11 rows=3 width=36)
   ->  Nested Loop  (cost=224.16..266.11 rows=3 width=36)
         ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=0)
         ->  Bitmap Heap Scan on messages  (cost=224.16..266.04 rows=3 width=275)
               Recheck Cond: (body_tsvector @@ '''tupl'' & ''header'' & ''overhead'''::tsquery)
               ->  Bitmap Index Scan on message_body_idx  (cost=0.00..224.16 rows=3 width=0)
                     Index Cond: (body_tsvector @@ '''tupl'' & ''header'' & ''overhead'''::tsquery)

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

funcscan_plan_optimizer_v2.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Heikki Linnakangas
On 16/05/18 13:47, Aleksandr Parfenov wrote:
> Hello,
>
> I reworked a patch to make more stable in different cases. I decided to
> use simplify_function instead of eval_const_expression to prevent
> inlining of the function. The only possible outputs of the
> simplify_function are Const node and NULL.

Hmm. That's still a bit inefficient, we'll try to simplify the function
on every reference to it.

We already simplify functions in function RTEs, but we currently do it
after preprocessing all other expressions in the query. See
subquery_planner(), around comment "/* Also need to preprocess
expressions within RTEs */". If we moved that so that we simplify
expressions in the range table first, then in the code that you're
adding to eval_const_expression_mutator(), you could just check if the
function expression is already a Const.


But stepping back a bit, it's a bit weird that we're handling this
differently from VALUES and other subqueries. The planner knows how to
do this trick for simple subqueries:

postgres=# explain select * from tenk1, (select abs(100)) as a (a) where
unique1 < a;
                         QUERY PLAN
-----------------------------------------------------------
  Seq Scan on tenk1  (cost=0.00..483.00 rows=100 width=248)
    Filter: (unique1 < 100)
(2 rows)

Note that it not only evaluated the function into a constant, but also
got rid of the join. For a function RTE, however, it can't do that:

postgres=# explain select * from tenk1, abs(100) as a (a) where unique1 < a;
                             QUERY PLAN
-------------------------------------------------------------------
  Nested Loop  (cost=0.00..583.01 rows=3333 width=248)
    Join Filter: (tenk1.unique1 < a.a)
    ->  Function Scan on a  (cost=0.00..0.01 rows=1 width=4)
    ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
(4 rows)

Could we handle this in pull_up_subqueries(), similar to the
RTE_SUBQUERY and RTE_VALUES cases?

- Heikki

Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Tom Lane-2
Heikki Linnakangas <[hidden email]> writes:
> But stepping back a bit, it's a bit weird that we're handling this
> differently from VALUES and other subqueries. The planner knows how to
> do this trick for simple subqueries:

> postgres=# explain select * from tenk1, (select abs(100)) as a (a) where
> unique1 < a;
>                          QUERY PLAN
> -----------------------------------------------------------
>   Seq Scan on tenk1  (cost=0.00..483.00 rows=100 width=248)
>     Filter: (unique1 < 100)
> (2 rows)

> Note that it not only evaluated the function into a constant, but also
> got rid of the join. For a function RTE, however, it can't do that:

> postgres=# explain select * from tenk1, abs(100) as a (a) where unique1 < a;
>                              QUERY PLAN
> -------------------------------------------------------------------
>   Nested Loop  (cost=0.00..583.01 rows=3333 width=248)
>     Join Filter: (tenk1.unique1 < a.a)
>     ->  Function Scan on a  (cost=0.00..0.01 rows=1 width=4)
>     ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
> (4 rows)

> Could we handle this in pull_up_subqueries(), similar to the
> RTE_SUBQUERY and RTE_VALUES cases?

Perhaps.  You could only do it for non-set-returning functions, which
isn't the typical use of function RTEs, which is probably why we've not
thought hard about it before.  I'm not sure what would need to happen for
lateral functions.  Also to be considered, if it's not foldable to a
constant, is whether we're risking evaluating it more times than before.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Aleksandr Parfenov
On Tue, 10 Jul 2018 17:34:20 -0400
Tom Lane <[hidden email]> wrote:

>Heikki Linnakangas <[hidden email]> writes:
>> But stepping back a bit, it's a bit weird that we're handling this
>> differently from VALUES and other subqueries. The planner knows how
>> to do this trick for simple subqueries:  
>
>> postgres=# explain select * from tenk1, (select abs(100)) as a (a)
>> where unique1 < a;
>>                          QUERY PLAN
>> -----------------------------------------------------------
>>   Seq Scan on tenk1  (cost=0.00..483.00 rows=100 width=248)
>>     Filter: (unique1 < 100)
>> (2 rows)  
>
>> Note that it not only evaluated the function into a constant, but
>> also got rid of the join. For a function RTE, however, it can't do
>> that:  
>
>> postgres=# explain select * from tenk1, abs(100) as a (a) where
>> unique1 < a; QUERY PLAN
>> -------------------------------------------------------------------
>>   Nested Loop  (cost=0.00..583.01 rows=3333 width=248)
>>     Join Filter: (tenk1.unique1 < a.a)  
>>     ->  Function Scan on a  (cost=0.00..0.01 rows=1 width=4)
>>     ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)  
>> (4 rows)  
>
>> Could we handle this in pull_up_subqueries(), similar to the
>> RTE_SUBQUERY and RTE_VALUES cases?  
>
>Perhaps.  You could only do it for non-set-returning functions, which
>isn't the typical use of function RTEs, which is probably why we've not
>thought hard about it before.  I'm not sure what would need to happen
>for lateral functions.  Also to be considered, if it's not foldable to
>a constant, is whether we're risking evaluating it more times than
>before.
>
> regards, tom lane
I reworked the patch and implemented processing of FuncScan in
pull_up_subqueries() in a way similar to VALUES processing. In order to
prevent folding of non-foldable functions it checks provolatile of the
function and are arguments const or not and return type to prevent
folding of SRF.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

funcscan_plan_optimizer_v3.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

a.bykov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

Hello,
I was trying to review your patch, but I couldn't install it:

prepjointree.c: In function ‘pull_up_simple_function’:
prepjointree.c:1793:41: error: ‘functions’ undeclared (first use in this function); did you mean ‘PGFunction’?
  Assert(!contain_vars_of_level((Node *) functions, 0));

Was it a typo?

The new status of this patch is: Waiting on Author
Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Aleksandr Parfenov-2
Hi,

Thank you for the review.
I fixed a typo and some comments. Please find new version attached.

---
Best regards,
Parfenov Aleksandr

On Fri, 19 Oct 2018 at 16:40, Anthony Bykov <[hidden email]> wrote:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

Hello,
I was trying to review your patch, but I couldn't install it:

prepjointree.c: In function ‘pull_up_simple_function’:
prepjointree.c:1793:41: error: ‘functions’ undeclared (first use in this function); did you mean ‘PGFunction’?
  Assert(!contain_vars_of_level((Node *) functions, 0));

Was it a typo?

The new status of this patch is: Waiting on Author

funcscan_plan_optimizer_v4.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Kyotaro HORIGUCHI-2
Hello.

# It might be better that you provided self-contained test case.

As the discussion between Heikki and Tom just upthread, it would
be doable but that usage isn't typical. The query would be
normally written as followings and they are transformed as
desired.

select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from
messages where body_tsvector @@ to_tsquery('tuple&header&overhead');

or (this doesn't look normal, thought..)

select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from
messages, (select to_tsquery('tuple&header&overhead') as q) q
where body_tsvector @@ q;

This means that the wanted pull up logic is almost already
there. You should look on how it is handled.


At Sat, 20 Oct 2018 01:31:04 +0700, Aleksandr Parfenov <[hidden email]> wrote in <[hidden email]>
> Hi,
>
> Thank you for the review.
> I fixed a typo and some comments. Please find new version attached.

I had the following error with the following query.

=# explain select * from pg_stat_get_activity(NULL) a join log(100000.0) p on a.pid = p.p;
ERROR:  no relation entry for relid 2


As Andrew pointed, you are missing many things to do.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Antonin Houska-2
Kyotaro HORIGUCHI <[hidden email]> wrote:

> I had the following error with the following query.
>
> =# explain select * from pg_stat_get_activity(NULL) a join log(100000.0) p on a.pid = p.p;
> ERROR:  no relation entry for relid 2
>

I think that the problem is that RTE_VALUES is wrapped in a subquery by parser
while RTE_FUNCTION is not. Thus some additional processing is done by
pull_up_simple_subquery() for VALUES. What seems to make difference here is
the call of flatten_join_alias_vars() on the query targetlist, although
pull_up_simple_subquery() does it for other reasons.

Actually the comment about flatten_join_alias_vars() in
pull_up_simple_subquery() makes me think if it's o.k. that
pull_up_simple_function() sets rvcontext.need_phvs=false regardless strictness
of the function: I think PlaceHolderVar might be needed if the function is not
strict. (In contrast, pull_up_simple_values() does not have to care because
pull_up_simple_subquery() handles such cases when it's processing the owning
subquery).

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Antonin Houska-2
In reply to this post by Aleksandr Parfenov-2
Aleksandr Parfenov <[hidden email]> wrote:

> I fixed a typo and some comments. Please find new version attached.

I've checked this verstion too.

* is_simple_stable_function()

instead of fetching HeapTuple from the syscache manually, you might want to
consider using functions from lsyscache.c (get_func_rettype, get_func_retset,
etc.), or adding a function that returns (subset of) the fields you need in a
single call.

* pull_up_simple_function():

As you assume that ret->functions is a single-item list

        Assert(list_length(rte->functions) == 1);

the following iteration is not necessary:

        foreach(lc, functions_list)

Also, there seems to be a lot of copy & paste from pull_up_simple_values(), so
some refactoring would make sense.


--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

Reply | Threaded
Open this post in threaded view
|

Re: Optimze usage of immutable functions as relation

Tom Lane-2
In reply to this post by Aleksandr Parfenov-2
Aleksandr Parfenov <[hidden email]> writes:
> [ funcscan_plan_optimizer_v4.patch ]

A small note here --- this would be noticeably cleaner if removal of
the no-longer-needed function RTE were done using the dummy-relation
infrastructure I proposed in

https://commitfest.postgresql.org/20/1827/

Maybe we should get that done first and then come back to this.

I'm a bit suspicious of the premise of this though, because it's
going to result in a single call of a function being converted
into possibly many calls, if the RTE is referenced in a lot of
places.  Even if the function is immutable so that those calls
get evaluated at plan time not run time, it's still N calls not
one, which'd be bad if the function is expensive.  See e.g.

https://www.postgresql.org/message-id/flat/CAOYf6edujEOGB-9fGG_V9PGQ5O9yoyWmTnE9RyBYTGw%2BDq3SpA%40mail.gmail.com

for an example where we're specifically telling people that
they can put a function into FROM to avoid multiple evaluation.
This patch breaks that guarantee.

A possible fix for this is to do eval_const_expressions() on
function RTE expressions at this stage (and then not need to
do it later), and then pull up only when we find that the
RTE expression has been reduced to a single Const.  I'm not
sure whether invoking eval_const_expressions() so early
would create any issues.  But if it can be made to work, this
would eliminate quite a lot of the ad-hoc conditions that
you have in the patch now.

                        regards, tom lane