The flinfo->fn_extra question, from me this time.


Chapman Flack
Hi hackers,

I see evidence on this list that it's sort of a rite of passage
to ask the flinfo->fn_extra question, and my time has come.

So please let me know if I seem to correctly understand the limits
on its use.

I gather that various extensions use it to stash various things. But
(I assume) ... they will only touch fn_extra in FmgrInfo structs that
pertain to *their own functions*. (Please say that's true?)

IOW, it is assured that, if I am a language handler, when I am called
to handle a function in my language, fn_extra is mine to use as I please ...

... with the one big exception, if I am handling a function in my language
that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave
fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there,
and then I can stash my stuff in gunk->user_fctx for the duration of that
SRF call.
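In code, the pattern I mean is roughly the following (a sketch only, using the standard funcapi.h conventions; countdown() is a made-up example function):

```c
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

PG_FUNCTION_INFO_V1(countdown);

/* Hypothetical SRF returning n, n-1, ..., 1 in ValuePerCall mode. */
Datum
countdown(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;
    int32           *remaining;

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcxt;

        /* fn_extra must still be NULL here; this plants the gunk */
        funcctx = SRF_FIRSTCALL_INIT();

        oldcxt = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
        remaining = (int32 *) palloc(sizeof(int32));
        *remaining = PG_GETARG_INT32(0);
        funcctx->user_fctx = remaining;     /* my stuff goes in here */
        MemoryContextSwitchTo(oldcxt);
    }

    funcctx = SRF_PERCALL_SETUP();
    remaining = (int32 *) funcctx->user_fctx;

    if (*remaining > 0)
        SRF_RETURN_NEXT(funcctx, Int32GetDatum((*remaining)--));

    SRF_RETURN_DONE(funcctx);
}
```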

Does that seem to catch the essentials?

Thanks,
-Chap


p.s.: noticed in fmgr/README: "Note that simple "strict" functions can
ignore both isnull and args[i].isnull, since they won't even get called
when there are any TRUE values in args[].isnull."

I get why a strict function can ignore args[i].isnull, but is the part
about ignoring isnull a mistake? A strict function can be passed all
non-null arguments and still return null if it wants to, right?
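For instance, I'd expect something like this to be perfectly legal for a function declared STRICT (even_or_null() being a made-up example):

```c
PG_FUNCTION_INFO_V1(even_or_null);

/* Declared STRICT in SQL, so it is never called with a NULL input
 * and may safely ignore args[i].isnull -- yet it can still return NULL. */
Datum
even_or_null(PG_FUNCTION_ARGS)
{
    int32 arg = PG_GETARG_INT32(0);     /* guaranteed non-null */

    if (arg % 2 == 0)
        PG_RETURN_INT32(arg);

    PG_RETURN_NULL();   /* sets fcinfo->isnull, strictness notwithstanding */
}
```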



Re: The flinfo->fn_extra question, from me this time.

Tom Lane
Chapman Flack <[hidden email]> writes:
> So please let me know if I seem to correctly understand the limits
> on its use.

> I gather that various extensions use it to stash various things. But
> (I assume) ... they will only touch fn_extra in FmgrInfo structs that
> pertain to *their own functions*. (Please say that's true?)

> IOW, it is assured that, if I am a language handler, when I am called
> to handle a function in my language, fn_extra is mine to use as I please ...

Yup.

> ... with the one big exception, if I am handling a function in my language
> that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave
> fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there,
> and then I can stash my stuff in gunk->user_fctx for the duration of that
> SRF call.

Yup.  (Of course, you don't have to use the SRF_FIRSTCALL_INIT
infrastructure.)

Keep in mind that in most contexts, whatever you cache in fn_extra
will only be there for the life of the current query.
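The usual idiom is roughly this (a sketch; MyCache and build_my_cache are made-up names):

```c
typedef struct MyCache
{
    /* whatever is worth computing once per query */
    int something;
} MyCache;

static void build_my_cache(MyCache *cache);

Datum
my_function(PG_FUNCTION_ARGS)
{
    MyCache *cache = (MyCache *) fcinfo->flinfo->fn_extra;

    if (cache == NULL)
    {
        /* fn_mcxt is normally query-lifespan, so the cache
         * dies with the query */
        cache = (MyCache *) MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
                                               sizeof(MyCache));
        build_my_cache(cache);
        fcinfo->flinfo->fn_extra = cache;
    }

    /* ... use cache ... */
    PG_RETURN_VOID();
}
```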

                        regards, tom lane



Re: The flinfo->fn_extra question, from me this time.

Chapman Flack
On 06/15/19 21:21, Tom Lane wrote:
> Yup.  (Of course, you don't have to use the SRF_FIRSTCALL_INIT
> infrastructure.)

That had crossed my mind ... but it seems there's around 80 or 100
lines of good stuff there that'd be a shame to duplicate. If only
init_MultiFuncCall() took an extra void ** argument, and the stock
SRF_FIRSTCALL_INIT passed &(fcinfo->flinfo->fn_extra), it seems like
most of it would be reusable. shutdown_MultiFuncCall would need to work
slightly differently, and a caller who wanted to be different would need
a customized variant of SRF_PERCALL_SETUP, but that's two lines.

Cheers,
-Chap



Re: The flinfo->fn_extra question, from me this time.

Chapman Flack
On 06/15/19 21:46, Chapman Flack wrote:
> On 06/15/19 21:21, Tom Lane wrote:
>> Yup.  (Of course, you don't have to use the SRF_FIRSTCALL_INIT
>> infrastructure.)
>
> That had crossed my mind ... but it seems there's around 80 or 100
> lines of good stuff there that'd be a shame to duplicate. If only

I suppose that's only if I want to continue using SFRM_ValuePerCall mode.

SFRM_Materialize mode could remove a good deal of complexity currently
in PL/Java around managing memory contexts, SPI_connect, etc. through
multiple calls ... and I'd also have fn_extra all to myself.
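For reference, the SFRM_Materialize protocol amounts to a single call that hands back a tuplestore (a sketch using the standard funcapi conventions; the row-building loop is elided):

```c
Datum
my_srf(PG_FUNCTION_ARGS)
{
    ReturnSetInfo   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
    Tuplestorestate *tstore;
    MemoryContext    oldcxt;

    if (rsinfo == NULL || !(rsinfo->allowedModes & SFRM_Materialize))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("materialize mode required")));

    /* the tuplestore must outlive this call */
    oldcxt = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
    tstore = tuplestore_begin_heap(true, false, work_mem);
    MemoryContextSwitchTo(oldcxt);

    /* ... tuplestore_putvalues(tstore, tupdesc, values, nulls) per row ... */

    rsinfo->returnMode = SFRM_Materialize;
    rsinfo->setResult = tstore;
    /* and fn_extra remains entirely mine in this mode */

    PG_RETURN_NULL();
}
```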

Until now, I had assumed that SFRM_ValuePerCall mode might offer some
benefits, such as the possibility of pipelining certain queries and not
building up a whole tuplestore in advance.

But looking in the code, I'm getting the impression that those
benefits are only theoretical future ones, as ExecMakeTableFunctionResult
implements SFRM_ValuePerCall mode by ... repeatedly calling the function
to build up a whole tuplestore in advance.

Am I right about that? Are there other sites from which a SRF might be
called that I haven't found, where ValuePerCall mode might actually
support some form of pipelining? Are there actual cases where allowedModes
might not contain SFRM_Materialize?

Or is the ValuePerCall variant currently there just to support possible
future such cases, none of which exist at the moment?

Regards,
-Chap



Re: The flinfo->fn_extra question, from me this time.

Tom Lane
Chapman Flack <[hidden email]> writes:
> Until now, I had assumed that SFRM_ValuePerCall mode might offer some
> benefits, such as the possibility of pipelining certain queries and not
> building up a whole tuplestore in advance.

> But looking in the code, I'm getting the impression that those
> benefits are only theoretical future ones, as ExecMakeTableFunctionResult
> implements SFRM_ValuePerCall mode by ... repeatedly calling the function
> to build up a whole tuplestore in advance.

Yes, that's the case for a SRF in FROM.  A SRF in the targetlist
actually does get the chance to pipeline, if it implements ValuePerCall.

The FROM case could be improved perhaps, if somebody wanted to put
time into it.  You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but in principle you could return
rows immediately while stashing them aside in a tuplestore.

                        regards, tom lane



Re: The flinfo->fn_extra question, from me this time.

denty
On 21 Jul 2019, at 22:54, Tom Lane <[hidden email]> wrote:

>
> Chapman Flack <[hidden email]> writes:
>> Until now, I had assumed that SFRM_ValuePerCall mode might offer some
>> benefits, such as the possibility of pipelining certain queries and not
>> building up a whole tuplestore in advance.
>
>> But looking in the code, I'm getting the impression that those
>> benefits are only theoretical future ones, as ExecMakeTableFunctionResult
>> implements SFRM_ValuePerCall mode by ... repeatedly calling the function
>> to build up a whole tuplestore in advance.
>
> Yes, that's the case for a SRF in FROM.  A SRF in the targetlist
> actually does get the chance to pipeline, if it implements ValuePerCall.
>
> The FROM case could be improved perhaps, if somebody wanted to put
> time into it.

While looking at whether REFCURSOR output could be pipelined into the executor [1], I’ve stumbled upon the same.

By any chance, do either of you know if there are initiatives to make the changes mentioned?

> You'd still need to be prepared to build a tuplestore,
> in case of rescan or backwards fetch; but […]

I’m also interested in your comment here. If the function was STABLE, could not the function scan simply be restarted? (Rather than needing to create the tuplestore for all cases.)

I guess perhaps the backwards scan is where it falls down though...

> […] in principle you could return
> rows immediately while stashing them aside in a tuplestore.

Does the planner have any view on this? When I first saw what was going on, I presumed the planner had decided the cost of multiple function scans was greater than the cost of materialising it in a temporary store.

It occurs to me that, if we made a switch towards pipelining the function scan results directly out, then we might lose efficiency where there are a significant number of scans and/or the function cost is high. Is that why you were suggesting stashing them aside as well?

denty.

[1] https://www.postgresql.org/message-id/B2AFCAB5-FACD-44BF-963F-7DD2735FAB5D%40QQdd.eu


Re: The flinfo->fn_extra question, from me this time.

Tom Lane
Dent John <[hidden email]> writes:
> On 21 Jul 2019, at 22:54, Tom Lane <[hidden email]> wrote:
>> Chapman Flack <[hidden email]> writes:
>>> But looking in the code, I'm getting the impression that those
>>> benefits are only theoretical future ones, as ExecMakeTableFunctionResult
>>> implements SFRM_ValuePerCall mode by ... repeatedly calling the function
>>> to build up a whole tuplestore in advance.

>> Yes, that's the case for a SRF in FROM.  A SRF in the targetlist
>> actually does get the chance to pipeline, if it implements ValuePerCall.
>> The FROM case could be improved perhaps, if somebody wanted to put
>> time into it.

> By any chance, do either of you know if there are initiatives to make the changes mentioned?

I don't know of anybody working on it.

>> You'd still need to be prepared to build a tuplestore,
>> in case of rescan or backwards fetch; but […]

> I’m also interested in your comment here. If the function was STABLE, could not the function scan simply be restarted? (Rather than needing to create the tuplestore for all cases.)
> I guess perhaps the backwards scan is where it falls down though...

My point was that you can't simply remove the tuplestore-building code
path.  The exact boundary conditions for that might be negotiable.
But I'd be very dubious of an assumption that re-running the function
would be cheaper than building a tuplestore, regardless of whether it's
safe.

> Does the planner have any view on this?

cost_functionscan and cost_rescan would likely need some adjustment if
possible.  However, I'm not sure that the planner has any way to know
whether a given SRF will support ValuePerCall or not.

                        regards, tom lane



Re: The flinfo->fn_extra question, from me this time.

Robert Haas
In reply to this post by Tom Lane-2
On Sun, Jul 21, 2019 at 5:55 PM Tom Lane <[hidden email]> wrote:
> The FROM case could be improved perhaps, if somebody wanted to put
> time into it.  You'd still need to be prepared to build a tuplestore,
> in case of rescan or backwards fetch; but in principle you could return
> rows immediately while stashing them aside in a tuplestore.

But you could skip it if you could prove that no rescans or backward
fetches are possible for a particular node, something that we also
want for Gather, as discussed not long ago.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The flinfo->fn_extra question, from me this time.

denty
In reply to this post by Tom Lane-2
On 22 Sep 2019, at 16:01, Tom Lane <[hidden email]> wrote:

Hi Tom,

I don't know of anybody working on it.

Okay. I had a look at this. I tried to apply Andre’s patch [1] from some time ago, but that turned out not to be so easy. I guess the code has moved on since. So I’ve attempted to re-invent the same spirit, stealing from his patch, and from how the tSRF code does things. The patch isn’t final, but it demonstrates the concept.

However, given your comments below, I wonder if you might comment on the approach before I go further?

(Patch is presently still against 12beta2.)

You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but […]

I do recognise this. The patch teaches ExecMaterializesOutput() and ExecSupportsBackwardScan() that T_FunctionScan nodes don't materialise their output.

(Actually, Andre’s patch did the educating of ExecMaterializesOutput() and ExecSupportsBackwardScan() — it’s not my invention.)

I haven’t worked out how to easily demonstrate the backward scan case, but joins (which presumably are the typical cause of rescan) now yield an intermediate Materialize node.

postgres=# explain (analyze, buffers) select * from unnest (array_fill ('scanner'::text, array[10])) t1, unnest (array_fill ('dummy'::text, array[10000000])) t2 limit 100;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.01..1.36 rows=100 width=64) (actual time=0.009..0.067 rows=100 loops=1)
   ->  Nested Loop  (cost=0.01..1350000.13 rows=100000000 width=64) (actual time=0.008..0.049 rows=100 loops=1)
         ->  Function Scan on unnest t2  (cost=0.00..100000.00 rows=10000000 width=32) (actual time=0.003..0.006 rows=10 loops=1)
         ->  Materialize  (cost=0.00..0.15 rows=10 width=32) (actual time=0.000..0.002 rows=10 loops=10)
               ->  Function Scan on unnest t1  (cost=0.00..0.10 rows=10 width=32) (actual time=0.001..0.004 rows=10 loops=1)
 Planning Time: 127.875 ms
 Execution Time: 0.102 ms
(7 rows)

My point was that you can't simply remove the tuplestore-building code
path.  The exact boundary conditions for that might be negotiable.
But I'd be very dubious of an assumption that re-running the function
would be cheaper than building a tuplestore, regardless of whether it's
safe.

Understood, and I agree. I think it’s preferable to allow the planner control over when to explicitly materialise.

But if I’m not wrong, at present, the planner doesn’t really trade off the cost of rescan versus materialisation, but instead adopts a simple heuristic of materialising one or the other side during a join. We can see this in the plans if the unnest()s are moved into the target list and buried in a subquery. For example:

postgres=# explain (analyze, buffers) select * from (select unnest (array_fill ('scanner'::text, array[10]))) t1, (select unnest (array_fill ('dummy'::text, array[10000000]))) t2 limit 100;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.40 rows=100 width=64) (actual time=0.011..0.106 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..1400000.21 rows=100000000 width=64) (actual time=0.010..0.081 rows=100 loops=1)
         ->  ProjectSet  (cost=0.00..50000.02 rows=10000000 width=32) (actual time=0.004..0.024 rows=10 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
         ->  Materialize  (cost=0.00..0.22 rows=10 width=32) (actual time=0.001..0.002 rows=10 loops=10)
               ->  ProjectSet  (cost=0.00..0.07 rows=10 width=32) (actual time=0.001..0.004 rows=10 loops=1)
                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
 Planning Time: 180.482 ms
 Execution Time: 0.148 ms
(9 rows)

I am tempted to stop short of educating the planner about the possibility of a re-scan (thus dropping the Materialize node) during a join. It seems feasible, and sometimes advantageous. (Perhaps if the join quals would cause a huge amount of the output to be filtered?) But it also seems better to treat it as an entirely separate issue.

cost_functionscan and cost_rescan would likely need some adjustment if
possible.

I looked at cost_functionscan(), but I think it is already of the view that a function can pipeline. It computes a startup_cost and a run_cost, where run_cost is the per-tuple cost * num_rows. With this understanding, it is actually wrong given the current materialisation-always behaviour. I think this means I don’t need to make any fundamental changes in order to correctly cost the new behaviour.

However, I'm not sure that the planner has any way to know
whether a given SRF will support ValuePerCall or not.

Yes. There is a flaw. But with the costing support function interface, it’s possible to supply costs that correctly relate to the SRF’s abilities.

I guess there can be a case where the SRF supports ValuePerCall, and supplies costs accordingly, but at execution time decides not to use it. That seems a curious situation, but it will, at worst, cost us a bit more buffer space.

In the opposite case, where the SRF can’t support ValuePerCall, the risk is that the planner has decided it wants to interject a Materialize node, and the result will be buffer-to-buffer copying. If the function has a costing support function, it should all be costed correctly, but it’s obviously not ideal. Currently, my patch doesn’t do anything about this case. My plan would be to allow the Materialize node to be supplied with a tuplestore from the FunctionScan node at execution time. I guess this optimisation would similarly help non-ValuePerCall tSRFs.

After all this, I’m wondering how you view the proposal?

For sake of comparison, 12beta1 achieves the following plans:

postgres=# create or replace function test1() returns setof record language plpgsql as $$ begin return query (select 'a', generate_series (1, 1e6)); end; $$; -- using plpgsql because it can’t pipeline
CREATE FUNCTION
postgres=# explain (verbose, analyse, buffers) select key, count (value), sum (value) from test1() as (key text, value numeric) group by key;
...
 Planning Time: 0.068 ms
 Execution Time: 589.651 ms

postgres=# explain (verbose, analyse, buffers) select * from test1() as (key text, value numeric) limit 50;
...
 Planning Time: 0.059 ms
 Execution Time: 348.334 ms

postgres=# explain (analyze, buffers) select count (a.a), sum (a.a) from unnest (array_fill (1::numeric, array[10000000])) a;
...
 Planning Time: 165.502 ms
 Execution Time: 5629.094 ms

postgres=# explain (analyze, buffers) select * from unnest (array_fill (1::numeric, array[10000000])) limit 50;
...
 Planning Time: 110.952 ms
 Execution Time: 1080.609 ms

Versus 12beta2+patch, which seem favourable in the main, at least for these pathological cases:

postgres=# explain (verbose, analyse, buffers) select key, count (value), sum (value) from test1() as (key text, value numeric) group by key;
...
 Planning Time: 0.068 ms
 Execution Time: 591.749 ms

postgres=# explain (verbose, analyse, buffers) select * from test1() as (key text, value numeric) limit 50;
...
 Planning Time: 0.051 ms
 Execution Time: 289.820 ms

postgres=# explain (analyze, buffers) select count (a.a), sum (a.a) from unnest (array_fill (1::numeric, array[10000000])) a;
...
 Planning Time: 169.260 ms
 Execution Time: 4759.781 ms

postgres=# explain (analyze, buffers) select * from unnest (array_fill (1::numeric, array[10000000])) limit 50;
...
 Planning Time: 163.374 ms
 Execution Time: 0.051 ms
denty.



pipeline-functionscan.patch (37K)

Re: The flinfo->fn_extra question, from me this time.

denty
Hi,

It turns out, to my embarrassment, that pretty much all of the regression tests failed with my patch. No idea if anyone spotted that and withheld reply in revenge, but I wouldn’t blame you if you did!

I have spent a bit more time on it. The attached patch is a much better show, though there are still a few regressions and undoubtedly it’s still rough.

(Attached patch is against 12.0.)

As was perhaps predictable, some of the regression tests do indeed break in the case of rescan. To cite the specific class of fail, it’s this:

SELECT * FROM (VALUES (1),(2),(3)) v(r), ROWS FROM( rngfunc_sql(11,11), rngfunc_mat(10+r,13) );
  r | i  | s | i  | s 
 ---+----+---+----+---
  1 | 11 | 1 | 11 | 1
  1 |    |   | 12 | 2
  1 |    |   | 13 | 3
- 2 | 11 | 1 | 12 | 4
+ 2 | 11 | 2 | 12 | 4
  2 |    |   | 13 | 5
- 3 | 11 | 1 | 13 | 6
+ 3 | 11 | 3 | 13 | 6
 (6 rows)

The reason for the change is that ’s' comes from rngfunc_mat(), which computes s as nextval(). The patch currently prefers to re-execute the function in place of materialising it into a tuplestore.

Tom suggested not dropping the tuplestore creation logic. I can’t fathom a way of avoiding change for folks who have gotten used to the current behaviour without doing that. So I’m tempted to pipeline the rows back from a function (if it returns ValuePerCall), and also record them in a tuplestore, just in case a rescan happens. There’s still wastage in this approach, but it would preserve the current behaviour, while still enabling the early abort of ValuePerCall SRFs at relatively low cost, which is certainly one of my goals.
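In rough terms, the tee I have in mind would look like this (illustrative names and state only, not the actual FunctionScan code):

```c
/* Hypothetical per-node state, not the real FunctionScanState layout. */
typedef struct TeeState
{
    Tuplestorestate *tstore;     /* filled as rows are pipelined out */
    bool             rescanning; /* true once a rescan has been requested */
} TeeState;

static bool fetch_next_srf_row(TupleTableSlot *slot);  /* hypothetical helper */

static TupleTableSlot *
tee_next(TeeState *tee, TupleTableSlot *slot)
{
    if (tee->rescanning)
    {
        /* repeats are served from the store built on the first pass */
        if (!tuplestore_gettupleslot(tee->tstore, true, false, slot))
            return NULL;
        return slot;
    }

    /* first pass: one ValuePerCall fetch ... */
    if (!fetch_next_srf_row(slot))
        return NULL;

    /* ... stash a copy for any later rescan, then pipeline it out */
    tuplestore_puttupleslot(tee->tstore, slot);
    return slot;
}
```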

I’d welcome opinion on whether there are downsides to that approach, as I might move to integrate it next.

But I would also like to kick around ideas for how to avoid the tuplestore entirely.

Earlier, I suggested that we might make the decision logic prefer to materialise a tuplestore for VOLATILE functions, and prefer to pipeline directly from STABLE (and IMMUTABLE) functions. The docs on volatility categories say that the optimiser will evaluate a VOLATILE function for every row where it is needed, whereas it may cache STABLE and IMMUTABLE results more aggressively. That is basically the polar opposite of what I want to achieve.

It is arguably also in conflict with current behaviour. I think we should make the docs clearer about that.

So, on second thoughts, I don’t think overloading the meaning of STABLE, et al., is the right thing to do. I wonder if we could invent a new modifier to CREATE FUNCTION, perhaps “PIPELINED”, which would simply declare a function's ability and preference for ValuePerCall mode.

Or perhaps modify the ROWS FROM extension, and adopt WITH’s [ NOT ] MATERIALIZED clause. For example, the following would achieve the above proposed behaviour:

ROWS FROM( rngfunc_sql(11,11) MATERIALIZED, rngfunc_mat(10+r,13) MATERIALIZED ) 

Of course, NOT MATERIALIZED would achieve ValuePerCall mode, and omit materialisation. I guess MATERIALIZED would have to be the default.

I wonder if another alternative would be to decide materialization based on what the outer plan includes. I guess we can tell if we’re part of a join, or if the plan requires the ability to scan backwards. Could that work?

denty.

Re: The flinfo->fn_extra question, from me this time.

denty
(And here’s the aforementioned attachment… doh.)




pipeline-functionscan-v2.patch (42K)

Re: The flinfo->fn_extra question, from me this time.

Pavel Stehule
Hi

ne 3. 11. 2019 v 12:51 odesílatel Dent John <[hidden email]> napsal:
(And here’s the aforementioned attachment… doh.)

It would be nice if the patch had some regression tests; they help refresh one's memory of what the patch is targeting.

Regards

Pavel

Re: The flinfo->fn_extra question, from me this time.

denty
> On 3 Nov 2019, at 13:33, Pavel Stehule <[hidden email]> wrote:
>
> It would be nice if the patch had some regression tests; they help refresh one's memory of what the patch is targeting.

With a suitably small work_mem constraint, it is possible to show the absence of buffers resulting from the tuplestore. It’ll need some commentary explaining what is being looked for, and why. But it’s a good idea.

I’ll take a look.

denty.


Re: The flinfo->fn_extra question, from me this time.

denty
>> On 3 Nov 2019, at 13:33, Pavel Stehule <[hidden email]> wrote:
>>
>> It would be nice if the patch had some regression tests; they help refresh one's memory of what the patch is targeting.

I’ve updated the patch, and added some regression tests.

denty.


pipeline-functionscan-v3.patch (50K)

Re: The flinfo->fn_extra question, from me this time.

denty
Hi folks,

I’ve updated the patch, addressed the rescan issue, and restructured the tests.

I’ve taken a slightly different approach this time, re-using the (already pipeline-supporting) machinery of the Materialize node, and extending it to allow an SFRM_Materialize SRF to donate the tuplestore it returns. I feel this yields a better code structure, as well as getting more reuse.

It also opens up more informative and transparent EXPLAIN output. For example, the following shows Materialize explicitly, whereas previously a FunctionScan would have silently materialised the result of both generate_series() invocations.

postgres=# explain (analyze, costs off, timing off, summary off) 
select * from generate_series(11,15) r, generate_series(11,14) s;
                            QUERY PLAN                            
------------------------------------------------------------------
 Nested Loop (actual rows=20 loops=1)
   ->  Function Scan on generate_series s (actual rows=4 loops=1)
         ->  SRF Scan (actual rows=4 loops=1)
               SFRM: ValuePerCall
   ->  Function Scan on generate_series r (actual rows=5 loops=4)
         ->  Materialize (actual rows=5 loops=4)
               ->  SRF Scan (actual rows=5 loops=1)
                     SFRM: ValuePerCall

I also thought again about when to materialise, and particularly about Robert’s suggestion[1] (which is also in this thread, but I didn’t originally understand the implication of it). If I’m not wrong, between occasional explicit use of a Materialize node by the planner, and more careful observation of EXEC_FLAG_REWIND and EXEC_FLAG_BACKWARD in FunctionScan’s initialisation, we actually have what is needed to pipeline without materialisation in at least some cases. There is no mechanism to preferentially re-execute a SRF rather than materialise it, but because materialisation only seems to be necessary in the face of a join or a scrollable cursor, I’m no longer considering that much of a problem.

The EXPLAIN output needs a bit of work, costing is still a sore point, and it’s not quite as straight-line performant as my first attempt; there are also undoubtedly some unanticipated breakages and rough edges.

But the concept seems to work roughly as I intended (i.e., allowing FunctionScan to pipeline). Unless there are any objections, I will push it into the January commit fest for progressing.

(Revised patch attached.)

denty.




pipeline-functionscan-v4.patch (126K)