[PATCH] Generic type subscripting

classic Classic list List threaded Threaded
129 messages Options
1234567
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

a.zakirov
On Sat, Nov 11, 2017 at 04:34:31PM +0100, Dmitry Dolgov wrote:
> > >
> > > Here isAssignment is unused variable, so it could be removed.
> >
> > In this case I disagree - the purpose of these examples is to show
> everything
> > you can use. So I just need to come up with some example that involves
> > `isAssignment`.
>
> I've incorporated this variable into the tutorial.

Great. I think users will know how to use isAssignment now.

> > I have a little complain about how ExprEvalStep gets resvalue. resvalue is
> > assigned in one place (within ExecEvalSubscriptingRefFetch(),
> > ExecEvalSubscriptingRefAssign()), resnull is assigned in another place
> > (within jsonb_subscript_fetch(), jsonb_subscript_assign()).
>
> Hm...I'm afraid I don't get this. `resnull` is never assigned inside
> `jsonb_subscript_fetch` or `jsonb_subscript_assign`, instead it's coming
> from `ExecInterpExp` as `isnull` if I remember correctly. Are we talking
> about
> the same thing?

No, I meant ExprEvalStep struct. For example, within ExecEvalSubscriptingRefFetch() you assign *op->resvalue but *op->resnull is leaved as unchanged:

> ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op)
> ...
> *op->resvalue = FunctionCall2(op->d.sbsref.eval_finfo,
>  PointerGetDatum(*op->resvalue),
>  PointerGetDatum(op));

For jsonb *op->resnull is changed within jsonb_subscript_fetch() for arrays within array_subscript_fetch() (which are called by ExecEvalSubscriptingRefFetch()):

> return jsonb_get_element(DatumGetJsonbP(containerSource),
> sbstate->upper,
> sbstate->numupper,
> step->resnull, /* step->resnull is changed within jsonb_get_element() */
> false);

It is not critical, but it may be good to change them in one place.

>
> In this version of the patch I also improved NULL handling, you can see it
> in
> the tests.

New tests passed.


--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Dmitry Dolgov
> On 13 November 2017 at 14:11, Arthur Zakirov <[hidden email]> wrote:
> > I have a little complain about how ExprEvalStep gets resvalue. resvalue is
> > assigned in one place (within ExecEvalSubscriptingRefFetch(),
> > ExecEvalSubscriptingRefAssign()), resnull is assigned in another place
> > (within jsonb_subscript_fetch(), jsonb_subscript_assign()).
>
> Hm...I'm afraid I don't get this. `resnull` is never assigned inside
> `jsonb_subscript_fetch` or `jsonb_subscript_assign`, instead it's coming
> from `ExecInterpExp` as `isnull` if I remember correctly. Are we talking
> about
> the same thing?
>
> No, I meant ExprEvalStep struct. For example, within
> ExecEvalSubscriptingRefFetch() you assign op->resvalue but op->resnull is
> leaved as unchanged

Oh, I see now, thanks for the explanation. Actually, it's how it was
implemented before in array subscripting, and I tried to be consistent with
this implementation. But now I wonder if `resnull` is really needed in
`jsonb_get_element`, `array_get_element` and it seems to me that I can even get
rid of it so, that everything would be assigned in `jsonb_subscript_fetch`,
`array_subscript_fetch` - I'll send a new version of the patch soon.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Dmitry Dolgov
> On 14 November 2017 at 22:25, Dmitry Dolgov <[hidden email]> wrote:
> But now I wonder if `resnull` is really needed in `jsonb_get_element`,
> `array_get_element` and it seems to me that I can even get rid of it so

On the second thought, no, looks like I'm wrong and it should be like this. The
reason is that any `fetch` function should be in form

    (container, internal) -> extracted value

which means that we need to return an extracted value (for jsonb it's a `jsonb`,
for array it's an `anyelement`). But at the same time in general case we can
figure out if the result is null only inside a `fetch` function,
(`jsonb_get_element` for jsonb or whatever it may be for a custom data type)
because it returns Datum. So the only way to return this information is by
reference through the `internal` argument. To summarize, If as you said it's
not that critical, I would suggest to leave it as it is.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

a.zakirov
On Wed, Nov 15, 2017 at 11:02:59PM +0100, Dmitry Dolgov wrote:

> On the second thought, no, looks like I'm wrong and it should be like this.
> The
> reason is that any `fetch` function should be in form
>
>     (container, internal) -> extracted value
>
> which means that we need to return an extracted value (for jsonb it's a
> `jsonb`,
> for array it's an `anyelement`). But at the same time in general case we can
> figure out if the result is null only inside a `fetch` function,
> (`jsonb_get_element` for jsonb or whatever it may be for a custom data type)
> because it returns Datum. So the only way to return this information is by
> reference through the `internal` argument. To summarize, If as you said it's
> not that critical, I would suggest to leave it as it is.

Actually it is not only way to return isnull information. You can also return it using pointer to a boolean argument.
*fetch() functions also doesn't need in ExprEvalStep struct, you can pass SubscriptingRefState struct instead.

I mean the following code:

ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op)
{
    ...
    *op->resvalue = FunctionCall2(op->d.sbsref.eval_finfo,
                                  PointerGetDatum(*op->resvalue),
                                  PointerGetDatum(op->d.sbsref.state),
                                  PointerGetDatum(op->resnull));
}

Datum
jsonb_subscript_fetch(PG_FUNCTION_ARGS)
{
    Datum containerSource = PG_GETARG_DATUM(0);
    SubscriptingRefState *state = (SubscriptingRefState *) PG_GETARG_POINTER(1);
    bool *isNull = (bool *) PG_GETARG_POINTER(2);

    return jsonb_get_element(DatumGetJsonbP(containerSource),
                             state->upper,
                             state->numupper,
                             isNull,
                             false);
}

> To summarize, If as you said it's
> not that critical, I would suggest to leave it as it is.

Yes, I just wanted to share an opinion how to improve the code. I thought that the current approach may confuse programmers, who will implement subscribting.

Also you can see extractValue() function of GIN [1]. It returns if values is null in same way.

1 - https://www.postgresql.org/docs/current/static/gin-extensibility.html

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Dmitry Dolgov
> On 16 November 2017 at 12:40, Arthur Zakirov <[hidden email]> wrote:
>
> Actually it is not only way to return isnull information.

What i've meant is that it's the only way if we want to keep the function
signature. Actually I would prefer this

    (container, internal) -> extracted value

over this (I assume that's exactly what you've suggested?)

    (container, internal, internal) -> extracted value

because it makes the purpose of the function more clear (especially for custom
data types). Also it would be consistent with `assign` functions (since
`isnull` is not assigned there). But I see your point, a separate argument for
`isnull` will make it more straightforward in terms of null handling.

> fetch() functions also doesn't need in ExprEvalStep struct

I had hard time parsing this, but from your examples I assume you're talking
about passing little bit different arguments to `fetch` function (am I right?).
Just from functionality point of view I don't see a big difference in what
argument to use to return `isnull` by reference. So at the end I think we need
to choose between having one or two `internal` arguments for `fetch` functions.
Any other opinions?
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

a.zakirov
On Thu, Nov 16, 2017 at 10:37:40PM +0100, Dmitry Dolgov wrote:
> I had hard time parsing this, but from your examples I assume you're talking
> about passing little bit different arguments to `fetch` function (am I
> right?).

Yes, I meant to pass the following arguments:

Datum source, SubscriptingRefState *state, bool *isNull


I think it is time to mark the patch as "Ready for Commiter". I've
marked it.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Dmitry Dolgov
> On 19 November 2017 at 16:13, Arthur Zakirov <[hidden email]> wrote:
>
> I think it is time to mark the patch as "Ready for Commiter". I've
> marked it.

Good, thank you for the comprehensive review.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Michael Paquier
On Mon, Nov 20, 2017 at 7:47 PM, Dmitry Dolgov <[hidden email]> wrote:
>> On 19 November 2017 at 16:13, Arthur Zakirov <[hidden email]>
>> wrote:
>>
>> I think it is time to mark the patch as "Ready for Commiter". I've
>> marked it.
>
> Good, thank you for the comprehensive review.

Documentation in patch 4 has conflicts. Please rebase. I am moving
this patch to next CF with "waiting on author".
--
Michael

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Dmitry Dolgov
> On 1 December 2017 at 06:34, Michael Paquier <[hidden email]> wrote:
>
> Documentation in patch 4 has conflicts. Please rebase. I am moving
> this patch to next CF with "waiting on author".

Thanks for noticing. Here is the rebased version (the conflict itself was quite
trivial, but I also cleaned up functions signature a bit).

0001-Base-implementation-of-subscripting-mechanism-v2.patch (230K) Download Attachment
0002-Subscripting-for-array-v2.patch (17K) Download Attachment
0003-Subscripting-for-jsonb-v2.patch (44K) Download Attachment
0004-Subscripting-documentation-v2.patch (29K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Dmitry Dolgov
> On 4 December 2017 at 01:26, Dmitry Dolgov <[hidden email]> wrote:
>
> Thanks for noticing. Here is the rebased version (the conflict itself was quite
> trivial, but I also cleaned up functions signature a bit).

Another rebased version of this patch.

0004-Subscripting-documentation-v3.patch (29K) Download Attachment
0003-Subscripting-for-jsonb-v3.patch (44K) Download Attachment
0002-Subscripting-for-arrays-v3.patch (17K) Download Attachment
0001-Base-implementation-of-subscripting-mechanism-v3.patch (229K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Tom Lane-2
Dmitry Dolgov <[hidden email]> writes:
> Another rebased version of this patch.

Apologies for not having paid attention to this patch for so long.
Coming back to it now, I wonder what happened to the plan to separate
assignment and fetch into two different node types.  I can see that
that didn't happen so far as primnodes.h is concerned, but you've
made some changes that seem to assume it did happen, eg this bit
in clauses.c:

@@ -1345,12 +1345,10 @@ contain_nonstrict_functions_walker(Node *node, void *context)
  /* a window function could return non-null with null input */
  return true;
  }
- if (IsA(node, ArrayRef))
+ if (IsA(node, SubscriptingRef))
  {
  /* array assignment is nonstrict, but subscripting is strict */
- if (((ArrayRef *) node)->refassgnexpr != NULL)
- return true;
- /* else fall through to check args */
+ return true;
  }
  if (IsA(node, DistinctExpr))
  {

Treating the two cases alike here is just wrong.

Also, the reason I was looking at clauses.c was I realized that
my recent commit 3decd150a broke this patch, because it introduced
understanding of ArrayRef into eval_const_expressions().  I think
that you can probably just do s/ArrayRef/SubscriptingRef/ there,
but it might deserve a closer look than I've given it.

I'm not terribly happy with the cosmetics of this patch at the moment.
There are too many places where it's achingly obvious that you did
s/ArrayRef/SubscriptingRef/g and nothing else, leaving code that does not
pass the test of "does it look like it was written like that to begin
with".  There are a lot of variables still named "aref" or "arefstate"
or similar when that's no longer an apropos name; there are a lot of
sentences reading "an SubscriptingRef" which is bad English; there are a
lot of comments whose layout is not going to be too hot after pgindent
because "SubscriptingRef" is so much longer than "ArrayRef".  (I'm tempted
to suggest that we call the node type just "Subscript", to buy back some
of that.)  I'm not necessarily putting it on you to fix that stuff --- it
might be easier for a native English speaker --- but I am saying that if
I commit this, there are going to be a lot of differences in detail from
what's here now.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Dmitry Dolgov
> On 4 January 2018 at 03:05, Tom Lane <[hidden email]> wrote:
>
> I wonder what happened to the plan to separate assignment and fetch into two
> different node types. I can see that that didn't happen so far as primnodes.h
> is concerned, but you've made some changes that seem to assume it did happen.

There was one version of this patch that followed this plan. It turns out that
it's quite unnatural approach (at least within the current implementation),
because I had to duplicate or reuse a lot of code for those two node types.
Eventually we decided to play it back. Unfortunately, I did it somehow sloppy
and forgot about those simple cases, thank you for noticing!

> I'm not terribly happy with the cosmetics of this patch at the moment.
> There are too many places where it's achingly obvious that you did
> s/ArrayRef/SubscriptingRef/g and nothing else, leaving code that does not
> pass the test of "does it look like it was written like that to begin
> with".

Yes, I paid not enough attention to these small details. I cleaned this up to
make it easier for a native speaker. Here is a new rebased version of the patch
with incorporated improvements that you've mentioned.

0001-Base-implementation-of-subscripting-mechanism-v4.patch (238K) Download Attachment
0002-Subscripting-for-array-v4.patch (17K) Download Attachment
0003-Subscripting-for-jsonb-v4.patch (44K) Download Attachment
0004-Subscripting-documentation-v4.patch (29K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Tom Lane-2
Dmitry Dolgov <[hidden email]> writes:
>> On 4 January 2018 at 03:05, Tom Lane <[hidden email]> wrote:
>> I wonder what happened to the plan to separate assignment and fetch into two
>> different node types. I can see that that didn't happen so far as primnodes.h
>> is concerned, but you've made some changes that seem to assume it did happen.

> There was one version of this patch that followed this plan. It turns out that
> it's quite unnatural approach (at least within the current implementation),
> because I had to duplicate or reuse a lot of code for those two node types.

I'm not entirely convinced by that argument.  Sure, there would be a lot of
duplicative code in the node support functions, but that's inherent in our
approach to those functions.  I'm more concerned about readability,
especially given that exposing this feature to extensions is going to set
all these decisions in concrete forever.

> Here is a new rebased version of the patch
> with incorporated improvements that you've mentioned.

I spent a couple of hours looking through this.  I'm still pretty unhappy
with the state of the parser APIs.  In the first place, you haven't
documented what those APIs are in any meaningful way.  I do not think it's
adequate to tell people to go read array_subscript_parse as the first and
only way to understand what a subscript parse function must do.  We do
often expect extension authors to pick up small details that way, but
there should be a textual API spec of some sort --- for example, look at
the index AM API specs in indexam.sgml, which give pretty clear high-level
definitions of what each AM API function has to do.

Part of the reason why I'm insistent on that is that I think it will
expose that the division of labor between the core parser and the
datatype-specific parse function is still a mess.  One particular sore
spot is the question of who decides what the return data type of a
subscripting function is.  Right now you seem to be making that decision
in the core parser, at least for the assignment case, which is pretty
bad from an extensibility standpoint and also leads to all of this
messiness:

* You changed transformArrayType so that it doesn't throw an error if
the given type isn't an array --- without, I note, updating either the
function header comment or the internal comment that you falsified.

* That in turn means that where transformAssignmentSubscripts thinks
it's determining the result type, it may or may not be producing
InvalidOid instead (again, with no comment to warn the reader).

* And then you had to kludge transformAssignmentIndirection horribly
(and I'm not sure you kludged it enough, either, because there are
still a bunch of places where it uses targetTypeId without any concern
for the possibility that that's zero).  It doesn't seem to me to be
acceptable to just ignore coercion failure, as that code does now.
If it's no longer the responsibility of this function to guarantee
that the result is of the right type, why is it attempting coercion
at all?  In any case you failed to update its header comment to
explain what it's doing differently than before.

In short the division of labor in this area still needs to be thought
about.  I don't think we really want transformAssignmentSubscripts
determining the required input data type at all; that should be
farmed out to the datatype-specific code.  I'm also pretty unconvinced
about refnestedfunc --- why do we need that?

I also notice that you still haven't done anything about the problem
of the subscripting operation possibly yielding a different typmod
or collation than the container type has.  It was okay to let that
go for awhile, but we're not shipping it like this, because it's
going to be awfully hard to change that struct type once extensions
are building them.

While I'm on the topic, I am not really happy with s/array/container/
as you've done in some of this code.  To my mind, "container type"
includes composite types.  Particularly in the parse_target code, where
we're currently dealing with either composites or arrays, making it say
that we're dealing with either composites or containers is just a recipe
for confusion.  Unfortunately I can't think of a better word offhand,
but some attention to this is needed.  As far as the comments go,
we might be able to use the term "subscriptable type", but not sure if
that will work for function/variable names.

On the executor side of things, I suspect Andres will be unhappy that
you are making ExprEvalStep part of the API for datatypes --- he
objected to my exposing it to plpgsql param eval in
https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@...
and there was a lot more reason to do so there than there is here, IMO.
It looks like what you actually need is just the SubscriptingRefState and
an isnull flag, so it might be better to specify that the fetch and assign
functions have signatures like
        Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull)
(representing both of the last two arguments as INTERNAL at SQL level).

Now on the other hand, maybe the right way to go is to embrace a similar
approach to what I did for plpgsql param eval, and let the datatype
control what gets generated as the expression execution step.  The main
point here would be to let the datatype provide the address of a callback
function that gets executed for a subscripting step, rather than having it
specify the OID of a pg_proc entry to call.  There would be two big wins
from that:

* The callback function would have a plain C call signature, so we would
not have to go through FunctionCallN, saving a few cycles.  This is
attractive because it would pretty much eliminate any concern about this
patch making array access slower at execution time.

* There would no longer be a wired-in restriction that there be two and
only two subscripting execution functions per datatype, since there would
not be any need for those functions to be identified in pg_type.

Basically, with this approach, a subscriptable data type would need to
provide two cataloged support functions: parse, as we have now, and
compile.  Actual execution functions would be outside that.  (Possibly
we could merge the support functions into one function that takes an
operation code, similar to one of your earlier designs.  Not sure that
that's better, but it'd be easier to extend in future if we decide we
need three support operations...)

The two disadvantages I can see of approaching things this way are:

* There'd be at least some connection of subscriptable types to
expression compilation, which is what Andres was objecting to in the
message I cited above.  Possibly we could alleviate that a bit by
providing helper functions that mask exactly what goes into the
expression step structs, but I'm not sure that that gets us far.

* We'd not have OIDs of execution functions in the parse trees for
subscripting operations, which would mean that we would not have
a convenient way to identify subscripting operations that are
mutable, parallel-unsafe, or leaky.  Probably it'd be fine to assume
that subscripting is always immutable and parallel-safe, although
I'm slightly more concerned about whether people would want the
option to label it leaky vs leakproof.  As against that, the approach
that's there right now adds planning overhead that wasn't there before
for exactly those function property lookups, and again I'm a bit worried
about the performance impact.  (I did some crude performance tests
today that indicated that the existing patch has small but measurable
penalties, maybe on the order of 10 percent; and it'd be more by the
time we're done because I'm pretty sure you've missed some places that
ought to check these function properties if we're going to have them.
So I'm afraid that we'll get pushback from people who don't care about
extensible subscripts and do care about array performance.)

So roughly speaking, I'm imagining that we'd go back to a design similar
to one I recall you had at one point, where there's a single SQL-visible
subscripting support function per datatype, with a signature like

        subscript_support(int opcode, internal other_info) returns internal

but the opcodes would now be "parse" and "compile".  Actual execution
would use callback functions that don't have to be SQL-visible.  This is
closer to the approach we've been using of late for things like AM APIs:
to the extent possible, there's just one SQL-registered handler function
and all else is a callback.  Actually, we could make it *exactly* like
that, and have the registered handler give back a struct full of function
pointers rather than doing anything much itself.  Maybe that's an even
better way to go.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Andres Freund
Hi,

Tom pointed me towards this thread. I've not followed the topic, so
I might miss a bit of context while commenting on expression eval
related bits...

On 2018-01-07 17:39:00 -0500, Tom Lane wrote:
> On the executor side of things, I suspect Andres will be unhappy that
> you are making ExprEvalStep part of the API for datatypes --- he
> objected to my exposing it to plpgsql param eval in
> https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@...
> and there was a lot more reason to do so there than there is here,
> IMO.

Indeed.


> It looks like what you actually need is just the SubscriptingRefState and
> an isnull flag, so it might be better to specify that the fetch and assign
> functions have signatures like
> Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull)
> (representing both of the last two arguments as INTERNAL at SQL level).

That'd definitely be better.


> Now on the other hand, maybe the right way to go is to embrace a similar
> approach to what I did for plpgsql param eval, and let the datatype
> control what gets generated as the expression execution step.  The main
> point here would be to let the datatype provide the address of a callback
> function that gets executed for a subscripting step, rather than having it
> specify the OID of a pg_proc entry to call.  There would be two big wins
> from that:
>
> * The callback function would have a plain C call signature, so we would
> not have to go through FunctionCallN, saving a few cycles.  This is
> attractive because it would pretty much eliminate any concern about this
> patch making array access slower at execution time.

I'll note that I'm not convinced that the goal this paragraph states and
having the datatype control the entire expression step are full
dependent on each other. It seems quite possible to have
ExecInitSubscriptingRef() call a datatype specific function that returns
C callbacks.

> The two disadvantages I can see of approaching things this way are:
>
> * There'd be at least some connection of subscriptable types to
> expression compilation, which is what Andres was objecting to in the
> message I cited above.  Possibly we could alleviate that a bit by
> providing helper functions that mask exactly what goes into the
> expression step structs, but I'm not sure that that gets us far.

Yea, I'm not the greatest fan of that. With plpgsql it's at least
something in-core that's exposed, but I suspect the subscripotion
interface will get used outside of core, and I really want to whack some
of the expression stuff around some more.


> Actually, we could make it *exactly* like that, and have the
> registered handler give back a struct full of function pointers rather
> than doing anything much itself.  Maybe that's an even better way to
> go.

I'd definitely advocate for that.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2018-01-07 17:39:00 -0500, Tom Lane wrote:
>> Now on the other hand, maybe the right way to go is to embrace a similar
>> approach to what I did for plpgsql param eval, and let the datatype
>> control what gets generated as the expression execution step.

> I'll note that I'm not convinced that the goal this paragraph states and
> having the datatype control the entire expression step are full
> dependent on each other. It seems quite possible to have
> ExecInitSubscriptingRef() call a datatype specific function that returns
> C callbacks.

Yeah, that's a good point.  We could define the compile support function
as simply being allowed to examine the expression tree and give back
the address of the callback function to use, with the rest of the compiled
expression structure being predetermined.  The more general approach would
only be useful if you imagine some sort of high-performance extension that
wants to compile specialized expr steps for its subscripting activity.
Probably the need for that is pretty far away yet.

BTW, if you wanted to change the way plpgsql param callbacks work to be
like this design (ie push the actual generation of the ExprStep back into
core, with plpgsql just supplying a callback address), I wouldn't object.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Dmitry Dolgov
> On 7 January 2018 at 23:39, Tom Lane <[hidden email]> wrote:
>> Dmitry Dolgov <[hidden email]> writes:
>>> On 4 January 2018 at 03:05, Tom Lane <[hidden email]> wrote:
>>> I wonder what happened to the plan to separate assignment and fetch into two
>>> different node types. I can see that that didn't happen so far as primnodes.h
>>> is concerned, but you've made some changes that seem to assume it did happen.
>
>> There was one version of this patch that followed this plan. It turns out that
>> it's quite unnatural approach (at least within the current implementation),
>> because I had to duplicate or reuse a lot of code for those two node types.
>
> I'm not entirely convinced by that argument.  Sure, there would be a lot of
> duplicative code in the node support functions, but that's inherent in our
> approach to those functions.  I'm more concerned about readability,
> especially given that exposing this feature to extensions is going to set
> all these decisions in concrete forever.

That version I'm talking about actually have got not that much readability from
this separation (at least from my perspective). For extenions it maybe more
justified, but at the same time `isAssignment` flag the only thing you need to
check to figure out whether it's a fetch or assignment operation - it doesn't
sound as something significantly worse for readability. Or do you mean
something else?

Anyway, it's quite possible that I just failed to implement a proper approach
to implement this. I'm going to keep pondering this in case I'll come up with
something better - it should be more or less straightforward to change this
logic at any time.

> * You changed transformArrayType so that it doesn't throw an error if
> the given type isn't an array --- without, I note, updating either the
> function header comment or the internal comment that you falsified.
>
> * That in turn means that where transformAssignmentSubscripts thinks
> it's determining the result type, it may or may not be producing
> InvalidOid instead (again, with no comment to warn the reader).
>
> * And then you had to kludge transformAssignmentIndirection horribly
> (and I'm not sure you kludged it enough, either, because there are
> still a bunch of places where it uses targetTypeId without any concern
> for the possibility that that's zero).  It doesn't seem to me to be
> acceptable to just ignore coercion failure, as that code does now.
> If it's no longer the responsibility of this function to guarantee
> that the result is of the right type, why is it attempting coercion
> at all?  In any case you failed to update its header comment to
> explain what it's doing differently than before.
>
> In short the division of labor in this area still needs to be thought
> about.  I don't think we really want transformAssignmentSubscripts
> determining the required input data type at all; that should be
> farmed out to the datatype-specific code.  I'm also pretty unconvinced
> about refnestedfunc --- why do we need that?

Yes, I see, there is a room for improvements. I'm going to post a new version
soon, where I'll try to address this. About `refnestedfunc` - I added it as
part of my modification for functions caching, but looks like now this function
is useless.

> While I'm on the topic, I am not really happy with s/array/container/
> as you've done in some of this code.  To my mind, "container type"
> includes composite types.  Particularly in the parse_target code, where
> we're currently dealing with either composites or arrays, making it say
> that we're dealing with either composites or containers is just a recipe
> for confusion.  Unfortunately I can't think of a better word offhand,
> but some attention to this is needed.  As far as the comments go,
> we might be able to use the term "subscriptable type", but not sure if
> that will work for function/variable names.

Is there a plausible use case when "container type" can be something else than
a "composite type"? Maybe it's ok just to say that "container type" is equal to
"composite type"?

> On the executor side of things, I suspect Andres will be unhappy that
> you are making ExprEvalStep part of the API for datatypes --- he
> objected to my exposing it to plpgsql param eval in
> https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@...
> and there was a lot more reason to do so there than there is here, IMO.
> It looks like what you actually need is just the SubscriptingRefState and
> an isnull flag, so it might be better to specify that the fetch and assign
> functions have signatures like
>         Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull)
> (representing both of the last two arguments as INTERNAL at SQL level).

Yes, I agree.

> Now on the other hand, maybe the right way to go is to embrace a similar
> approach to what I did for plpgsql param eval, and let the datatype
> control what gets generated as the expression execution step.  The main
> point here would be to let the datatype provide the address of a callback
> function that gets executed for a subscripting step, rather than having it
> specify the OID of a pg_proc entry to call.

I had one implementation that resembles this approach. But as far as I see
(please correct me if something in the following chain is wrong) it's necessary
to store pointers to these callback functions in a `SubscriptingRef` node,
which means that they would be missing in functions from
`readfuncs`/`outfuncs`/`copyfuncs`. Is there any situation when this can lead
to undesirable consequences?

> * We'd not have OIDs of execution functions in the parse trees for
> subscripting operations, which would mean that we would not have
> a convenient way to identify subscripting operations that are
> mutable, parallel-unsafe, or leaky.  Probably it'd be fine to assume
> that subscripting is always immutable and parallel-safe, although
> I'm slightly more concerned about whether people would want the
> option to label it leaky vs leakproof.  As against that, the approach
> that's there right now adds planning overhead that wasn't there before
> for exactly those function property lookups, and again I'm a bit worried
> about the performance impact.  (I did some crude performance tests
> today that indicated that the existing patch has small but measurable
> penalties, maybe on the order of 10 percent; and it'd be more by the
> time we're done because I'm pretty sure you've missed some places that
> ought to check these function properties if we're going to have them.
> So I'm afraid that we'll get pushback from people who don't care about
> extensible subscripts and do care about array performance.)

We also wouldn't have a convenient way to specify costs of subscripting
operations. I had in mind a situation, when someone wants to have a quite heavy
subscripting function, which should affect planner's decisions. It maybe a rare
case though.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Andres Freund
In reply to this post by Dmitry Dolgov
Hi,

On 2018-01-04 16:47:50 +0100, Dmitry Dolgov wrote:

> diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
> index 16f908037c..ee50fda4ef 100644
> --- a/src/backend/executor/execExpr.c
> +++ b/src/backend/executor/execExpr.c
> @@ -64,7 +64,8 @@ static void ExecInitExprSlots(ExprState *state, Node *node);
>  static bool get_last_attnums_walker(Node *node, LastAttnumInfo *info);
>  static void ExecInitWholeRowVar(ExprEvalStep *scratch, Var *variable,
>   ExprState *state);
> -static void ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref,
> +static void ExecInitSubscriptingRef(ExprEvalStep *scratch,
> + SubscriptingRef *sbsref,
>   ExprState *state,
>   Datum *resv, bool *resnull);
>  static bool isAssignmentIndirectionExpr(Expr *expr);
> @@ -857,11 +858,11 @@ ExecInitExprRec(Expr *node, ExprState *state,
>   break;
>   }
>  
> - case T_ArrayRef:
> + case T_SubscriptingRef:
>   {
> - ArrayRef   *aref = (ArrayRef *) node;
> + SubscriptingRef   *sbsref = (SubscriptingRef *) node;
>  
> - ExecInitArrayRef(&scratch, aref, state, resv, resnull);
> + ExecInitSubscriptingRef(&scratch, sbsref, state, resv, resnull);
>   break;
>   }
>  
> @@ -1176,7 +1177,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
>   /*
>   * Use the CaseTestExpr mechanism to pass down the old
>   * value of the field being replaced; this is needed in
> - * case the newval is itself a FieldStore or ArrayRef that
> + * case the newval is itself a FieldStore or SubscriptingRef that
>   * has to obtain and modify the old value.  It's safe to
>   * reuse the CASE mechanism because there cannot be a CASE
>   * between here and where the value would be needed, and a
> @@ -2401,34 +2402,40 @@ ExecInitWholeRowVar(ExprEvalStep *scratch, Var *variable, ExprState *state)
>  }
>  
>  /*
> - * Prepare evaluation of an ArrayRef expression.
> + * Prepare evaluation of a SubscriptingRef expression.
>   */
>  static void
> -ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref,
> +ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref,
>   ExprState *state, Datum *resv, bool *resnull)
>  {
> - bool isAssignment = (aref->refassgnexpr != NULL);
> - ArrayRefState *arefstate = palloc0(sizeof(ArrayRefState));
> - List   *adjust_jumps = NIL;
> - ListCell   *lc;
> - int i;
> + bool isAssignment = (sbsref->refassgnexpr != NULL);
> + SubscriptingRefState *sbsrefstate = palloc0(sizeof(SubscriptingRefState));
> + List *adjust_jumps = NIL;
> + ListCell   *lc;
> + int    i;
> + FmgrInfo   *eval_finfo, *nested_finfo;
> +
> + eval_finfo = palloc0(sizeof(FmgrInfo));
> + nested_finfo = palloc0(sizeof(FmgrInfo));
> +
> + fmgr_info(sbsref->refevalfunc, eval_finfo);
> + if (OidIsValid(sbsref->refnestedfunc))
> + {
> + fmgr_info(sbsref->refnestedfunc, nested_finfo);
> + }
>  
> - /* Fill constant fields of ArrayRefState */
> - arefstate->isassignment = isAssignment;
> - arefstate->refelemtype = aref->refelemtype;
> - arefstate->refattrlength = get_typlen(aref->refarraytype);
> - get_typlenbyvalalign(aref->refelemtype,
> - &arefstate->refelemlength,
> - &arefstate->refelembyval,
> - &arefstate->refelemalign);
> + /* Fill constant fields of SubscriptingRefState */
> + sbsrefstate->isassignment = isAssignment;
> + sbsrefstate->refelemtype = sbsref->refelemtype;
> + sbsrefstate->refattrlength = get_typlen(sbsref->refcontainertype);
>  
>   /*
>   * Evaluate array input.  It's safe to do so into resv/resnull, because we
>   * won't use that as target for any of the other subexpressions, and it'll
> - * be overwritten by the final EEOP_ARRAYREF_FETCH/ASSIGN step, which is
> + * be overwritten by the final EEOP_SBSREF_FETCH/ASSIGN step, which is
>   * pushed last.
>   */
> - ExecInitExprRec(aref->refexpr, state, resv, resnull);
> + ExecInitExprRec(sbsref->refexpr, state, resv, resnull);
>  
>   /*
>   * If refexpr yields NULL, and it's a fetch, then result is NULL.  We can
> @@ -2440,92 +2447,95 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref,
>   scratch->opcode = EEOP_JUMP_IF_NULL;
>   scratch->d.jump.jumpdone = -1; /* adjust later */
>   ExprEvalPushStep(state, scratch);
> +
>   adjust_jumps = lappend_int(adjust_jumps,
>     state->steps_len - 1);
>   }
>  
>   /* Verify subscript list lengths are within limit */
> - if (list_length(aref->refupperindexpr) > MAXDIM)
> + if (list_length(sbsref->refupperindexpr) > MAX_SUBSCRIPT_DEPTH)
>   ereport(ERROR,
>   (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>   errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
> - list_length(aref->refupperindexpr), MAXDIM)));
> + list_length(sbsref->refupperindexpr), MAX_SUBSCRIPT_DEPTH)));
>  
> - if (list_length(aref->reflowerindexpr) > MAXDIM)
> + if (list_length(sbsref->reflowerindexpr) > MAX_SUBSCRIPT_DEPTH)
>   ereport(ERROR,
>   (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>   errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
> - list_length(aref->reflowerindexpr), MAXDIM)));
> + list_length(sbsref->reflowerindexpr), MAX_SUBSCRIPT_DEPTH)));
>  
>   /* Evaluate upper subscripts */
>   i = 0;
> - foreach(lc, aref->refupperindexpr)
> + foreach(lc, sbsref->refupperindexpr)
>   {
>   Expr   *e = (Expr *) lfirst(lc);
>  
>   /* When slicing, individual subscript bounds can be omitted */
>   if (!e)
>   {
> - arefstate->upperprovided[i] = false;
> + sbsrefstate->upperprovided[i] = false;
>   i++;
>   continue;
>   }
>  
> - arefstate->upperprovided[i] = true;
> + sbsrefstate->upperprovided[i] = true;
>  
>   /* Each subscript is evaluated into subscriptvalue/subscriptnull */
>   ExecInitExprRec(e, state,
> - &arefstate->subscriptvalue, &arefstate->subscriptnull);
> -
> - /* ... and then ARRAYREF_SUBSCRIPT saves it into step's workspace */
> - scratch->opcode = EEOP_ARRAYREF_SUBSCRIPT;
> - scratch->d.arrayref_subscript.state = arefstate;
> - scratch->d.arrayref_subscript.off = i;
> - scratch->d.arrayref_subscript.isupper = true;
> - scratch->d.arrayref_subscript.jumpdone = -1; /* adjust later */
> + &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull);
> +
> + /* ... and then SBSREF_SUBSCRIPT saves it into step's workspace */
> + scratch->opcode = EEOP_SBSREF_SUBSCRIPT;
> + scratch->d.sbsref_subscript.state = sbsrefstate;
> + scratch->d.sbsref_subscript.off = i;
> + scratch->d.sbsref_subscript.isupper = true;
> + scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */
>   ExprEvalPushStep(state, scratch);
> +
>   adjust_jumps = lappend_int(adjust_jumps,
>     state->steps_len - 1);
>   i++;
>   }
> - arefstate->numupper = i;
> + sbsrefstate->numupper = i;
>  
>   /* Evaluate lower subscripts similarly */
>   i = 0;
> - foreach(lc, aref->reflowerindexpr)
> + foreach(lc, sbsref->reflowerindexpr)
>   {
>   Expr   *e = (Expr *) lfirst(lc);
>  
>   /* When slicing, individual subscript bounds can be omitted */
>   if (!e)
>   {
> - arefstate->lowerprovided[i] = false;
> + sbsrefstate->lowerprovided[i] = false;
>   i++;
>   continue;
>   }
>  
> - arefstate->lowerprovided[i] = true;
> + sbsrefstate->lowerprovided[i] = true;
>  
>   /* Each subscript is evaluated into subscriptvalue/subscriptnull */
>   ExecInitExprRec(e, state,
> - &arefstate->subscriptvalue, &arefstate->subscriptnull);
> -
> - /* ... and then ARRAYREF_SUBSCRIPT saves it into step's workspace */
> - scratch->opcode = EEOP_ARRAYREF_SUBSCRIPT;
> - scratch->d.arrayref_subscript.state = arefstate;
> - scratch->d.arrayref_subscript.off = i;
> - scratch->d.arrayref_subscript.isupper = false;
> - scratch->d.arrayref_subscript.jumpdone = -1; /* adjust later */
> + &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull);
> +
> + /* ... and then SBSREF_SUBSCRIPT saves it into step's workspace */
> + scratch->opcode = EEOP_SBSREF_SUBSCRIPT;
> + scratch->d.sbsref_subscript.state = sbsrefstate;
> + scratch->d.sbsref_subscript.off = i;
> + scratch->d.sbsref_subscript.isupper = false;
> + scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */
>   ExprEvalPushStep(state, scratch);
> +
>   adjust_jumps = lappend_int(adjust_jumps,
>     state->steps_len - 1);
>   i++;
>   }
> - arefstate->numlower = i;
> + sbsrefstate->numlower = i;
>  
>   /* Should be impossible if parser is sane, but check anyway: */
> - if (arefstate->numlower != 0 &&
> - arefstate->numupper != arefstate->numlower)
> + if (sbsrefstate->numlower != 0 &&
> + sbsrefstate->numupper != sbsrefstate->numlower)
>   elog(ERROR, "upper and lower index lists are not same length");
>  
>   if (isAssignment)
> @@ -2535,7 +2545,7 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref,
>  
>   /*
>   * We might have a nested-assignment situation, in which the
> - * refassgnexpr is itself a FieldStore or ArrayRef that needs to
> + * refassgnexpr is itself a FieldStore or SubscriptingRef that needs to
>   * obtain and modify the previous value of the array element or slice
>   * being replaced.  If so, we have to extract that value from the
>   * array and pass it down via the CaseTestExpr mechanism.  It's safe
> @@ -2547,37 +2557,45 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref,
>   * Since fetching the old element might be a nontrivial expense, do it
>   * only if the argument actually needs it.
>   */
> - if (isAssignmentIndirectionExpr(aref->refassgnexpr))
> + if (isAssignmentIndirectionExpr(sbsref->refassgnexpr))
>   {
> - scratch->opcode = EEOP_ARRAYREF_OLD;
> - scratch->d.arrayref.state = arefstate;
> + scratch->opcode = EEOP_SBSREF_OLD;
> + scratch->d.sbsref.state = sbsrefstate;
> + scratch->d.sbsref.eval_finfo = eval_finfo;
> + scratch->d.sbsref.nested_finfo = nested_finfo;
>   ExprEvalPushStep(state, scratch);
>   }
>  
> - /* ARRAYREF_OLD puts extracted value into prevvalue/prevnull */
> + /* SBSREF_OLD puts extracted value into prevvalue/prevnull */
>   save_innermost_caseval = state->innermost_caseval;
>   save_innermost_casenull = state->innermost_casenull;
> - state->innermost_caseval = &arefstate->prevvalue;
> - state->innermost_casenull = &arefstate->prevnull;
> + state->innermost_caseval = &sbsrefstate->prevvalue;
> + state->innermost_casenull = &sbsrefstate->prevnull;
>  
>   /* evaluate replacement value into replacevalue/replacenull */
> - ExecInitExprRec(aref->refassgnexpr, state,
> - &arefstate->replacevalue, &arefstate->replacenull);
> + ExecInitExprRec(sbsref->refassgnexpr, state,
> + &sbsrefstate->replacevalue, &sbsrefstate->replacenull);
>  
>   state->innermost_caseval = save_innermost_caseval;
>   state->innermost_casenull = save_innermost_casenull;
>  
>   /* and perform the assignment */
> - scratch->opcode = EEOP_ARRAYREF_ASSIGN;
> - scratch->d.arrayref.state = arefstate;
> + scratch->opcode = EEOP_SBSREF_ASSIGN;
> + scratch->d.sbsref.state = sbsrefstate;
> + scratch->d.sbsref.eval_finfo = eval_finfo;
> + scratch->d.sbsref.nested_finfo = nested_finfo;
>   ExprEvalPushStep(state, scratch);
> +
>   }
>   else
>   {
>   /* array fetch is much simpler */
> - scratch->opcode = EEOP_ARRAYREF_FETCH;
> - scratch->d.arrayref.state = arefstate;
> + scratch->opcode = EEOP_SBSREF_FETCH;
> + scratch->d.sbsref.state = sbsrefstate;
> + scratch->d.sbsref.eval_finfo = eval_finfo;
> + scratch->d.sbsref.nested_finfo = nested_finfo;
>   ExprEvalPushStep(state, scratch);
> +
>   }
>  
>   /* adjust jump targets */
> @@ -2585,10 +2603,10 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref,
>   {
>   ExprEvalStep *as = &state->steps[lfirst_int(lc)];
>  
> - if (as->opcode == EEOP_ARRAYREF_SUBSCRIPT)
> + if (as->opcode == EEOP_SBSREF_SUBSCRIPT)
>   {
> - Assert(as->d.arrayref_subscript.jumpdone == -1);
> - as->d.arrayref_subscript.jumpdone = state->steps_len;
> + Assert(as->d.sbsref_subscript.jumpdone == -1);
> + as->d.sbsref_subscript.jumpdone = state->steps_len;
>   }
>   else
>   {
> @@ -2600,8 +2618,8 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref,
>  }
>  
>  /*
> - * Helper for preparing ArrayRef expressions for evaluation: is expr a nested
> - * FieldStore or ArrayRef that needs the old element value passed down?
> + * Helper for preparing SubscriptingRef expressions for evaluation: is expr a nested
> + * FieldStore or SubscriptingRef that needs the old element value passed down?
>   *
>   * (We could use this in FieldStore too, but in that case passing the old
>   * value is so cheap there's no need.)
> @@ -2624,11 +2642,11 @@ isAssignmentIndirectionExpr(Expr *expr)
>   if (fstore->arg && IsA(fstore->arg, CaseTestExpr))
>   return true;
>   }
> - else if (IsA(expr, ArrayRef))
> + else if (IsA(expr, SubscriptingRef))
>   {
> - ArrayRef   *arrayRef = (ArrayRef *) expr;
> + SubscriptingRef   *sbsRef = (SubscriptingRef *) expr;
>  
> - if (arrayRef->refexpr && IsA(arrayRef->refexpr, CaseTestExpr))
> + if (sbsRef->refexpr && IsA(sbsRef->refexpr, CaseTestExpr))
>   return true;
>   }
>   return false;
> diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
> index 2e88417265..0c72e80b58 100644
> --- a/src/backend/executor/execExprInterp.c
> +++ b/src/backend/executor/execExprInterp.c
> @@ -364,10 +364,10 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
>   &&CASE_EEOP_FIELDSELECT,
>   &&CASE_EEOP_FIELDSTORE_DEFORM,
>   &&CASE_EEOP_FIELDSTORE_FORM,
> - &&CASE_EEOP_ARRAYREF_SUBSCRIPT,
> - &&CASE_EEOP_ARRAYREF_OLD,
> - &&CASE_EEOP_ARRAYREF_ASSIGN,
> - &&CASE_EEOP_ARRAYREF_FETCH,
> + &&CASE_EEOP_SBSREF_SUBSCRIPT,
> + &&CASE_EEOP_SBSREF_OLD,
> + &&CASE_EEOP_SBSREF_ASSIGN,
> + &&CASE_EEOP_SBSREF_FETCH,
>   &&CASE_EEOP_DOMAIN_TESTVAL,
>   &&CASE_EEOP_DOMAIN_NOTNULL,
>   &&CASE_EEOP_DOMAIN_CHECK,
> @@ -1367,43 +1367,43 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
>   EEO_NEXT();
>   }
>  
> - EEO_CASE(EEOP_ARRAYREF_SUBSCRIPT)
> + EEO_CASE(EEOP_SBSREF_SUBSCRIPT)
>   {
>   /* Process an array subscript */
>  
>   /* too complex for an inline implementation */
> - if (ExecEvalArrayRefSubscript(state, op))
> + if (ExecEvalSubscriptingRef(state, op))
>   {
>   EEO_NEXT();
>   }
>   else
>   {
> - /* Subscript is null, short-circuit ArrayRef to NULL */
> - EEO_JUMP(op->d.arrayref_subscript.jumpdone);
> + /* Subscript is null, short-circuit SubscriptingRef to NULL */
> + EEO_JUMP(op->d.sbsref_subscript.jumpdone);
>   }
>   }
>  
> - EEO_CASE(EEOP_ARRAYREF_OLD)
> + EEO_CASE(EEOP_SBSREF_OLD)
>   {
>   /*
> - * Fetch the old value in an arrayref assignment, in case it's
> + * Fetch the old value in an sbsref assignment, in case it's
>   * referenced (via a CaseTestExpr) inside the assignment
>   * expression.
>   */
>  
>   /* too complex for an inline implementation */
> - ExecEvalArrayRefOld(state, op);
> + ExecEvalSubscriptingRefOld(state, op);
>  
>   EEO_NEXT();
>   }
>  
>   /*
> - * Perform ArrayRef assignment
> + * Perform SubscriptingRef assignment
>   */
> - EEO_CASE(EEOP_ARRAYREF_ASSIGN)
> + EEO_CASE(EEOP_SBSREF_ASSIGN)
>   {
>   /* too complex for an inline implementation */
> - ExecEvalArrayRefAssign(state, op);
> + ExecEvalSubscriptingRefAssign(state, op);
>  
>   EEO_NEXT();
>   }
> @@ -1411,10 +1411,10 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
>   /*
>   * Fetch subset of an array.
>   */
> - EEO_CASE(EEOP_ARRAYREF_FETCH)
> + EEO_CASE(EEOP_SBSREF_FETCH)
>   {
>   /* too complex for an inline implementation */
> - ExecEvalArrayRefFetch(state, op);
> + ExecEvalSubscriptingRefFetch(state, op);
>  
>   EEO_NEXT();
>   }
> @@ -2702,197 +2702,115 @@ ExecEvalFieldStoreForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext
>  }
>  
>  /*
> - * Process a subscript in an ArrayRef expression.
> + * Process a subscript in a SubscriptingRef expression.
>   *
>   * If subscript is NULL, throw error in assignment case, or in fetch case
>   * set result to NULL and return false (instructing caller to skip the rest
> - * of the ArrayRef sequence).
> + * of the SubscriptingRef sequence).
>   *
>   * Subscript expression result is in subscriptvalue/subscriptnull.
>   * On success, integer subscript value has been saved in upperindex[] or
>   * lowerindex[] for use later.
>   */
>  bool
> -ExecEvalArrayRefSubscript(ExprState *state, ExprEvalStep *op)
> +ExecEvalSubscriptingRef(ExprState *state, ExprEvalStep *op)
>  {
> - ArrayRefState *arefstate = op->d.arrayref_subscript.state;
> - int   *indexes;
> - int off;
> + SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state;
> + Datum *indexes;
> + int off;
>  
>   /* If any index expr yields NULL, result is NULL or error */
> - if (arefstate->subscriptnull)
> + if (sbsrefstate->subscriptnull)
>   {
> - if (arefstate->isassignment)
> + if (sbsrefstate->isassignment)
>   ereport(ERROR,
>   (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
> - errmsg("array subscript in assignment must not be null")));
> + errmsg("subscript in assignment must not be null")));
>   *op->resnull = true;
>   return false;
>   }
>  
>   /* Convert datum to int, save in appropriate place */
> - if (op->d.arrayref_subscript.isupper)
> - indexes = arefstate->upperindex;
> + if (op->d.sbsref_subscript.isupper)
> + indexes = sbsrefstate->upper;
>   else
> - indexes = arefstate->lowerindex;
> - off = op->d.arrayref_subscript.off;
> + indexes = sbsrefstate->lower;
> + off = op->d.sbsref_subscript.off;
>  
> - indexes[off] = DatumGetInt32(arefstate->subscriptvalue);
> + indexes[off] = sbsrefstate->subscriptvalue;
>  
>   return true;
>  }
>  
>  /*
> - * Evaluate ArrayRef fetch.
> + * Evaluate SubscriptingRef fetch.
>   *
> - * Source array is in step's result variable.
> + * Source container is in step's result variable.
>   */
>  void
> -ExecEvalArrayRefFetch(ExprState *state, ExprEvalStep *op)
> +ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op)
>  {
> - ArrayRefState *arefstate = op->d.arrayref.state;
> -
> - /* Should not get here if source array (or any subscript) is null */
> + /* Should not get here if source container (or any subscript) is null */
>   Assert(!(*op->resnull));
>  
> - if (arefstate->numlower == 0)
> - {
> - /* Scalar case */
> - *op->resvalue = array_get_element(*op->resvalue,
> -  arefstate->numupper,
> -  arefstate->upperindex,
> -  arefstate->refattrlength,
> -  arefstate->refelemlength,
> -  arefstate->refelembyval,
> -  arefstate->refelemalign,
> -  op->resnull);
> - }
> - else
> - {
> - /* Slice case */
> - *op->resvalue = array_get_slice(*op->resvalue,
> - arefstate->numupper,
> - arefstate->upperindex,
> - arefstate->lowerindex,
> - arefstate->upperprovided,
> - arefstate->lowerprovided,
> - arefstate->refattrlength,
> - arefstate->refelemlength,
> - arefstate->refelembyval,
> - arefstate->refelemalign);
> - }
> + *op->resvalue = FunctionCall2(op->d.sbsref.eval_finfo,
> +  PointerGetDatum(*op->resvalue),
> +  PointerGetDatum(op));
>  }
>  
>  /*
> - * Compute old array element/slice value for an ArrayRef assignment
> - * expression.  Will only be generated if the new-value subexpression
> - * contains ArrayRef or FieldStore.  The value is stored into the
> - * ArrayRefState's prevvalue/prevnull fields.
> + * Compute old container element/slice value for a SubscriptingRef assignment
> + * expression. Will only be generated if the new-value subexpression
> + * contains SubscriptingRef or FieldStore. The value is stored into the
> + * SubscriptingRefState's prevvalue/prevnull fields.
>   */
>  void
> -ExecEvalArrayRefOld(ExprState *state, ExprEvalStep *op)
> +ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op)
>  {
> - ArrayRefState *arefstate = op->d.arrayref.state;
> + SubscriptingRefState *sbsrefstate = op->d.sbsref.state;
>  
>   if (*op->resnull)
>   {
> - /* whole array is null, so any element or slice is too */
> - arefstate->prevvalue = (Datum) 0;
> - arefstate->prevnull = true;
> - }
> - else if (arefstate->numlower == 0)
> - {
> - /* Scalar case */
> - arefstate->prevvalue = array_get_element(*op->resvalue,
> - arefstate->numupper,
> - arefstate->upperindex,
> - arefstate->refattrlength,
> - arefstate->refelemlength,
> - arefstate->refelembyval,
> - arefstate->refelemalign,
> - &arefstate->prevnull);
> + /* whole container is null, so any element or slice is too */
> + sbsrefstate->prevvalue = (Datum) 0;
> + sbsrefstate->prevnull = true;
>   }
>   else
>   {
> - /* Slice case */
> - /* this is currently unreachable */
> - arefstate->prevvalue = array_get_slice(*op->resvalue,
> -   arefstate->numupper,
> -   arefstate->upperindex,
> -   arefstate->lowerindex,
> -   arefstate->upperprovided,
> -   arefstate->lowerprovided,
> -   arefstate->refattrlength,
> -   arefstate->refelemlength,
> -   arefstate->refelembyval,
> -   arefstate->refelemalign);
> - arefstate->prevnull = false;
> + sbsrefstate->prevvalue = FunctionCall2(op->d.sbsref.nested_finfo,
> +  PointerGetDatum(*op->resvalue),
> +  PointerGetDatum(op));
> +
> + if (sbsrefstate->numlower != 0)
> + sbsrefstate->prevnull = false;
> +
>   }
>  }
>  
>  /*
> - * Evaluate ArrayRef assignment.
> + * Evaluate SubscriptingRef assignment.
>   *
> - * Input array (possibly null) is in result area, replacement value is in
> - * ArrayRefState's replacevalue/replacenull.
> + * Input container (possibly null) is in result area, replacement value is in
> + * SubscriptingRefState's replacevalue/replacenull.
>   */
>  void
> -ExecEvalArrayRefAssign(ExprState *state, ExprEvalStep *op)
> +ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op)
>  {
> - ArrayRefState *arefstate = op->d.arrayref.state;
> -
> + SubscriptingRefState *sbsrefstate = op->d.sbsref.state;
>   /*
> - * For an assignment to a fixed-length array type, both the original array
> - * and the value to be assigned into it must be non-NULL, else we punt and
> - * return the original array.
> + * For an assignment to a fixed-length container type, both the original
> + * container and the value to be assigned into it must be non-NULL, else we
> + * punt and return the original container.
>   */
> - if (arefstate->refattrlength > 0) /* fixed-length array? */
> + if (sbsrefstate->refattrlength > 0)
>   {
> - if (*op->resnull || arefstate->replacenull)
> + if (*op->resnull || sbsrefstate->replacenull)
>   return;
>   }
>  
> - /*
> - * For assignment to varlena arrays, we handle a NULL original array by
> - * substituting an empty (zero-dimensional) array; insertion of the new
> - * element will result in a singleton array value.  It does not matter
> - * whether the new element is NULL.
> - */
> - if (*op->resnull)
> - {
> - *op->resvalue = PointerGetDatum(construct_empty_array(arefstate->refelemtype));
> - *op->resnull = false;
> - }
> -
> - if (arefstate->numlower == 0)
> - {
> - /* Scalar case */
> - *op->resvalue = array_set_element(*op->resvalue,
> -  arefstate->numupper,
> -  arefstate->upperindex,
> -  arefstate->replacevalue,
> -  arefstate->replacenull,
> -  arefstate->refattrlength,
> -  arefstate->refelemlength,
> -  arefstate->refelembyval,
> -  arefstate->refelemalign);
> - }
> - else
> - {
> - /* Slice case */
> - *op->resvalue = array_set_slice(*op->resvalue,
> - arefstate->numupper,
> - arefstate->upperindex,
> - arefstate->lowerindex,
> - arefstate->upperprovided,
> - arefstate->lowerprovided,
> - arefstate->replacevalue,
> - arefstate->replacenull,
> - arefstate->refattrlength,
> - arefstate->refelemlength,
> - arefstate->refelembyval,
> - arefstate->refelemalign);
> - }
> + *op->resvalue = FunctionCall2(op->d.sbsref.eval_finfo,
> +  PointerGetDatum(*op->resvalue),
> +  PointerGetDatum(op));
>  }


You might not love me for this suggestion, but I'd like to see the
renaming here split from the rest of the patch. There's a lot of diff
that's just more or less automatic changes, making it hard to see the
actual meaningful changes.

- Andres

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Tom Lane-2
Andres Freund <[hidden email]> writes:
> You might not love me for this suggestion, but I'd like to see the
> renaming here split from the rest of the patch. There's a lot of diff
> that's just more or less automatic changes, making it hard to see the
> actual meaningful changes.

Yeah, I'm beginning to wonder if we should do the renaming at all.
It's useful for being sure we've found everyplace that needs to change
... but if lots of those places don't actually need more than the
name changes, maybe it's just make-work and code thrashing.


There's a set of other issues that are starting to bother me.
Perhaps it's not in this patch's charter to resolve them, but I think
we need to figure out whether that's true.  It's a bit hard to
explain clearly, but let me see how well I can state these:

* The complaint I had about the "container" terminology isn't just
terminology.  Rather, there is a bunch of knowledge in the system that
some data types can be found embedded in other types; for one example,
see find_composite_type_dependencies.  In the case of standard arrays,
it's clear that the array type does contain its element type in this
sense, and we'd better be aware of that in situations such as applying
DDL that changes the element type.  It's much less clear what it means
if you say that type X has a subscripting function that yields type Y.
I think the issue can be ignored as long as Y is not a type mutable by
any provided DDL commands, but is that OK as a permanent restriction?
If not, do we need to do anything about it in the v1 patch?  If we don't,
do we need to enforce some restriction on what Y can be for types
other than true arrays?

* There are other identifiable array-specific behaviors that people
might expect a generic subscripting feature to let them into.
For example, if we allow JSONB to be subscripted to obtain TEXT,
does that mean a polymorphic function f(x anyarray) should now match
JSONB?  It's even more fun if you decide that subscripting JSONB
should yield JSONB, which is probably a more useful definition than
TEXT really.  Then ANYARRAY and ANYELEMENT would need to be the same
type, which is likely to blow holes in the polymorphic type code,
though I've not looked closely for problems.  In the same vein, if
JSONB is subscriptable, should "x = ANY(y)" work for y of type JSONB?
I'm not actually sure that we'd want these sorts of things to happen,
even as follow-on extensions.  For instance, a polymorphic function
f(x anyarray) would very likely think it can apply array_dims(x) or
iterate from array_lower(x,1) to array_upper(x,1).  Providing it
a subscripting function won't get you far if the subscripts aren't
consecutive integers.

* There's an awful lot of places in the backend that call get_element_type
or get_base_element_type or type_is_array or type_is_array_domain, and
aren't touched by the patch as it stands.  Some of them might represent
dependencies that we need to worry about that don't fall under either of
the above categories.  So just touching the places that mess with ArrayRef
isn't getting us real far in terms of being sure we've considered
everything that needs considering.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Robert Haas
On Wed, Jan 10, 2018 at 7:09 PM, Tom Lane <[hidden email]> wrote:

> * The complaint I had about the "container" terminology isn't just
> terminology.  Rather, there is a bunch of knowledge in the system that
> some data types can be found embedded in other types; for one example,
> see find_composite_type_dependencies.  In the case of standard arrays,
> it's clear that the array type does contain its element type in this
> sense, and we'd better be aware of that in situations such as applying
> DDL that changes the element type.  It's much less clear what it means
> if you say that type X has a subscripting function that yields type Y.
> I think the issue can be ignored as long as Y is not a type mutable by
> any provided DDL commands, but is that OK as a permanent restriction?
> If not, do we need to do anything about it in the v1 patch?  If we don't,
> do we need to enforce some restriction on what Y can be for types
> other than true arrays?

Well, if I set things up so that subscripting foo by integer returns
bar, that doesn't seem to be much different, from the point of view of
type dependencies, from CREATE FUNCTION some_function_name(foo,
integer) RETURNS bar.  I suppose that if this gets stored in the
pg_type entry for foo, that type just develops a dependencies on the
subscripting functions which, I suppose, makes them indirectly
dependent on the return types of those functions.  I'm not sure why
that wouldn't be sufficient.  It's true that you might be able to
alter the type in some way that would break it, but that's equally
true of the CREATE FUNCTION case:

rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create or replace function fooify(foo) returns int as $$select
$1.a$$ language sql;
CREATE FUNCTION
rhaas=# select fooify('(1,tgl)'::foo);
 fooify
--------
      1
(1 row)
rhaas=# alter table foo rename column a to c;
ALTER TABLE
rhaas=# select fooify('(1,tgl)'::foo);
ERROR:  column "a" not found in data type foo
LINE 1: select $1.a
               ^
QUERY:  select $1.a
CONTEXT:  SQL function "fooify" during inlining

> * There are other identifiable array-specific behaviors that people
> might expect a generic subscripting feature to let them into.
> For example, if we allow JSONB to be subscripted to obtain TEXT,
> does that mean a polymorphic function f(x anyarray) should now match
> JSONB?  It's even more fun if you decide that subscripting JSONB
> should yield JSONB, which is probably a more useful definition than
> TEXT really.  Then ANYARRAY and ANYELEMENT would need to be the same
> type, which is likely to blow holes in the polymorphic type code,
> though I've not looked closely for problems.  In the same vein, if
> JSONB is subscriptable, should "x = ANY(y)" work for y of type JSONB?
> I'm not actually sure that we'd want these sorts of things to happen,
> even as follow-on extensions.  For instance, a polymorphic function
> f(x anyarray) would very likely think it can apply array_dims(x) or
> iterate from array_lower(x,1) to array_upper(x,1).  Providing it
> a subscripting function won't get you far if the subscripts aren't
> consecutive integers.

Our SQL dialect is statically typed; trying to support duck-typing
seems likely to create a lot of problems.  True, we have a single
datatype for one-dimensional and multi-dimensional arrays, but I think
most people would view that as an anti-feature.  We also have some
provision for dynamic record types, but it feels pretty kludgy.

> * There's an awful lot of places in the backend that call get_element_type
> or get_base_element_type or type_is_array or type_is_array_domain, and
> aren't touched by the patch as it stands.  Some of them might represent
> dependencies that we need to worry about that don't fall under either of
> the above categories.  So just touching the places that mess with ArrayRef
> isn't getting us real far in terms of being sure we've considered
> everything that needs considering.

No argument there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Generic type subscripting

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Wed, Jan 10, 2018 at 7:09 PM, Tom Lane <[hidden email]> wrote:
>> * The complaint I had about the "container" terminology isn't just
>> terminology.  Rather, there is a bunch of knowledge in the system that
>> some data types can be found embedded in other types; for one example,
>> see find_composite_type_dependencies.

> Well, if I set things up so that subscripting foo by integer returns
> bar, that doesn't seem to be much different, from the point of view of
> type dependencies, from CREATE FUNCTION some_function_name(foo,
> integer) RETURNS bar.

I think you missed the point.  The question is whether the existence of a
subscripting function means that we need to treat the subscriptable type
as physically containing the subscript result type.  For example, if the
subscript result type is composite, do we need to do something about a
column of the subscriptable type when somebody does an ALTER TYPE
... ALTER ATTRIBUTE TYPE on the result type?  The dependency mechanism
doesn't have enough information to answer that.  It's fairly easy to
imagine cases where it wouldn't be true --- for instance, if you had
a subscripting conversion from JSONB to my_composite_type, changing
my_composite_type would likely change the set of JSONB values for which
the subscripting function would succeed, but it wouldn't create a need
to physically rewrite any JSONB columns.  But perhaps somebody might
try to build a subscriptable type for which they did need that.

After further thought, I think I'm prepared to say (for the moment) that
only true arrays need be deemed to be containers in this sense.  If you
make a subscripting function for anything else, we'll treat it as just a
function that happens to yield the result type but doesn't imply that that
is what is physically stored.  Perhaps at some point that will need to
change, but I'm failing to think of near-term use cases where it would be
important to have such a property.

This is, however, a good reason why I don't like the use of "container"
terminology in the patch.  I think we want to reserve "container" for
types where physical containment is assumed.

>> * There are other identifiable array-specific behaviors that people
>> might expect a generic subscripting feature to let them into.

> Our SQL dialect is statically typed; trying to support duck-typing
> seems likely to create a lot of problems.

Agreed; that's pretty much what my point was too.  I'm just trying
to be clear about how far we expect this capability to reach.

                        regards, tom lane

1234567