contrib/cube - binary input/output handlers

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

contrib/cube - binary input/output handlers

Kohei KaiGai-4
Hello,

I noticed that contrib/cube data type does not support binary
input/output handler
when I tried to dump a table with cube columns, using a tool [*1] that
uses binary data
over libpq.

$ pg2arrow -d postgres -t my_table
../utils/pgsql_client.c:351  SQL execution failed: ERROR:  no binary
output function available for type cube

This patch adds cube_send / cube_recv handlers on the contrib/cube data type.
Once this patch was applied to, the libpq client can obtain the table
data using binary mode.

$ pg2arrow -d postgres -t my_table
NOTICE: -o, --output=FILENAME option was not given,
        so a temporary file '/tmp/CdC68Q.arrow' was built instead.

The internal layout of cube, a kind of varlena, has a leading 32bit
header and the following float8
array. (array size is embedded in the header field).
So, cube_send just put the data stream according to the internal
layout, then cube_recv reconstructs
the values inverse.

Best regards,

[*1] pg2arrow - a utility to convert PostgreSQL table to Apache Arrow
http://heterodb.github.io/pg-strom/arrow_fdw/#using-pg2arrow
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

pgsql-cube-binary-inout-handler.v1.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Peter Eisentraut-7
On 24.02.21 04:18, Kohei KaiGai wrote:
> This patch adds cube_send / cube_recv handlers on the contrib/cube data type.
> Once this patch was applied to, the libpq client can obtain the table
> data using binary mode.

Seems reasonable.  But you need to write an extension upgrade script and
bump the extension version.


Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Kohei KaiGai-4
2021年3月3日(水) 23:33 Peter Eisentraut <[hidden email]>:
>
> On 24.02.21 04:18, Kohei KaiGai wrote:
> > This patch adds cube_send / cube_recv handlers on the contrib/cube data type.
> > Once this patch was applied to, the libpq client can obtain the table
> > data using binary mode.
>
> Seems reasonable.  But you need to write an extension upgrade script and
> bump the extension version.
>
Thanks for your review.

One thing not straightforward is that a new definition of cube type
needs to drop
the old definition once, then it leads cascaded deletion to the
objects that depends
on the "cube" type declared at the cube--1.2.sql.
Do you have any good ideas?

Idea-1) modify system catalog by UPDATE pg_type carefully.
  It can avoid cascaded deletion.

Idea-2) copy & paste all the declaration after CREATE TYPE in
cube--1.2.sql to the
  new script, then create these objects again.

Best regards,





--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Tom Lane-2
Kohei KaiGai <[hidden email]> writes:
> One thing not straightforward is that a new definition of cube type
> needs to drop
> the old definition once, then it leads cascaded deletion to the
> objects that depends
> on the "cube" type declared at the cube--1.2.sql.
> Do you have any good ideas?

You can add the I/O functions with ALTER TYPE nowadays.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Tom Lane-2
I wrote:
> You can add the I/O functions with ALTER TYPE nowadays.

To be concrete, see 949a9f043eb70a4986041b47513579f9a13d6a33

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Kohei KaiGai-4
Thanks, the attached patch add cube--1.5 for binary send/recv functions and
modification of cube type using the new ALTER TYPE.

Best regards,

2021年3月4日(木) 0:45 Tom Lane <[hidden email]>:
>
> I wrote:
> > You can add the I/O functions with ALTER TYPE nowadays.
>
> To be concrete, see 949a9f043eb70a4986041b47513579f9a13d6a33
>
>                         regards, tom lane



--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

pgsql-cube-binary-inout-handler.v2.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Tom Lane-2
Kohei KaiGai <[hidden email]> writes:
> Thanks, the attached patch add cube--1.5 for binary send/recv functions and
> modification of cube type using the new ALTER TYPE.

Hm, that was already superseded by events (112d411fb).
As long as we get this done for v14, we can just treat it
as an add-on to cube 1.5, so here's a quick rebase onto HEAD.

Scanning the code, I have a couple of gripes.  I'm not sure it's
a good plan to just send the "header" field raw like that ---
would breaking it into a dimension field and a point bool be
better?  In any case, the receive function has to be more careful
than this about accepting only valid header values.

Also, I don't think "offsetof(NDBOX, x[nitems])" is per project
style.  It at least used to be true that MSVC couldn't cope
with that, so we prefer

        offsetof(NDBOX, x) + nitems * sizeof(whatever)

                        regards, tom lane


diff --git a/contrib/cube/cube--1.4--1.5.sql b/contrib/cube/cube--1.4--1.5.sql
index 54492e5d18..4b5bf8d205 100644
--- a/contrib/cube/cube--1.4--1.5.sql
+++ b/contrib/cube/cube--1.4--1.5.sql
@@ -6,3 +6,16 @@
 -- Remove @ and ~
 DROP OPERATOR @ (cube, cube);
 DROP OPERATOR ~ (cube, cube);
+
+-- Add binary input/output handlers
+CREATE FUNCTION cube_recv(internal)
+RETURNS cube
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+CREATE FUNCTION cube_send(cube)
+RETURNS bytea
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+ALTER TYPE cube SET ( RECEIVE = cube_recv, SEND = cube_send );
diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c
index 6f810b26c5..d6b0dd75b0 100644
--- a/contrib/cube/cube.c
+++ b/contrib/cube/cube.c
@@ -13,6 +13,7 @@
 #include "access/gist.h"
 #include "access/stratnum.h"
 #include "cubedata.h"
+#include "libpq/pqformat.h"
 #include "utils/array.h"
 #include "utils/float.h"
 
@@ -31,6 +32,8 @@ PG_FUNCTION_INFO_V1(cube_in);
 PG_FUNCTION_INFO_V1(cube_a_f8_f8);
 PG_FUNCTION_INFO_V1(cube_a_f8);
 PG_FUNCTION_INFO_V1(cube_out);
+PG_FUNCTION_INFO_V1(cube_send);
+PG_FUNCTION_INFO_V1(cube_recv);
 PG_FUNCTION_INFO_V1(cube_f8);
 PG_FUNCTION_INFO_V1(cube_f8_f8);
 PG_FUNCTION_INFO_V1(cube_c_f8);
@@ -319,6 +322,55 @@ cube_out(PG_FUNCTION_ARGS)
  PG_RETURN_CSTRING(buf.data);
 }
 
+/*
+ * cube_send - a binary output handler for cube type
+ */
+Datum
+cube_send(PG_FUNCTION_ARGS)
+{
+ NDBOX   *cube = PG_GETARG_NDBOX_P(0);
+ StringInfoData buf;
+ int32 i, nitems = DIM(cube);
+
+ pq_begintypsend(&buf);
+ pq_sendint32(&buf, cube->header);
+ for (i=0; i < nitems; i++)
+ {
+ pq_sendfloat8(&buf, LL_COORD(cube, i));
+ }
+ if (!cube_is_point_internal(cube))
+ {
+ for (i=0; i < nitems; i++)
+ {
+ pq_sendfloat8(&buf, UR_COORD(cube, i));
+ }
+ }
+ PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
+}
+
+/*
+ * cube_recv - a binary input handler for cube type
+ */
+Datum
+cube_recv(PG_FUNCTION_ARGS)
+{
+ StringInfo buf = (StringInfo) PG_GETARG_POINTER(0);
+ int32 header;
+ int i, nitems;
+ NDBOX   *cube;
+
+ header = pq_getmsgint(buf, sizeof(int32));
+ nitems = (header & DIM_MASK);
+ if ((header & POINT_BIT) == 0)
+ nitems += nitems;
+ cube = palloc(offsetof(NDBOX, x[nitems]));
+ SET_VARSIZE(cube, offsetof(NDBOX, x[nitems]));
+ cube->header = header;
+ for (i=0; i < nitems; i++)
+ cube->x[i] = pq_getmsgfloat8(buf);
+
+ PG_RETURN_NDBOX_P(cube);
+}
 
 /*****************************************************************************
  *   GiST functions
Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Kohei KaiGai-4
2021年3月6日(土) 1:41 Tom Lane <[hidden email]>:
>
> Kohei KaiGai <[hidden email]> writes:
> > Thanks, the attached patch add cube--1.5 for binary send/recv functions and
> > modification of cube type using the new ALTER TYPE.
>
> Hm, that was already superseded by events (112d411fb).
> As long as we get this done for v14, we can just treat it
> as an add-on to cube 1.5, so here's a quick rebase onto HEAD.
>
Thanks for this revising.

> Scanning the code, I have a couple of gripes.  I'm not sure it's
> a good plan to just send the "header" field raw like that ---
> would breaking it into a dimension field and a point bool be
> better?  In any case, the receive function has to be more careful
> than this about accepting only valid header values.
>
I have a different opinion here.
Do we never reinterpret the unused header fields (bits 8-30) for another purpose
in the future version?
If application saves the raw header field as-is, at least, it can keep
the header field
without information loss.
On the other hand, if cube_send() individually sent num-of-dimension
and point flag,
an application (that is built for the current version) will drop the
bit fields currently unused,
but the new version of server may reinterpret the field for something.

Of course, it's better to have more careful validation at cuda_recv()
when it received
the header field.

> Also, I don't think "offsetof(NDBOX, x[nitems])" is per project
> style.  It at least used to be true that MSVC couldn't cope
> with that, so we prefer
>
>         offsetof(NDBOX, x) + nitems * sizeof(whatever)
>
Ok, I'll fix it on the next patch.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Tom Lane-2
Kohei KaiGai <[hidden email]> writes:

> 2021年3月6日(土) 1:41 Tom Lane <[hidden email]>:
>> Scanning the code, I have a couple of gripes.  I'm not sure it's
>> a good plan to just send the "header" field raw like that ---
>> would breaking it into a dimension field and a point bool be
>> better?  In any case, the receive function has to be more careful
>> than this about accepting only valid header values.
>>
> I have a different opinion here.
> Do we never reinterpret the unused header fields (bits 8-30) for another purpose
> in the future version?

Right, that's what to be concerned about.

The best way might be to send the header as-is, as you've done,
but for cube_recv to throw error if the reserved bits aren't
all zero.  That way we can't get into a situation where we
aren't sure what's in stored values.  If we do expand the header
in future, values should be forward compatible.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Kohei KaiGai-4
2021年3月6日(土) 11:21 Tom Lane <[hidden email]>:

>
> Kohei KaiGai <[hidden email]> writes:
> > 2021年3月6日(土) 1:41 Tom Lane <[hidden email]>:
> >> Scanning the code, I have a couple of gripes.  I'm not sure it's
> >> a good plan to just send the "header" field raw like that ---
> >> would breaking it into a dimension field and a point bool be
> >> better?  In any case, the receive function has to be more careful
> >> than this about accepting only valid header values.
> >>
> > I have a different opinion here.
> > Do we never reinterpret the unused header fields (bits 8-30) for another purpose
> > in the future version?
>
> Right, that's what to be concerned about.
>
> The best way might be to send the header as-is, as you've done,
> but for cube_recv to throw error if the reserved bits aren't
> all zero.  That way we can't get into a situation where we
> aren't sure what's in stored values.  If we do expand the header
> in future, values should be forward compatible.
>
Ok, the attached v4 sends the raw header as-is, then cube_recv
validates the header.
If num-of-dimension is larger than CUBE_MAX_DIM, it is obviously
unused bits (8-30)
are used or out of the range.

It also changes the manner of offsetof() as you suggested.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <[hidden email]>

pgsql-cube-binary-inout-handler.v4.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: contrib/cube - binary input/output handlers

Tom Lane-2
Kohei KaiGai <[hidden email]> writes:
> Ok, the attached v4 sends the raw header as-is, then cube_recv
> validates the header.
> If num-of-dimension is larger than CUBE_MAX_DIM, it is obviously
> unused bits (8-30)
> are used or out of the range.

Works for me.

I noted one additional bug: you have to condition whether to dump
the upper coordinates just on the IS_POINT flag, because that is
all that cube_recv will see.  The cube_is_point_internal() hack
can be used in some other places, but not here.

Also, as a matter of style, I didn't like that cube_send was using
the LL_COORD/UR_COORD abstraction but cube_recv wasn't.  In the
worst case (if someone tried to change that abstraction) this could
turn into an actual bug, with cube_recv storing the coordinates in
the wrong order.  Could have gone either way on which one to change
to look like the other, but I chose to simplify cube_send to look
like cube_recv.  This essentially means that we're locking the
binary representation to use the physical storage order of the
coordinates even if someone gets fancy about their meaning.

Pushed with those fixes.

                        regards, tom lane