GSoC 2017: Foreign Key Arrays

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
114 messages Options
1234 ... 6
Reply | Threaded
Open this post in threaded view
|

GSoC 2017: Foreign Key Arrays

Mark Rofail
Dear PostgreSQL hacker community,

I am working on Foreign Key Arrays as part of the Google Summer of Code 2017. 

I will be logging my progress on this thread as I progress, twice a week (Mondays and Fridays), so anyone who is willing to comment, please do.

The Problem
Foreign Key Arrays were introduced by Marco Nenciarini[1], however, the proposed patch had some performance issues. Let's assume we have two tables, table B has a foreign key array that references table A, any change in table A (INSERT, UPDATE, DELETE) would trigger a referential integrity check on table B. The current implementation uses sequential scans to accomplish this. This limits the size of tables using Foreign Key Arrays to ~100 records which is not practical in real life applications.

The Proposed Solution
Ultimately, as proposed by Tom Lane[2], we would like to replace the sequential scan with a GIN-indexed scan which would greatly enhance the performance. 

To achieve this, introducing a number of new operators is required. However, for the scope of the project, we will focus on the most basic case where the Primary Keys are of pseudo-type anyelement and the Foreign Keys are of pseudo-type anyarray, thus the main operator of concern will be @>(anyarray,anyelement).

Progress So Far
The actual coding begins on 30th of May, till then I will use my time to research, to settle the technical details of my plan.
I am yet to identify the files concerned with the GIN opclass. <-- if anyone can help with this
Next Step
I still don't have a solid grasp of how I am going to approach creating an operator, so I would like to experiment till the next report on creating a very simple operator.

I have attached the original proposal here.





--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

GSoCproposal2017.pdf (214K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Robert Haas
On Mon, May 22, 2017 at 7:51 PM, Mark Rofail <[hidden email]> wrote:
> Cloned the git repo found @ https://github.com/postgres/postgres and
> identified the main two files I will be concerned with. (I know I may need
> to edit other files but these seem to where I will spend most of my summer)
>
> src/backend/commands/tablecmds.c
> src/backend/utils/ri_triggers.c
>
> I am yet to identify the files concerned with the GIN opclass. <-- if anyone
> can help with this

There's not only one GIN opclass.  You can get a list like this:

select oid, * from pg_opclass where opcmethod = 2742;

Actually, you probably want to look for GIN opfamilies:

rhaas=# select oid, * from pg_opfamily where opfmethod = 2742;
 oid  | opfmethod |    opfname     | opfnamespace | opfowner
------+-----------+----------------+--------------+----------
 2745 |      2742 | array_ops      |           11 |       10
 3659 |      2742 | tsvector_ops   |           11 |       10
 4036 |      2742 | jsonb_ops      |           11 |       10
 4037 |      2742 | jsonb_path_ops |           11 |       10
(4 rows)

To see which SQL functions are used to implement a particular
opfamily, use the OID from the previous step in a query like this:

rhaas=# select prosrc from pg_amop, pg_operator, pg_proc where
amopfamily = 2745 and amopopr = pg_operator.oid and oprcode =
pg_proc.oid;
     prosrc
----------------
 array_eq
 arrayoverlap
 arraycontains
 arraycontained
(4 rows)

Then, you can look for those in the source tree.  You can also search
for the associated support functions, e.g.:

rhaas=# select distinct amprocnum, prosrc from pg_amproc, pg_proc
where amprocfamily = 2745 and amproc = pg_proc.oid order by 1, 2;
 amprocnum |        prosrc
-----------+-----------------------
         1 | bitcmp
         1 | bpcharcmp
         1 | btabstimecmp
         1 | btboolcmp
         1 | btcharcmp
         1 | btfloat4cmp
         1 | btfloat8cmp
         1 | btint2cmp
         1 | btint4cmp
         1 | btint8cmp
         1 | btnamecmp
         1 | btoidcmp
         1 | btoidvectorcmp
         1 | btreltimecmp
         1 | bttextcmp
         1 | bttintervalcmp
         1 | byteacmp
         1 | cash_cmp
         1 | date_cmp
         1 | interval_cmp
         1 | macaddr_cmp
         1 | network_cmp
         1 | numeric_cmp
         1 | time_cmp
         1 | timestamp_cmp
         1 | timetz_cmp
         2 | ginarrayextract
         3 | ginqueryarrayextract
         4 | ginarrayconsistent
         6 | ginarraytriconsistent
(30 rows)

You might want to read https://www.postgresql.org/docs/devel/static/xindex.html

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
rhaas=# select oid, * from pg_opfamily where opfmethod = 2742;
 oid  | opfmethod |    opfname     | opfnamespace | opfowner
------+-----------+----------------+--------------+----------
 2745 |      2742 | array_ops      |           11 |       10
 3659 |      2742 | tsvector_ops   |           11 |       10
 4036 |      2742 | jsonb_ops      |           11 |       10
 4037 |      2742 | jsonb_path_ops |           11 |       10
(4 rows)
I am particulary intrested in array_ops but I have failed in locating the code behind it. Where is it reflected in the source code

Best Regards,
Mark Rofail 

Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Alexander Korotkov-4
Hi, Mark!

On Tue, May 30, 2017 at 2:18 AM, Mark Rofail <[hidden email]> wrote:
rhaas=# select oid, * from pg_opfamily where opfmethod = 2742;
 oid  | opfmethod |    opfname     | opfnamespace | opfowner
------+-----------+----------------+--------------+----------
 2745 |      2742 | array_ops      |           11 |       10
 3659 |      2742 | tsvector_ops   |           11 |       10
 4036 |      2742 | jsonb_ops      |           11 |       10
 4037 |      2742 | jsonb_path_ops |           11 |       10
(4 rows)
I am particulary intrested in array_ops but I have failed in locating the code behind it. Where is it reflected in the source code

Let's look what particular opclass is consisting of.  Besides records in pg_opfamily, it also contains records in pg_opclass, pg_amproc and pg_amop.

=# select * from pg_opclass where opcfamily = 2745;
 opcmethod |  opcname  | opcnamespace | opcowner | opcfamily | opcintype | opcdefault | opckeytype
-----------+-----------+--------------+----------+-----------+-----------+------------+------------
      2742 | array_ops |           11 |       10 |      2745 |      2277 | t          |       2283
(1 row)

=# select * from pg_amproc where amprocfamily = 2745;
 amprocfamily | amproclefttype | amprocrighttype | amprocnum |           amproc
--------------+----------------+-----------------+-----------+----------------------------
         2745 |           2277 |            2277 |         2 | pg_catalog.ginarrayextract
         2745 |           2277 |            2277 |         3 | ginqueryarrayextract
         2745 |           2277 |            2277 |         4 | ginarrayconsistent
         2745 |           2277 |            2277 |         6 | ginarraytriconsistent
(4 rows)

=# select * from pg_amop where amopfamily = 2745;
 amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose | amopopr | amopmethod | amopsortfamily
------------+--------------+---------------+--------------+-------------+---------+------------+----------------
       2745 |         2277 |          2277 |            1 | s           |    2750 |       2742 |              0
       2745 |         2277 |          2277 |            2 | s           |    2751 |       2742 |              0
       2745 |         2277 |          2277 |            3 | s           |    2752 |       2742 |              0
       2745 |         2277 |          2277 |            4 | s           |    1070 |       2742 |              0
(4 rows)

These records of system catalog are defined in special headers the source code:
src/include/catalog/pg_amop.h
src/include/catalog/pg_amproc.h
src/include/catalog/pg_opclass.h
src/include/catalog/pg_opfamily.h
These records are written to system catalog during bootstrap process (see src/backend/catalog/README).

As you can see pg_amproc records refer some procedures.  Those procedures are actually the majority of source code behind of opclass.  Those procedures are defined in src/backend/access/gin/ginarrayproc.c.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
src/include/catalog/pg_amop.h
src/include/catalog/pg_amproc.h
src/include/catalog/pg_opclass.h
src/include/catalog/pg_opfamily.h
Thanks to Alexander's reply I have been able to jump from catalog table to table till I found the function I was looking for.

My goal is to add a new operator (@>(anyarray,anyelement)) to the (array_ops) op class.
I am going to post the steps I took to locate the procedure, the following is the trail of tables I followed. 

 

pg_opfamily


pg_opfamily defines operator families.

Link to docs

{

opfmethod; /* index access method opfamily is for */

opfname; /* name of this opfamily */

opfnamespace; /* namespace of this opfamily */

opfowner; /* opfamily owner */

}

 

gin=# select oid, * from pg_opfamily where opfmethod = 2742;

oid  | opfmethod |    opfname     | opfnamespace | opfowner

------+-----------+----------------+--------------+----------

2745 |      2742 | array_ops      |           11 |       10

3659 |      2742 | tsvector_ops   |           11 |       10

4036 |      2742 | jsonb_ops      |           11 |       10

4037 |      2742 | jsonb_path_ops |           11 |       10

(4 rows)

 


as this table defines operator families I won't need to modify them.

 

pg_opclass

pg_opclass defines index access method operator classes.

Link to docs

{
opcmethod; /* index access method opclass is for */
opcname; /* name of this opclass */
opcnamespace; /* namespace of this opclass */
opcowner; /* opclass owner */
opcfamily; /* containing operator family */
opcintype; /* type of data indexed by opclass */
opcdefault; /* T if opclass is default for opcintype
opckeytype; /* type of data in index, or InvalidOid */
}

gin=# select * from pg_opclass where opcfamily = 2745;

opcmethod |  opcname  | opcnamespace | opcowner | opcfamily | opcintype | opcdefault | opckeytype

-----------+-----------+--------------+----------+-----------+-----------+------------+------------

     2742 | array_ops |           11 |       10 |      2745 |      2277 | t          |       2283

(1 row)

 

as this table defines operator classes I won't need to modify them.
this led me to pg_amproc

 

pg_amproc

pg_amproc stores information about support procedures associated with access method operator families.

Link to docs

{

amprocfamily; /* the index opfamily this entry is for */

amproclefttype; /* procedure's left input data type */

amprocrighttype; /* procedure's right input data type */

amprocnum[1]; /* support procedure index */

amproc; /* OID of the proc */

}

gin=# select * from pg_amproc where amprocfamily = 2745;

amprocfamily | amproclefttype | amprocrighttype | amprocnum |           amproc

------------------+--------------------+---------------------+---------------+----------------------------

        2745            |           2277              |            2277             |         2                    | pg_catalog.ginarrayextract

        2745            |           2277              |            2277             |         3                    | ginqueryarrayextract

        2745            |           2277              |            2277             |         4                    | ginarrayconsistent

        2745            |           2277              |            2277             |         6                    | ginarraytriconsistent

(4 rows)

 

[1]amprocnum refers to this table

as this table defines support procedures I won't need to modify them.
this led me to pg_amop

 

pg_amop

pg_amop stores information about operators associated with access method operator families.

Link to docs

{

amopfamily; /* the index opfamily this entry is for */

amoplefttype; /* operator's left input data type */

amoprighttype; /* operator's right input data type */

amopstrategy; /* operator strategy number */

amoppurpose; /* is operator for 's'earch or 'o'rdering? */

amopopr; /* the operator's pg_operator OID */

amopmethod; /* the index access method this entry is for

amopsortfamily; /* ordering opfamily OID, or 0 if search op

}

 

=# select * from pg_amop where amopfamily = 2745;

amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose | amopopr | amopmethod | amopsortfamily

------------+--------------+---------------+--------------+-------------+---------+------------+----------------

      2745 |         2277 |          2277 |            1 | s           |    2750 |       2742 |              0

      2745 |         2277 |          2277 |            2 | s           |    2751 |       2742 |              0

      2745 |         2277 |          2277 |            3 | s           |    2752 |       2742 |              0

      2745 |         2277 |          2277 |            4 | s           |    1070 |       2742 |              0

(4 rows)

I will need to add a record of my new operator to this table by appending this line to src/include/catalog/pg_amop.h

DATA(insert (2745   2277 2277 1 s 2750 2742 0 ));

 

This will result in the following entry

=# select * from pg_amop where amopfamily = 2745;

amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose | amopopr | amopmethod | amopsortfamily

------------+--------------+---------------+--------------+-------------+---------+------------+----------------

      2745 |         2277 |          2283 |            1 | s           |    2750 |       2742 |              0


 this led me to pg_operator

pg_operator

pg_operator stores information about operators.

Link to docs

{
oprname; /* name of operator */
oprnamespace; /* OID of namespace containing this oper */
oprowner; /* operator owner */
oprkind; /* 'l', 'r', or 'b' */
oprcanmerge; /* can be used in merge join? */
oprcanhash; /* can be used in hash join? */
oprleft; /* left arg type, or 0 if 'l' oprkind */
oprright; /* right arg type, or 0 if 'r' oprkind */
oprresult; /* result datatype */
oprcom; /* OID of commutator oper, or 0 if none */
oprnegate; /* OID of negator oper, or 0 if none */
oprcode; /* OID of underlying function */
oprrest; /* OID of restriction estimator, or 0 */
oprjoin; /* OID of join estimator, or 0 */
}

 

postgres=# select * from pg_operator where oid = 2751;

oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate |    oprcode    |   oprrest    |     oprjoin      

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------------+--------------+------------------

@>      |           11 |       10 | b       | f           | f          |    2277 |     2277 |        16 |   2752 |         0 | arraycontains | arraycontsel | arraycontjoinsel

(1 row)

 

 

I will need to add a record of my new operator to this table by appending this line to src/include/catalog/pg_operator.h

However, as this is dependent on the procedure I have yet to create there are still uknown values


DATA(insert OID = <uniqueProcId> (  "@>"   PGNSP PGUID b f f 2277 2283 16 2752  0 arraycontainselem ???? ???? ));

DESCR("contains");

#define OID_ARRAY_CONTAINS_OP <uniqueProcId>


This will lead to this entry

oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate |    oprcode    |   oprrest    |     oprjoin      

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------------+--------------+------------------

@>      |           11 |       10 | b       | f           | f          |    2277 |     2283 |        16 |   2752 |         0 | arraycontainselem | ????         | ????

(1 row)


 this led me to pg_proc

 

pg_proc

pg_proc stores information about functions (or procedures)

Link to docs

{

proname; /* procedure name */

pronamespace; /* OID of namespace containing this proc */

proowner; /* procedure owner */

prolang; /* OID of pg_language entry */

procost; /* estimated execution cost */

prorows; /* estimated # of rows out (if proretset) */

provariadic; /* element type of variadic array, or 0 */

protransform; /* transforms calls to it during planning */

proisagg; /* is it an aggregate? */

proiswindow; /* is it a window function? */

prosecdef; /* security definer */

proleakproof; /* is it a leak-proof function? */

proisstrict; /* strict with respect to NULLs? */

proretset; /* returns a set? */

provolatile; /* see PROVOLATILE_ categories below */

proparallel; /* see PROPARALLEL_ categories below */

pronargs; /* number of arguments */

pronargdefaults; /* number of arguments with defaults */

prorettype; /* OID of result type */

proargtypes; /* parameter types (excludes OUT params) */

proallargtypes[1]; /* all param types (NULL if IN only) */

proargmodes[1]; /* parameter modes (NULL if IN only) */

proargnames[1]; /* parameter names (NULL if no names) */

proargdefaults; /* list of expression trees for argument

protrftypes[1]; /* types for which to apply transforms */

prosrc; /* procedure source text */

probin; /* secondary procedure info (can be NULL) */

proconfig[1]; /* procedure-local GUC settings */

proacl[1]; /* access permissions */

}

 

 

postgres=# select * from pg_proc where oid = 2748;

   proname    | pronamespace | proowner | prolang | procost | prorows | provariadic | protransform | proisagg | proiswindow | prosecdef | proleakproof | proisstrict | proretset | provolatile | proparallel | pron

args | pronargdefaults | prorettype | proargtypes | proallargtypes | proargmodes | proargnames | proargdefaults | protrftypes |    prosrc     | probin | proconfig | proacl

---------------+--------------+----------+---------+---------+---------+-------------+--------------+----------+-------------+-----------+--------------+-------------+-----------+-------------+-------------+-----

-----+-----------------+------------+-------------+----------------+-------------+-------------+----------------+-------------+---------------+--------+-----------+--------

arraycontains |           11 |       10 |      12 |       1 |       0 |           0 | -            | f        | f           | f         | f            | t           | f         | i           | s           |     

  2 |               0 |         16 | 2277 2277   |                |             |             |                |             | arraycontains |        |           |

(1 row)

I have yet to study this table thoroughly.
This finally led me to the arraycontains procedure in src/backend/utils/adt/arrayfuncs.c

Datum
arraycontains(PG_FUNCTION_ARGS)
{
AnyArrayType *array1 = PG_GETARG_ANY_ARRAY(0);
AnyArrayType *array2 = PG_GETARG_ANY_ARRAY(1);
Oid collation = PG_GET_COLLATION();
bool result;

result = array_contain_compare(array2, array1, collation, true,
  &fcinfo->flinfo->fn_extra);

/* Avoid leaking memory when handed toasted input. */
AARR_FREE_IF_COPY(array1, 0);
AARR_FREE_IF_COPY(array2, 1);

PG_RETURN_BOOL(result);
}


This corrosponds to the operator @<(anyarray, anyarray) which is the generalised form of my proposed operator @<(anyarray, anyelement).
Studying the syntax will help me produce a function that follows the postgres style rules.
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
• After finding the arraycontains function, I implemented
arraycontainselem that corresponds to the operator @<(anyarray,
anyelem)
       ◦ Please read the attached patch file to view my progress.

•  In addition to src/backend/utils/adt/arrayfuncs.c where I
implemented arraycontainselem.

       ◦ I also edited pg_amop (src/include/catalog/pg_amop.h) since
it stores information about operators associated with access method
operator families.

+DATA(insert ( 2745   2277 2283 2 s 2753 2742 0 ));
{
2745: Oid amopfamily; (denotes gin array_ops)
277: Oid amoplefttype; (denotes anyaray)
2283: Oid amoprighttype; (denotes anyelem)
5: int16 amopstrategy; /* operator strategy number */ (denotes the new
startegy that is yet to be created)
's': char amoppurpose; (denotes 's' for search)
2753: Oid amopopr; (denotes the new operator Oid)
2742: Oid amopmethod;(denotes gin)
0: Oid amopsortfamily; (0 since search operator)
}

       ◦ And pg_operator (src/include/catalog/pg_operator.h) since it
stores information about operators.
+DATA(insert OID = 2753 (  "@>"   PGNSP PGUID b f f 2277 2283 16 0  0
arraycontainselem 0 0 ));
{
 "@>": NameData oprname; /* name of operator */
Oid oprnamespace; /* OID of namespace containing this oper */
Oid oprowner; /* operator owner */
'b': char oprkind; /* 'l', 'r', or 'b' */ (denotes infix)
'f': bool oprcanmerge; /* can be used in merge join? */
'f': bool oprcanhash; /* can be used in hash join? */
277: Oid oprleft; (denotes anyaray)
2283: Oid oprright; (denotes anyelem)
16: Oid oprresult;  (denotes boolean)
0: Oid oprcom; /* OID of commutator oper, or 0 if none */ (needs to be
revisited)
0: Oid oprnegate; /* OID of negator oper, or 0 if none */ (needs to be
revisited)
arraycontainselem: regproc oprcode; /* OID of underlying function */
0: regproc oprrest; /* OID of restriction estimator, or 0 */
0: regproc oprjoin; /* OID of join estimator, or 0 */
}


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

elemOperatorV2.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

regproc and when to schema-qualify

Chapman Flack
In reply to this post by Alexander Korotkov-4
I was idly following along in GSoC 2017: Foreign Key Arrays
when I noticed this:

=# select * from pg_amproc where amprocfamily = 2745;
 amprocfamily | amproclefttype | amprocrighttype | amprocnum |
amproc
--------------+----------------+-----------------+-----------+
         2745 |           2277 |            2277 |         2 |
pg_catalog.ginarrayextract
         2745 |           2277 |            2277 |         3 |
ginqueryarrayextract
...

where only ginarrayextract is schema-qualified. It seems to be
regproc's output procedure doing it:

=# select 2743::regproc, 2774::regproc;
          regproc           |       regproc
----------------------------+----------------------
 pg_catalog.ginarrayextract | ginqueryarrayextract


The manual says regproc "will display schema-qualified names on output
if the object would not be found in the current search path without
being qualified."

Is regproc displaying the schema in this case because there are two
overloaded flavors of ginarrayextract, though both are in pg_catalog?
Could it be searching for the object by name, ignoring the argument
signature, and just detecting that it hit one with a different OID first?

-Chap


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: regproc and when to schema-qualify

Tom Lane-2
Chapman Flack <[hidden email]> writes:
> The manual says regproc "will display schema-qualified names on output
> if the object would not be found in the current search path without
> being qualified."

That's less than the full truth :-(

> Is regproc displaying the schema in this case because there are two
> overloaded flavors of ginarrayextract, though both are in pg_catalog?

Yes, see the test in regprocout:

             * Would this proc be found (uniquely!) by regprocin? If not,
             * qualify it.

Of course, in a situation like this, schema-qualification is not enough to
save the day; regprocin will still fail because the name is ambiguous.
You really need to use regprocedure not regproc if you want any guarantees
about the results.

(The fact that we have regproc at all is a bit of a historical accident,
caused by some limitations of the bootstrap mode.)

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
In reply to this post by Mark Rofail
Updates till now:
  • added a record to pg_proc (src/include/catalog/pg_proc.h)
  • modified opr_sanity regression check expected results
  • implemented a  low-level function called `array_contains_elem` as an equivalent to `array_contain_compare` but accepts anyelement instead of anyarray as the right operand. This is more efficient than constructing an array and then immediately deconstructing it.
Questions:
  • I'd like to check that anyelem and anyarray have the same element type. but anyelem is obtained from PG_FUNCTION_ARGS as a Datum. How can I make such a check?
Best Regards,
Mark Rofail


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

elemOperatorV3.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Alexander Korotkov-4
On Sun, Jun 18, 2017 at 12:41 AM, Mark Rofail <[hidden email]> wrote:
Questions:
  • I'd like to check that anyelem and anyarray have the same element type. but anyelem is obtained from PG_FUNCTION_ARGS as a Datum. How can I make such a check?

As I know, it's implicitly checked during query analyze stage.  You don't have to implement your own check inside function implementation.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
Okay, so major breakthrough.

Updates:
  • The operator @>(anyarray, anyelement) is now functional
    • The segmentation fault was due to applying PG_FREE_IF_COPY on a datum when it should only be applied on TOASTed inputs
    • The only problem now is if for example you apply the operator as follows  '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it maps to @>(anyarray, anyarray) since 'AAAAAAAAAA646' is interpreted as char[] instead of Text
  • Added some regression tests (src/test/regress/sql/arrays.sql) and their results(src/test/regress/expected/arrays.out)
  • wokred on the new GIN strategy, I don't think it would vary much from GinContainsStrategy.
What I plan to do:
  • I need to start working on the Referential Integrity code but I don't where to start
Best Regards,
Mark Rofail




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

elemOperatorV3_1.patch (22K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Alvaro Herrera-9
Mark Rofail wrote:

> Okay, so major breakthrough.
>
> *Updates:*
>
>    - The operator @>(anyarray, anyelement) is now functional
>       - The segmentation fault was due to applying PG_FREE_IF_COPY on a
>       datum when it should only be applied on TOASTed inputs
>       - The only problem now is if for example you apply the operator as
>       follows  '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it maps to @>(anyarray,
>       anyarray) since 'AAAAAAAAAA646' is interpreted as char[] instead of Text
>    - Added some regression tests (src/test/regress/sql/arrays.sql) and
>    their results(src/test/regress/expected/arrays.out)
>    - wokred on the new GIN strategy, I don't think it would vary much from
>    GinContainsStrategy.

OK, that's great.

> *What I plan to do:*
>
>    - I need to start working on the Referential Integrity code but I don't
>    where to start

You need to study the old patch posted by Marco Nenciarini.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
What I did:
  • read into the old patch but couldn't apply it since it's quite old. It needs to be rebased and that's what I am working on.  It's a lot of work. 
    • incomplete patch can be found attached here
Bugs
  • problem with the @>(anyarray, anyelement) opertator: if for example, you apply the operator as follows  '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it maps to @>(anyarray, anyarray) since 'AAAAAAAAAA646' is interpreted as char[] instead of Text
Suggestion:
  • since I needed to check if the Datum was null and its type, I had to do it in the arraycontainselem and pass it as a parameter to the underlying function array_contains_elem. I'm proposing to introduce a new struct like ArrayType, but ElementType along all with brand new MACROs to make dealing with anyelement easier in any polymorphic context.

Best Regards,
Mark Rofail

On Tue, Jun 20, 2017 at 12:19 AM, Alvaro Herrera <[hidden email]> wrote:
Mark Rofail wrote:
> Okay, so major breakthrough.
>
> *Updates:*
>
>    - The operator @>(anyarray, anyelement) is now functional
>       - The segmentation fault was due to applying PG_FREE_IF_COPY on a
>       datum when it should only be applied on TOASTed inputs
>       - The only problem now is if for example you apply the operator as
>       follows  '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it maps to @>(anyarray,
>       anyarray) since 'AAAAAAAAAA646' is interpreted as char[] instead of Text
>    - Added some regression tests (src/test/regress/sql/arrays.sql) and
>    their results(src/test/regress/expected/arrays.out)
>    - wokred on the new GIN strategy, I don't think it would vary much from
>    GinContainsStrategy.

OK, that's great.

> *What I plan to do:*
>
>    - I need to start working on the Referential Integrity code but I don't
>    where to start

You need to study the old patch posted by Marco Nenciarini.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

incomplete-Array-ELEMENT-foreign-key-v2-REBASED-ddb5fdc.patch (35K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Alexander Korotkov-4
On Mon, Jun 26, 2017 at 2:26 AM, Mark Rofail <[hidden email]> wrote:
What I did:
  • read into the old patch but couldn't apply it since it's quite old. It needs to be rebased and that's what I am working on.  It's a lot of work. 
    • incomplete patch can be found attached here
Have you met any particular problem here?  Or is it just a lot of mechanical work?

Bugs
  • problem with the @>(anyarray, anyelement) opertator: if for example, you apply the operator as follows  '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it maps to @>(anyarray, anyarray) since 'AAAAAAAAAA646' is interpreted as char[] instead of Text
I don't think it is bug.  When types are not specified explicitly, then optimizer do its best on guessing them.  Sometimes results are counterintuitive to user.  But that is not bug, it's probably a room for improvement.  And I don't think this improvement should be subject of this GSoC.  Anyway, array FK code should use explicit type cast, and then you wouldn't meet this problem.

On the other hand, you could just choose another operator name for arraycontainselem.  Then such problem probably wouldn't occur.

Suggestion:
  • since I needed to check if the Datum was null and its type, I had to do it in the arraycontainselem and pass it as a parameter to the underlying function array_contains_elem. I'm proposing to introduce a new struct like ArrayType, but ElementType along all with brand new MACROs to make dealing with anyelement easier in any polymorphic context.
You don't need to do explicit check for nulls, because arraycontainselem is marked as strict function.  Executor never pass null inputs to your function if its declared as strict.  See evaluate_function().
Also, during query planning it's checked that all polymorphic are consistent between each other.  See https://www.postgresql.org/docs/devel/static/extend-type-system.html#extend-types-polymorphic and check_generic_type_consistency() for details.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
On Mon, Jun 26, 2017 at 6:44 PM, Alexander Korotkov <[hidden email]> wrote:
Have you met any particular problem here?  Or is it just a lot of mechanical work?

Just A LOT of mechanictal work, thankfully. The patch is now rebased and all regress tests have passed (even the element_foreign_key). Please find the patch below ! 

I don't think it is bug. 
That's good news ! 

What I plan to do next 
  • study ri_triggers.c (src/backend/utils/adt/ri_triggers.c) since this is where the new RI code will reside
Best Regards,
Mark Rofail



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Array-ELEMENT-foreign-key-v2-REBASED-f0256c7.patch (150K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Alvaro Herrera-9
Mark Rofail wrote:

> On Mon, Jun 26, 2017 at 6:44 PM, Alexander Korotkov <[hidden email]>
> wrote:
>
> > Have you met any particular problem here?  Or is it just a lot of
> > mechanical work?
> >
>
> Just A LOT of mechanictal work, thankfully. The patch is now rebased and
> all regress tests have passed (even the element_foreign_key). Please find
> the patch below !

Great!

> *What I plan to do next *
>
>    - study ri_triggers.c (src/backend/utils/adt/ri_triggers.c) since this
>    is where the new RI code will reside

Any news?


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
I was unaware that there was a ver3 patch: https://www.postgresql.org/message-id/28617.1351095467%40sss.pgh.pa.us

I rebased this also (rebased version attached here).

There were considerable changes in syntax between v2 and v3, and different approaches in the implementations, so I have to restudy ri_triggers.c but at least the old patch gave me a good idea of what's going on.

As for the limitations of the patch:
  1. Only one "ELEMENT" column allowed in a multi-column key
    •  - e.g. FOREIGN KEY (c1, ELEMENT c2, ELEMENT c3) REFERENCES t1 (u1, u2, u3) will throw an error
  2. Supported actions:
    •  - NO ACTION
    •  - RESTRICT
  3. The use of count(distinct y) in the SQL statements if the referencing column is an array. Since its equality operator is different from the PK unique index equality operator this leads to a broken statement
    • regression=# create table ff (f1 float8 primary key);
      CREATE TABLE
      regression=# create table cc (f1 numeric references ff);
      CREATE TABLE
      regression=# create table cc2 (f1 numeric[], foreign key(each element of f1) references ff);
      ERROR:  foreign key constraint "cc2_f1_fkey" cannot be implemented
      DETAIL:  Key column "f1" has element type numeric which does not have a default btree operator class that's compatible with class "float8_ops".
  4. undesirable dependency on default opclass semantics in the patch, which is that it supposes it can use array_eq() to detect whether or not the referencing column has changed.  But I think that can be fixed without undue pain by providing a refactored version of array_eq() that can be told which element-comparison function to use
  5. fatal performance issues.  If you issue any UPDATE or DELETE against the PK table, you get a query like this for checking to see if the RI constraint would be violated:
    SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x;
  6. cross-type FKs are unsupported
These are the limitations I gathered from the previous mailing list: https://www.postgresql.org/message-id/flat/1343842863.5162.4.camel%40greygoo.devise-it.lan#1343842863.5162.4.camel@...

I am pretty sure other limitations will arise. 

I am confident that between the time the patch was implemented(2012) and now postgres has grown considerably, the array functions are now more robust and will help in resolving many issues. 

I would like to point out that Limitation #5 is the first limitation we should eliminate as it deems the feature unbeneficial. 

I would like to thank Marco Nenciarini, Gabriele, Gianni and Tom Lane, for their hard work in the previous patches and anyone else I forgot. 

As for limitations for the anyarray @> anyelem operator's limitations:
  1. since anyarray @< anyarray and  anyarray @> anyelem  have the same symbol when  a statemnt like this is executed '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it's mapped to  anyarray @< anyarray instead of anyarray @> anyelem
    • but as Alexander pointed out
On Mon, Jun 26, 2017 at 6:44 PM, Alexander Korotkov <[hidden email]> wrote:
When types are not specified explicitly, then optimizer do its best on guessing them.  Sometimes results are counterintuitive to user.  But that is not bug, it's probably a room for improvement.  And I don't think this improvement should be subject of this GSoC.  Anyway, array FK code should use explicit type cast, and then you wouldn't meet this problem.

What I plan to do next: 
  • located the SQL statements triggered at any insert or update and will now "convert" them to use GIN.  However,  NO ACTION and RESTRICT are the only actions supported right now
so that's how I will spend the next week.

Best Regards,
Mark Rofail


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Array-ELEMENT-foreign-key-v3-REBASED-42794d6.patch (153K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
To make the queries fired by the RI triggers GIN indexed. We need to ‒ as Tom Lane has previously suggested[1] ‒ to replace the query 

SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x;

with 

SELECT 1 FROM ONLY fktable x WHERE ARRAY[$1] <@ fkcol FOR SHARE OF x;

but since we have @<(anyarray, anyelement) it can be improved to 

SELECT 1 FROM ONLY fktable x WHERE $1 @> fkcol FOR SHARE OF x;

and the piece of code responsible for all of this is ri_GenerateQual in ri_triggers.c.

How to accomplish that is the next step. I don't know if we should hardcode the "@>" symbol or if we just index the fk table then ri_GenerateQual would be able to find the operator on it's own.

What I plan to do:
  • study how to index the fk table upon its creation. I suspect this can be done in tablecmds.c
Questions:
  • how can you programmatically in C index a table?

Best Regards,
Mark Rofail



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

GIN-fk-RI-code-v1.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Mark Rofail
 What I am working on
  • since we want to create an index on the referencing column, I am working on firing a 'CREATE INDEX' query programatically right after the 'CREATE TABLE' query
    • The problem I ran into is how to specify my Strategy (GinContainsElemStrategy) within the CREATE INDEX query. For example: CREATE INDEX ON fktable USING gin (fkcolumn array_ops)
      Where does the strategy number fit?
    • The patch is attached here, is the approach I took to creating an index programmatically, correct? 
Best Regard,
Mark Rofail



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

GIN-fk-RI-code-v1.1.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: GSoC 2017: Foreign Key Arrays

Alexander Korotkov-4
On Sun, Jul 9, 2017 at 2:35 AM, Mark Rofail <[hidden email]> wrote:
 What I am working on
  • since we want to create an index on the referencing column, I am working on firing a 'CREATE INDEX' query programatically right after the 'CREATE TABLE' query
    • The problem I ran into is how to specify my Strategy (GinContainsElemStrategy) within the CREATE INDEX query. For example: CREATE INDEX ON fktable USING gin (fkcolumn array_ops)
      Where does the strategy number fit?
    • The patch is attached here, is the approach I took to creating an index programmatically, correct? 

Could you, please, specify idea of what you're implementing in more detail?  AFACS, you're going to automatically create GIN indexes on FK array columns.  However, if we don't do this for regular columns, why should we do for array columns?  For me that sounds like a separate feature which should be implemented for both regular and array FK columns.

Regarding your questions.  If you need to create index supporting given operator, you shouldn't take care about strategy number.  Strategy number makes sense only in opclass internals.  You just need to specify opclass which support your operator.  In principle, you can find all of them in pg_amop table.  Alternatively you can just stick to GIN array_ops.

In general the approach you create index looks OK.  It's OK to manually create DDL node and execute it.  As you can see, this is done in many other places of backend code.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

1234 ... 6
Previous Thread Next Thread