[PATCH] Opclass parameters

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Opclass parameters

Nikita Glukhov
Hi hackers.

I would like to present patch set implementing opclass parameters.

This feature was recently presented at pgconf.ru:
http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf

A analogous work was already done by Nikolay Shaplov two years ago:
https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64
But this patches are not based on it, although they are very similar.


Opclass parameters can give user ability to:
  * Define the values of the constants that are hardcoded now in the opclasses
    depending on the indexed data.
  * Specify what to index for non-atomic data types (arrays, json[b], tsvector).
    Partial index can only filter whole rows.
  * Specify what indexing algorithm to use depending on the indexed data.


Description of patches:

1. Infrastructure for opclass parameters.

SQL grammar is changed only for CREATE INDEX statement: parenthesized parameters
in reloptions format are added after column's opclass name.  Default opclass can
be specified with DEFAULT keyword:

CREATE INDEX idx ON tab USING am (
    {expr {opclass | DEFAULT} ({name=value} [,...])} [,...]
);

Example for contrib/intarray:

CREATE INDEX ON arrays USING gist (
   arr gist__intbig_ops (siglen = 32),
   arr DEFAULT (numranges = 100)
);

\d arrays
                 Table "public.arrays"
  Column |   Type    | Collation | Nullable | Default
--------+-----------+-----------+----------+---------
  arr    | integer[] |           |          |
Indexes:
     "arrays_arr_arr1_idx" gist (arr gist__intbig_ops (siglen='32'), arr gist__int_ops (numranges='100'))


I decided to store parameters in text[] column pg_index.indoptions near to
existing columns like indkey, indcollation, indclass, indoption. I-th element of
indoptions[] is a text array of parameters of i-th index column serialized into
a string.  Each parameter is stored as 'name=value' text string like ordinal
reloptions.  There is another way to store opclass parameters: store them in
the existing column pg_attribute.attoptions (as it was done by Nikolay Shaplov)
and there will be no need to serialize reloptions to a text array element.

Example query showing how parameters are stored:

SELECT ARRAY(
     SELECT (pg_identify_object('pg_opclass'::regclass, opcid, 0)).name
     FROM unnest(indclass::int[]) opcid
   ) indclass, indoptions
FROM pg_index
WHERE indoptions IS NOT NULL;

              indclass             |             indoptions
----------------------------------+------------------------------------
  {gist__intbig_ops,gist__int_ops} | {"{siglen=32}","{numranges=100}"}
  {jsonb_path_ops}                 | {"{projection=$.tags[*].term}"}
(2 rows)


Each access method supporting opclass parameters specifies amopclassoptions
routine for transformation of text[] parameters datum into a binary bytea
structure which will be cached in RelationData and IndexOptInfo structures:

typedef bytea *(*amopclassoptions_function) (
    Relation index, AttrNumber colnum, Datum indoptions, bool validate
);


If access method wants simply to delegate parameters processing to one of
column opclass's support functions, then it can use
index_opclass_options_generic() subroutine in its amopclassoptions
implementation:

bytea *index_opclass_options_generic(
    Relation relation, AttrNumber attnum, uint16 procnum,
    Datum indoptions, bool validate
);

This support functions must have the following signature:
internal (options internal, validate bool).
Opclass parameters are passed as a text[] reloptions datum, returned pointer to
a bytea structure with parsed parameter values.

Opclass can use new functions parseLocalRelOptions(),
parseAndFillLocalRelOptions() for reloptions parsing.  This functions differ
from the standard parseRelOptions() in that a local array of reloptions
descriptions is passed here, not a global relopt_kind.  But it seems that
reloptions processing still needs deeper refactoring like the one already done
by Nikolay Shaplov (https://www.postgresql.org/message-id/flat/2146419.veIEZdk4E4%40x200m#2146419.veIEZdk4E4@x200m).


2. Opclass parameters support in GiST indices.

Parametrized GiST opclass specifies optional 10th (GIST_OPCLASSOPT_PROC)
support function with the following signature:
internal (options internal, validate bool)

Returned parsed bytea pointer with parameters will be passed to all support
functions in the last argument.

3. Opclass parameters support in GIN indices.

Everything is the same as for GiST, except for the optional support
function number which is 7 (GIN_OPCLASSOPTIONS_PROC) here.

4. Opclass parameters for GiST tsvector_ops
5. Opclass parameters for contrib/intarray
6. Opclass parameters for contrib/ltree
7. Opclass parameters for contrib/pg_trgm
8. Opclass parameters for contrib/hstore

This 5 patches for GiST opclasses are very similar: added optional 'siglen'
parameter for specifying signature length.  Default signature length is left
equal to the hardcoded value that was here before. Also added 'numranges'
parameter for gist__int_ops.



We also have two more complex unfinished patches for GIN opclasses which
should be posted in separate threads:

  * tsvector_ops: added parameter 'weights' for specification of indexed
    lexeme's weight groups.  This parameter can reduce index size and its
    build/update time and can also eliminate recheck.  By default, all weights
    are indexed within the same group.
   
  * jsonb_ops: added jsonpath parameter 'projection' for specification of
    indexed paths in jsonb (this patch depends on SQL/JSON jsonpath patch).
    Analogically to tsvector_ops, this parameter can reduce index size and its
    build/update time, but can not eliminate recheck.


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-opclass-parameters-v01.patch (42K) Download Attachment
0002-opclass-parameters-GiST-v01.patch (13K) Download Attachment
0003-opclass-parameters-GIN-v01.patch (14K) Download Attachment
0004-opclass-parameters-GiST-tsvector_ops-v01.patch (32K) Download Attachment
0005-opclass-parameters-contrib_intarray-v01.patch (56K) Download Attachment
0006-opclass-parameters-contrib_ltree-v01.patch (41K) Download Attachment
0007-opclass-parameters-contrib_pg_trgm-v01.patch (23K) Download Attachment
0008-opclass-parameters-contrib_hstore-v01.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Nikolay Shaplov
В письме от 28 февраля 2018 00:46:36 пользователь Nikita Glukhov написал:

> I would like to present patch set implementing opclass parameters.
>
> This feature was recently presented at pgconf.ru:
> http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf
>
> A analogous work was already done by Nikolay Shaplov two years ago:
> https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64
> But this patches are not based on it, although they are very similar.
Hi!

You know, I am still working on this issue.

When I started to work on it, I found out that option code is not flexible at
all, and you can' reuse it for options that are not relOptions.

I gave your patch just a short glance for now, but as far as I can you start
deviding options into global and local one. I am afraid it will create grater
mess than it is now.

What I am doing right now, I am creating a new reloption internal API, that
will allow to create any option in any place using the very same code.

I think it should be done first, and then use it for opclass parameters and
any kind of options you like.

The big patch is here https://commitfest.postgresql.org/14/992/ (I am afraid
it will not apply to current master as it is, It is quite old. But you can get
the idea)

The smaller parts of the patch that in current commitfest are

https://commitfest.postgresql.org/17/1536/
https://commitfest.postgresql.org/17/1489/

You can help reviewing them. Then the whole thing will go faster. The second
patch is quite trivial. The first will also need attention of someone who is
really good in understanding postgres internals.

-----------

Concerning the patch that you've provided. I've just have a short look. But I
already have some question.

1. I've seen you've added a new attribute into pg_index. Why??!!
As far as I can get, if have index built on several columns (A1, A2, A3) you
can set, own opclass for each column. And set individual options for each
opclass if we are speaking about options. So I would expect to have these
options not in pg_index, but in pg_attribute. And we already have one there:
attoptions.I just do not get how you have come to per-index options. May be I
should read code more attentively...


2. Your patch does not provide any example of your new tool usage. In my
prototype patch I've shown the implementation of opclass options for intarray.
May be you should do the same. (Use my example it will be more easy then do it
from scratch). It will give more understanding of how this improvement can be
used.

3.

--- a/src/test/regress/expected/btree_index.out
+++ b/src/test/regress/expected/btree_index.out
@@ -150,3 +150,8 @@ vacuum btree_tall_tbl;
 -- need to insert some rows to cause the fast root page to split.
 insert into btree_tall_tbl (id, t)
   select g, repeat('x', 100) from generate_series(1, 500) g;
+-- Test unsupported btree opclass parameters
+create index on btree_tall_tbl (id int4_ops(foo=1));
+ERROR:  access method "btree" does not support opclass options
+create index on btree_tall_tbl (id default(foo=1));
+ERROR:  access method "btree" does not support opclass options

Are you sure we really need these in postgres???

---------------------------------

So my proposal is the following:
let's commit

https://commitfest.postgresql.org/17/1536/
https://commitfest.postgresql.org/17/1489/

I will provide a final patch for option engine refactoring in commit fest, and
you will write you implementation of opclass options on top of it. We can to
it simultaneously. Or I will write opclass options myself... I do not care who
will do oplcass options as long it is done using refactored options engine. If
you do it, I can review it.

>
>
> Opclass parameters can give user ability to:
>   * Define the values of the constants that are hardcoded now in the
> opclasses depending on the indexed data.
>   * Specify what to index for non-atomic data types (arrays, json[b],
> tsvector). Partial index can only filter whole rows.
>   * Specify what indexing algorithm to use depending on the indexed data.
>
>
> Description of patches:
>
> 1. Infrastructure for opclass parameters.
>
> SQL grammar is changed only for CREATE INDEX statement: parenthesized
> parameters in reloptions format are added after column's opclass name.
> Default opclass can be specified with DEFAULT keyword:
>
> CREATE INDEX idx ON tab USING am (
>     {expr {opclass | DEFAULT} ({name=value} [,...])} [,...]
> );
>
> Example for contrib/intarray:
>
> CREATE INDEX ON arrays USING gist (
>    arr gist__intbig_ops (siglen = 32),
>    arr DEFAULT (numranges = 100)
> );
>
> \d arrays
>                  Table "public.arrays"
>   Column |   Type    | Collation | Nullable | Default
> --------+-----------+-----------+----------+---------
>   arr    | integer[] |           |          |
> Indexes:
>      "arrays_arr_arr1_idx" gist (arr gist__intbig_ops (siglen='32'), arr
> gist__int_ops (numranges='100'))
>
>
> I decided to store parameters in text[] column pg_index.indoptions near to
> existing columns like indkey, indcollation, indclass, indoption. I-th
> element of indoptions[] is a text array of parameters of i-th index column
> serialized into a string.  Each parameter is stored as 'name=value' text
> string like ordinal reloptions.  There is another way to store opclass
> parameters: store them in the existing column pg_attribute.attoptions (as
> it was done by Nikolay Shaplov) and there will be no need to serialize
> reloptions to a text array element.
>
> Example query showing how parameters are stored:
>
> SELECT ARRAY(
>      SELECT (pg_identify_object('pg_opclass'::regclass, opcid, 0)).name
>      FROM unnest(indclass::int[]) opcid
>    ) indclass, indoptions
> FROM pg_index
> WHERE indoptions IS NOT NULL;
>
>               indclass             |             indoptions
> ----------------------------------+------------------------------------
>   {gist__intbig_ops,gist__int_ops} | {"{siglen=32}","{numranges=100}"}
>   {jsonb_path_ops}                 | {"{projection=$.tags[*].term}"}
> (2 rows)
>
>
> Each access method supporting opclass parameters specifies amopclassoptions
> routine for transformation of text[] parameters datum into a binary bytea
> structure which will be cached in RelationData and IndexOptInfo structures:
>
> typedef bytea *(*amopclassoptions_function) (
>     Relation index, AttrNumber colnum, Datum indoptions, bool validate
> );
>
>
> If access method wants simply to delegate parameters processing to one of
> column opclass's support functions, then it can use
> index_opclass_options_generic() subroutine in its amopclassoptions
> implementation:
>
> bytea *index_opclass_options_generic(
>     Relation relation, AttrNumber attnum, uint16 procnum,
>     Datum indoptions, bool validate
> );
>
> This support functions must have the following signature:
> internal (options internal, validate bool).
> Opclass parameters are passed as a text[] reloptions datum, returned pointer
> to a bytea structure with parsed parameter values.
>
> Opclass can use new functions parseLocalRelOptions(),
> parseAndFillLocalRelOptions() for reloptions parsing.  This functions differ
> from the standard parseRelOptions() in that a local array of reloptions
> descriptions is passed here, not a global relopt_kind.  But it seems that
> reloptions processing still needs deeper refactoring like the one already
> done by Nikolay Shaplov
> (https://www.postgresql.org/message-id/flat/2146419.veIEZdk4E4%40x200m#2146
> 419.veIEZdk4E4@x200m).
>
>
> 2. Opclass parameters support in GiST indices.
>
> Parametrized GiST opclass specifies optional 10th (GIST_OPCLASSOPT_PROC)
> support function with the following signature:
> internal (options internal, validate bool)
>
> Returned parsed bytea pointer with parameters will be passed to all support
> functions in the last argument.
>
> 3. Opclass parameters support in GIN indices.
>
> Everything is the same as for GiST, except for the optional support
> function number which is 7 (GIN_OPCLASSOPTIONS_PROC) here.
>
> 4. Opclass parameters for GiST tsvector_ops
> 5. Opclass parameters for contrib/intarray
> 6. Opclass parameters for contrib/ltree
> 7. Opclass parameters for contrib/pg_trgm
> 8. Opclass parameters for contrib/hstore
>
> This 5 patches for GiST opclasses are very similar: added optional 'siglen'
> parameter for specifying signature length.  Default signature length is left
> equal to the hardcoded value that was here before. Also added 'numranges'
> parameter for gist__int_ops.
>
>
>
> We also have two more complex unfinished patches for GIN opclasses which
> should be posted in separate threads:
>
>   * tsvector_ops: added parameter 'weights' for specification of indexed
>     lexeme's weight groups.  This parameter can reduce index size and its
>     build/update time and can also eliminate recheck.  By default, all
> weights are indexed within the same group.
>
>   * jsonb_ops: added jsonpath parameter 'projection' for specification of
>     indexed paths in jsonb (this patch depends on SQL/JSON jsonpath patch).
>     Analogically to tsvector_ops, this parameter can reduce index size and
> its build/update time, but can not eliminate recheck.
--
Do code for fun.

signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Re: [PATCH] Opclass parameters

David Steele
Hi Nikita,

On 2/28/18 9:46 AM, Nikolay Shaplov wrote:

> В письме от 28 февраля 2018 00:46:36 пользователь Nikita Glukhov написал:
>
>> I would like to present patch set implementing opclass parameters.
>>
>> This feature was recently presented at pgconf.ru:
>> http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf
>>
>> A analogous work was already done by Nikolay Shaplov two years ago:
>> https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64
>> But this patches are not based on it, although they are very similar.
>
> You know, I am still working on this issue.

This patch was submitted to the 2018-03 CF at the last moment with no
prior discussion or review as far as I can tell. It appears to be
non-trivial and therefore not a good fit for the last CF for PG11.

In addition, based on Nikolay's response, I think the patch should be
marked Returned with Feedback until it is reconciled with the existing
patches.

Any objects to marking this Returned with Feedback?  Or, I can move it
to the next CF as is.

Regards,
--
-David
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Oleg Bartunov-2
In reply to this post by Nikolay Shaplov
On Wed, Feb 28, 2018 at 5:46 PM, Nikolay Shaplov <[hidden email]> wrote:

> Concerning the patch that you've provided. I've just have a short look. But I
> already have some question.
>
> 1. I've seen you've added a new attribute into pg_index. Why??!!
> As far as I can get, if have index built on several columns (A1, A2, A3) you
> can set, own opclass for each column. And set individual options for each
> opclass if we are speaking about options. So I would expect to have these
> options not in pg_index, but in pg_attribute. And we already have one there:
> attoptions.I just do not get how you have come to per-index options. May be I
> should read code more attentively...

this is what we want to discuss.

>
>
> 2. Your patch does not provide any example of your new tool usage. In my
> prototype patch I've shown the implementation of opclass options for intarray.
> May be you should do the same. (Use my example it will be more easy then do it
> from scratch). It will give more understanding of how this improvement can be
> used.

Hey, look on patches, there are many examples  !

>
> 3.
>
> --- a/src/test/regress/expected/btree_index.out
> +++ b/src/test/regress/expected/btree_index.out
> @@ -150,3 +150,8 @@ vacuum btree_tall_tbl;
>  -- need to insert some rows to cause the fast root page to split.
>  insert into btree_tall_tbl (id, t)
>    select g, repeat('x', 100) from generate_series(1, 500) g;
> +-- Test unsupported btree opclass parameters
> +create index on btree_tall_tbl (id int4_ops(foo=1));
> +ERROR:  access method "btree" does not support opclass options
> +create index on btree_tall_tbl (id default(foo=1));
> +ERROR:  access method "btree" does not support opclass options
>
> Are you sure we really need these in postgres???

Hey, this is a just a test to check unsupported opclass options !

Reply | Threaded
Open this post in threaded view
|

Re: Re: [PATCH] Opclass parameters

Oleg Bartunov-2
In reply to this post by David Steele
On Thu, Mar 1, 2018 at 7:02 PM, David Steele <[hidden email]> wrote:

> Hi Nikita,
>
> On 2/28/18 9:46 AM, Nikolay Shaplov wrote:
>> В письме от 28 февраля 2018 00:46:36 пользователь Nikita Glukhov написал:
>>
>>> I would like to present patch set implementing opclass parameters.
>>>
>>> This feature was recently presented at pgconf.ru:
>>> http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf
>>>
>>> A analogous work was already done by Nikolay Shaplov two years ago:
>>> https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64
>>> But this patches are not based on it, although they are very similar.
>>
>> You know, I am still working on this issue.
>
> This patch was submitted to the 2018-03 CF at the last moment with no
> prior discussion or review as far as I can tell. It appears to be
> non-trivial and therefore not a good fit for the last CF for PG11.

the idea is simple, the main problem is where to store parameters.
We hoped that we get a bright idea from developers.

>
> In addition, based on Nikolay's response, I think the patch should be
> marked Returned with Feedback until it is reconciled with the existing
> patches.


We proposed something that works and could be useful for people,
especially for people, who use complex json documents. It would
require minimal changes if Nikolay's patch, which is quite invasive,
will be committed in future.

What we need to discuss is the user-visible features - the syntax changes.


>
> Any objects to marking this Returned with Feedback?  Or, I can move it
> to the next CF as is.

I think that Returned with Feedback would be good. We will continue
discussion in -hackers.

>
> Regards,
> --
> -David
> [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

David Steele
On 3/1/18 3:50 PM, Oleg Bartunov wrote:
> On Thu, Mar 1, 2018 at 7:02 PM, David Steele <[hidden email]> wrote:
>>
>> Any objections to marking this Returned with Feedback?  Or, I can move it
>> to the next CF as is.
>
> I think that Returned with Feedback would be good. We will continue
> discussion in -hackers.

Marked as Returned with Feedback.  Hopefully we'll see this patch in the
2018-09 CF.

--
-David
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Nikolay Shaplov
In reply to this post by Oleg Bartunov-2
В письме от 1 марта 2018 23:02:20 пользователь Oleg Bartunov написал:

> > 2. Your patch does not provide any example of your new tool usage. In my
> > prototype patch I've shown the implementation of opclass options for
> > intarray. May be you should do the same. (Use my example it will be more
> > easy then do it from scratch). It will give more understanding of how
> > this improvement can be used.
>
> Hey, look on patches, there are many examples  !
Oups...Sorry. I've looked at the patch from commitfest
https://commitfest.postgresql.org/17/1559/ and it have shown only the first
file. And When I read the letter I did not pay attention to attachments at
all. So I was sure there is only one there.

Yes. Now I see seven examples. But I think seven it is too many.
For each case a reviewer should make consideration if this parameter worth
moving to opclass options, or fixed definition in the C code is quite ok.
Doing it for whole bunch, may make it messy. I think, it would be good to
commit an implementation of opclass options, with a good example of usage. And
then commit patches for all cases where these options can be used.

But since it is now "Rejected with feedback", let's wait till autumn.

--
Do code for fun.

signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Nikita Glukhov
On 02.03.2018 19:12, Nikolay Shaplov wrote:

> В письме от 1 марта 2018 23:02:20 пользователь Oleg Bartunov написал:
>>> 2. Your patch does not provide any example of your new tool usage. In my
>>> prototype patch I've shown the implementation of opclass options for
>>> intarray. May be you should do the same. (Use my example it will be more
>>> easy then do it from scratch). It will give more understanding of how
>>> this improvement can be used.
>> Hey, look on patches, there are many examples  !
> Oups...Sorry. I've looked at the patch from commitfest
> https://commitfest.postgresql.org/17/1559/ and it have shown only the first
> file. And When I read the letter I did not pay attention to attachments at
> all. So I was sure there is only one there.
>
> Yes. Now I see seven examples. But I think seven it is too many.
> For each case a reviewer should make consideration if this parameter worth
> moving to opclass options, or fixed definition in the C code is quite ok.
> Doing it for whole bunch, may make it messy. I think, it would be good to
> commit an implementation of opclass options, with a good example of usage. And
> then commit patches for all cases where these options can be used.

There are 5 examples for GiST opclasses, not 7, and they are almost
identical -- in all of them added 'siglen' parameter for signature length
specification.

> But since it is now "Rejected with feedback", let's wait till autumn.

We don't want to wait that long.  But now we only need to сome to an agreement
about CREATE INDEX syntax and where to store the opclass parameters.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Nikita Glukhov

On 03.03.2018 1:32, Nikita Glukhov wrote:

> On 02.03.2018 19:12, Nikolay Shaplov wrote:
>> But since it is now "Rejected with feedback", let's wait till autumn.
>
> We don't want to wait that long.  But now we only need to сome to an
> agreement
> about CREATE INDEX syntax and where to store the opclass parameters.

Attached 2nd version of the patches. Nothing has changed since March,
this is just a rebased version.

CREATE INDEX syntax and parameters storage method still need discussion.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Add-opclass-parameters-v02.patch (43K) Download Attachment
0002-Add-opclass-parameters-to-GiST-v02.patch (14K) Download Attachment
0003-Add-opclass-parameters-to-GIN-v02.patch (15K) Download Attachment
0004-Add-opclass-parameters-to-GiST-tsvector_ops-v02.patch (32K) Download Attachment
0005-Add-opclass-parameters-to-contrib-intarray-v02.patch (57K) Download Attachment
0006-Add-opclass-parameters-to-contrib-ltree-v02.patch (42K) Download Attachment
0007-Add-opclass-parameters-to-contrib-pg_trgm-v02.patch (23K) Download Attachment
0008-Add-opclass-parameters-to-contrib-hstore-v02.patch (18K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Nikolay Shaplov
В письме от 15 ноября 2018 18:26:43 пользователь Nikita Glukhov написал:

> >> But since it is now "Rejected with feedback", let's wait till autumn.
> >
> > We don't want to wait that long.  But now we only need to сome to an
> > agreement
> > about CREATE INDEX syntax and where to store the opclass parameters.
>
> Attached 2nd version of the patches. Nothing has changed since March,
> this is just a rebased version.
>
> CREATE INDEX syntax and parameters storage method still need discussion.
Hi! I'd like to review your patch if you do  not mind. Since I am familiar
with the subject. I'll add it to the TOP 3 of my IT TODO list :-)

But even without carefully reading the code I have several important
questions:

1. I am sure that this patch should contain code for opclass praram generic
implementation, and _one_ example of how it is used for certain opclass.
From my point of view, we should not add all hardcoded constant as opclass
params just because we can. It should be done only when it is really needed.
And each case needs special consideration. This is not only about code
complexity, this worries me less, but also about documentation complexity.
Once option is added, it should be documented. This will make documentation
more hard to understand (and completely unexperienced users reads
documentation too). If this option is needed, this is price we all pay. But if
nobody really use it, I see no reason to make things more complex for
everyone.

2. The second question, is why you create new column in one of the table of
pg_catalog, when we have attopitions in attribute description. And each
indexed column in index is technically a relation attribute. Options of index
column should be stored there, from my point of view (yes I know it is a
hassle to pass it there, but I did in in my preview, it can be done)
2.1. I did not read the code, I'll do it some time late, but the main question
I have is, how you will manage case, when you set different values for same
options of different columns. like:

CREATE INDEX ON arrays USING gist (
   arr1 gist__intbig_ops (siglen = 32),
   arr2 gist__intbig_ops (siglen = 64)
);

It is easitly solved when you store them in attoptions. But my question how do
you manage it... (I'll get the answer from the code soon, I hope)


   

--
Do code for fun.

signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Nikolay Shaplov
In reply to this post by Nikita Glukhov
В письме от 15 ноября 2018 18:26:43 пользователь Nikita Glukhov написал:

> Attached 2nd version of the patches. Nothing has changed since March,
> this is just a rebased version.
>
> CREATE INDEX syntax and parameters storage method still need discussion.
I've played around a bit with you patch and come to some conclusions, I'd like
to share. They are almost same as those before, but now there are more
details.

Again some issues about storing opclass options in pg_inedx:

1. Having both indoption and indoptions column in pg_index will make someone's
brain explode for sure. If not, it will bring troubles when people start
confusing them.

2. Now I found out how do you store option values for each opclass: text[] of
indoptions in pg_index is not the same as text[] in
reloptions in pg_catalog (and it brings more confusion). In reloption each
member of the array is a single option.

reloptions          | {fillfactor=90,autovacuum_enabled=false}

In indoptions, is a whole string of options for one of the indexed attributes,
each array item has all options for one indexed attribute. And this string
needs furthermore parsing, that differs from reloption parsing.

indoptions     | {"{numranges=150}","{numranges=160}"}


This brings us to the following issues:

2a. pg_index stores properties of index in general. Properties of each indexed
attributes is stored in pg_attribute table. If we follow this philosophy
it is wrong to add any kind of per-attribute array values into pg_index. These
values should be added to pg_attribute one per each pg_attribute entry.

2b. Since you've chosen method of storage that differs from one that is used
in reloptions, that will lead to two verstions of code that processes the
attributes. And from now on, if we accept this, we should support both of them
and keep them in sync. (I see that you tried to reuse as much code as
possible, but still you added some more that duplicate current reloptions
functionality.)

I know that relotions code is not really suitable for reusing. This was the
reason why I started solving oplcass option task with rewriting reloptions
code,to make it 100% reusable for any kind of options. So I would offer you
again to join me as a reviewer of that code. This will make opclass code more
simple and more sensible, if my option code is used...

3. Speaking of sensible code

Datum
g_int_options(PG_FUNCTION_ARGS)
{
   Datum       raw_options = PG_GETARG_DATUM(0);
   bool        validate = PG_GETARG_BOOL(1);
   relopt_int  siglen =
       { {"numranges", "number of ranges for compression", 0, 0, 9,
RELOPT_TYPE_INT },
           G_INT_NUMRANGES_DEFAULT, 1, G_INT_NUMRANGES_MAX };
   relopt_gen *optgen[] = { &siglen.gen };
   int         offsets[] = { offsetof(IntArrayOptions, num_ranges) };
   IntArrayOptions *options =
       parseAndFillLocalRelOptions(raw_options, optgen, offsets, 1,
                                   sizeof(IntArrayOptions), validate);

   PG_RETURN_POINTER(options);
}

It seems to be not a very nice hack.
What would you do if you would like to have both int, real and boolean options
for one opclass? I do not know how to implement it using this code.
We have only int opclass options for now, but more will come and we should be
ready for it.

4. Now getting back to not adding opclass options wherever we can, just
because we can:

4a. For inrarray there were no opclass options tests added. I am sure there
should be one, at least just to make sure it still does not segfault when you
try to set one. And in some cases more tests can be needed. To add and review
them one should be familiar with this opclass internals. So it is good when
different people do it for different opclasses

4b. When you add opclass options instead of hardcoded values, it comes to
setting minimum and maximum value. Why do you choose 1000 as maximum
for num_ranges in gist_int_ops in intarray? Why 1 as minimum? All these
decisions needs careful considerations and can't be done for bunch of
opclasses just in one review.

4c. Patch usually take a long path from prototype to final commit. Do you
really want to update all these opclasses code each time when some changes
in the main opclass option code is made? ;-)

So I would suggest to work only with intarray and add other opclasses later.

5. You've been asking about SQL grammar

> CREATE INDEX idx ON tab USING am (
>    {expr {opclass | DEFAULT} ({name=value} [,...])} [,...]
> );

As for me I do not really care about it. For me all the solutions is
acceptable. But looking at is i came to one notion:

I've never seen before DEFAULT keyword to be used in this way. There is logic
in such usage, but I did not remember any practical usage case.
If there are such usages (I can easily missed it) or if it is somehow
recommended in SQL standard -- let it be. But if none above, I would suggest
to use WITH keyword instead. As it is already used for reloptions. As far as I
remember in my prototype I used "WITH OPTIONS" but did if just because did not
find my way through yac with single "WITH". So ideal way for me would be

create index ON test USING GIST (i WITH (sig_len_int=22));

But as I said it is not thing of importance for me. Just an observation.

--
Do code for fun.

signature.asc (484 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Robert Haas
In reply to this post by Nikolay Shaplov
On Wed, Feb 28, 2018 at 9:46 AM Nikolay Shaplov <[hidden email]> wrote:
> 1. I've seen you've added a new attribute into pg_index. Why??!!
> As far as I can get, if have index built on several columns (A1, A2, A3) you
> can set, own opclass for each column. And set individual options for each
> opclass if we are speaking about options. So I would expect to have these
> options not in pg_index, but in pg_attribute. And we already have one there:
> attoptions.I just do not get how you have come to per-index options. May be I
> should read code more attentively...

It seems sensible to have both per-column options and per-index
options.  For example, we've got the fastupdate option for GIN, which
is a property of the index as a whole, not any individual column.  But
you could also want to specify some column-specific options, which
seems to be what this patch is about, since an opclass is associated
with an individual column.  And since an index can have more than one
column, I agree that it seems more appropriate to store this
information in pg_attribute than pg_index.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Nikita Glukhov
In reply to this post by Nikolay Shaplov
Attached 3rd version of the patches.

On 20.11.2018 14:15, Nikolay Shaplov wrote:
В письме от 15 ноября 2018 18:26:43 пользователь Nikita Glukhov написал:
Attached 2nd version of the patches. Nothing has changed since March,
this is just a rebased version.

CREATE INDEX syntax and parameters storage method still need discussion.
I've played around a bit with you patch and come to some conclusions, I'd like 
to share. They are almost same as those before, but now there are more 
details.

Again some issues about storing opclass options in pg_inedx:

1. Having both indoption and indoptions column in pg_index will make someone's 
brain explode for sure. If not, it will bring troubles when people start 
confusing them.

2. Now I found out how do you store option values for each opclass: text[] of 
indoptions in pg_index is not the same as text[] in 
reloptions in pg_catalog (and it brings more confusion). In reloption each 
member of the array is a single option. 

reloptions          | {fillfactor=90,autovacuum_enabled=false}

In indoptions, is a whole string of options for one of the indexed attributes, 
each array item has all options for one indexed attribute. And this string 
needs furthermore parsing, that differs from reloption parsing.

indoptions     | {"{numranges=150}","{numranges=160}"}


This brings us to the following issues:

2a. pg_index stores properties of index in general. Properties of each indexed 
attributes is stored in pg_attribute table. If we follow this philosophy
it is wrong to add any kind of per-attribute array values into pg_index. These 
values should be added to pg_attribute one per each pg_attribute entry.

2b. Since you've chosen method of storage that differs from one that is used 
in reloptions, that will lead to two verstions of code that processes the 
attributes. And from now on, if we accept this, we should support both of them 
and keep them in sync. (I see that you tried to reuse as much code as 
possible, but still you added some more that duplicate current reloptions 
functionality.)

On 21.11.2018 18:04, Robert Haas wrote:

It seems sensible to have both per-column options and per-index
options.  For example, we've got the fastupdate option for GIN, which
is a property of the index as a whole, not any individual column.  But
you could also want to specify some column-specific options, which
seems to be what this patch is about, since an opclass is associated
with an individual column.  And since an index can have more than one
column, I agree that it seems more appropriate to store this
information in pg_attribute than pg_index.
Ok, I have switched from pg_index.indoptions to pg_attribute.attoptions.


I agree that we should distinguish per-index and per-column options, but they
can also be AM-specific and opclass-specific.

'fastupdate' option for GIN is an example of AM-specific per-index option.

ASC/DESC and NULLS LAST/FIRST are examples of AM-class-specific per-column
options having special SQL syntax.  "AM-class-specific" here means "specific
only for the class of AMs that support ordering".  Now they are stored as flags
in pg_index.indoption[] but later can be moved to pg_attribute.attoptions.

Don't know should per-column AM-specific and opclass-specific options be stored
in the single column attoptions or have separate columns (like attamoptions and
attopclassoptions).  If we will store them together, we can run into name
collisions, but this collisions can be easily resolved using autogenerated 
prefixes in option names ("am.foo=bar", "opclass.foo=baz").



And another problem is the options with default values.  They may be not 
explicitly  specified by the user, and in this case in current implementation 
nothing will be stored in the catalog because default values can be obtained 
from the code.  But this will not allow changing of default values without 
compatibility issues. So I think it would be better to store both default and
explicitly specified values of all opclass options, but this will require a
major refactoring of current API.


Also I have idea to define list of opclass parameters declaratively when opclass
is created using syntax like the following:

CREATE OPERATOR CLASS name [ (param_name param_type [= default_value] [,...]) ]
FOR TYPE ... AS (
 {  OPTIONS function_name ( arg_type [,...] )       /* reloptions => binary */
  | OPERATOR ...
 } [,...]
)

But if we remember about the min/max values etc. the syntax will definitely
become more complicated, and it will require much more changes in the catalog
(up to creation of new table pg_opclassparams).

In any case, I think it would be nice to give somehow the user the ability to
obtain a list of opclass parameters not only from the documentation.



On 20.11.2018 14:15, Nikolay Shaplov wrote:
I know that relotions code is not really suitable for reusing. This was the 
reason why I started solving oplcass option task with rewriting reloptions 
code,to make it 100% reusable for any kind of options. So I would offer you 
again to join me as a reviewer of that code. This will make opclass code more 
simple and more sensible, if my option code is used... 
I absolutely agree that reloptions code needs such refactoring, and I am 
planning to continue reviewing your patches.

3. Speaking of sensible code

Datum
g_int_options(PG_FUNCTION_ARGS)
{
   Datum       raw_options = PG_GETARG_DATUM(0);
   bool        validate = PG_GETARG_BOOL(1);
   relopt_int  siglen =
       { {"numranges", "number of ranges for compression", 0, 0, 9, 
RELOPT_TYPE_INT },
           G_INT_NUMRANGES_DEFAULT, 1, G_INT_NUMRANGES_MAX };
   relopt_gen *optgen[] = { &siglen.gen };
   int         offsets[] = { offsetof(IntArrayOptions, num_ranges) };
   IntArrayOptions *options =
       parseAndFillLocalRelOptions(raw_options, optgen, offsets, 1,
                                   sizeof(IntArrayOptions), validate);

   PG_RETURN_POINTER(options);
}

It seems to be not a very nice hack.
What would you do if you would like to have both int, real and boolean options 
for one opclass? I do not know how to implement it using this code.
We have only int opclass options for now, but more will come and we should be 
ready for it.
Don't understand where is hack.  

It's possible to parse options of different types, see the following example:

typedef struct FooOptions
{
  int int_opt;
  int str_opt_offset;
} FooOptions;

relopt_int int_option = { {"int_opt", "desc", 0, 0, 7, RELOPT_TYPE_INT },
                           INT_OPT_DEFAULT, INT_OPT_MIN, INT_OPT_MAX };

relopt_string str_option = { {"str_opt", "desc", 0, 0, 7, RELOPT_TYPE_STRING },
                             0, true, validate_str_option, NULL };

relopt_gen *gen_options[] = { &int_option.gen, &str_option.gen };

int offsets[] = { offsetof(FooOptions, int_opt), 
                  offsetof(FooOptions, str_opt_offset) };

FooOptions *opts = 
    parseAndFillLocalRelOptions(raw_options, gen_options, offsets, 2, 
                                sizeof(FooOptions), validate);

int   int_option_value = opts->int_opt;
char *str_option_value = (char *) opts + opts->str_opt_offset;


Also I can propose to combine this two arrays into the one:

relopt_gen_offset options[] = { 
    &int_option.gen, offsetof(FooOptions, int_opt),
    &str_option.gen, offsetof(FooOptions, str_opt_offset)
};


4. Now getting back to not adding opclass options wherever we can, just 
because we can:

4a. For inrarray there were no opclass options tests added. I am sure there 
should be one, at least just to make sure it still does not segfault when you 
try to set one. And in some cases more tests can be needed. To add and review 
them one should be familiar with this opclass internals. So it is good when 
different people do it for different opclasses

4b. When you add opclass options instead of hardcoded values, it comes to 
setting minimum and maximum value. Why do you choose 1000 as maximum 
for num_ranges in gist_int_ops in intarray? Why 1 as minimum? All these 
decisions needs careful considerations and can't be done for bunch of 
opclasses just in one review. 

4c. Patch usually take a long path from prototype to final commit. Do you 
really want to update all these opclasses code each time when some changes 
in the main opclass option code is made? ;-)

So I would suggest to work only with intarray and add other opclasses later.

I decided to leave only GiST tsvector_ops as an example, because it is an 
in-core opclass, not an extension like intarray.

5. You've been asking about SQL grammar

CREATE INDEX idx ON tab USING am (
   {expr {opclass | DEFAULT} ({name=value} [,...])} [,...]
);
As for me I do not really care about it. For me all the solutions is 
acceptable. But looking at is i came to one notion:

I've never seen before DEFAULT keyword to be used in this way. There is logic 
in such usage, but I did not remember any practical usage case.
If there are such usages (I can easily missed it) or if it is somehow 
recommended in SQL standard -- let it be. But if none above, I would suggest 
to use WITH keyword instead. As it is already used for reloptions. As far as I 
remember in my prototype I used "WITH OPTIONS" but did if just because did not 
find my way through yac with single "WITH". So ideal way for me would be 

create index ON test USING GIST (i WITH (sig_len_int=22)); 

But as I said it is not thing of importance for me. Just an observation.
"[opclass] WITH OPTIONS (options)" looks too verbose, of course.

"[opclass] WITH (options)" looks acceptable, but it seems to conflict with
exclusion constraints syntax ("index_elem WITH operator").

"opclass (options)" looks the most natural to me. But we still need some
keyword before the parentheses when the opclass is not specified since we
can't distinguish "func_name (func_params)" and "col_name (opclass_options)"
in grammar.



--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Add-opclass-parameters-v03.patch (42K) Download Attachment
0002-Add-opclass-parameters-to-GiST-v03.patch (14K) Download Attachment
0003-Add-opclass-parameters-to-GIN-v03.patch (15K) Download Attachment
0004-Add-opclass-parameters-to-GiST-tsvector_ops-v03.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Robert Haas
On Wed, Dec 5, 2018 at 6:58 PM Nikita Glukhov <[hidden email]> wrote:
> I agree that we should distinguish per-index and per-column options, but they
> can also be AM-specific and opclass-specific.

True, but an index is bound to a single AM, and a column is bound to a
single opclass which is bound to a single AM.  So I'm not very worried
about name collisions.  Can't we just tell opclass authors to pick
names that are unlikely to collide with names chose by the AM or names
that are globally reserved?  It's hard to imagine that we're ever
going to have more than a dozen or so options that could possibly
apply to a column, so splitting things up into different namespaces
seems like an unnecessary burden on the user.

> 'fastupdate' option for GIN is an example of AM-specific per-index option.
>
> ASC/DESC and NULLS LAST/FIRST are examples of AM-class-specific per-column
> options having special SQL syntax.  "AM-class-specific" here means "specific
> only for the class of AMs that support ordering".  Now they are stored as flags
> in pg_index.indoption[] but later can be moved to pg_attribute.attoptions.

Or left where they are.

> And another problem is the options with default values.  They may be not
> explicitly  specified by the user, and in this case in current implementation
> nothing will be stored in the catalog because default values can be obtained
> from the code.  But this will not allow changing of default values without
> compatibility issues. So I think it would be better to store both default and
> explicitly specified values of all opclass options, but this will require a
> major refactoring of current API.

Hmm.  I think if the default ever changes, it will require a new major
release, and we can compensate in pg_dump.  That means that a dump
taken with an old version will not preserve the options.  However,
using the pg_dump from the newest release is the recommended
procedure, and if you don't do that, you might get outright failures.
Failing to preserve an option value in the rare case that a default
was changed seems less bad than that.

> Also I have idea to define list of opclass parameters declaratively when opclass
> is created using syntax like the following:
>
> CREATE OPERATOR CLASS name [ (param_name param_type [= default_value] [,...]) ]
> FOR TYPE ... AS (
>  {  OPTIONS function_name ( arg_type [,...] )       /* reloptions => binary */
>   | OPERATOR ...
>  } [,...]
> )

I'm not sure exposing SQL syntax for this is a very good idea.

> "[opclass] WITH OPTIONS (options)" looks too verbose, of course.

It's not that bad.

> "[opclass] WITH (options)" looks acceptable, but it seems to conflict with
> exclusion constraints syntax ("index_elem WITH operator").

Are you sure?  The operator can't be (

But it might be confusing anyhow.

> "opclass (options)" looks the most natural to me. But we still need some
> keyword before the parentheses when the opclass is not specified since we
> can't distinguish "func_name (func_params)" and "col_name (opclass_options)"
> in grammar.

Are you sure?  What's the SQL syntax where there is actually a problem
here?  CREATE INDEX requires parentheses around a non-trivial
expression.

How about just OPTIONS (options) ?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Wed, Dec 5, 2018 at 6:58 PM Nikita Glukhov <[hidden email]> wrote:
>> "opclass (options)" looks the most natural to me. But we still need some
>> keyword before the parentheses when the opclass is not specified since we
>> can't distinguish "func_name (func_params)" and "col_name (opclass_options)"
>> in grammar.

> Are you sure?  What's the SQL syntax where there is actually a problem
> here?  CREATE INDEX requires parentheses around a non-trivial
> expression.

Well, the reason we have to require parens around nontrivial expressions
is mostly lack of forethought about making the syntax non-ambiguous :-(

> How about just OPTIONS (options) ?

That would require making OPTIONS a fully reserved word, I think,
else it's ambiguous with an opclass name.

How about saying that you must give an opclass name if you want to
specify options, ie the syntax is

        [ opclass_name [ ( options... ) ] ]

I'm not necessarily wedded to that, but it seems worth throwing
out the idea.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Opclass parameters

Robert Haas
On Thu, Dec 6, 2018 at 11:55 AM Tom Lane <[hidden email]> wrote:
> How about saying that you must give an opclass name if you want to
> specify options, ie the syntax is
>
>         [ opclass_name [ ( options... ) ] ]
>
> I'm not necessarily wedded to that, but it seems worth throwing
> out the idea.

Agreed, that's not bad, certainly better than making OPTIONS more reserved.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company