[PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

Nikolay Shaplov
In the thread
https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m
I've suggested to split one big StdRdOption that is used for options storage
into into Options structures individual for each relkind and each relam

That patch have been split into smaller parts, most of them were already
commit:
https://commitfest.postgresql.org/25/2294/ - Remove StdRdOptions from AM
https://commitfest.postgresql.org/25/2297/ - Do not use StdRdOption for
partitioned tables
https://commitfest.postgresql.org/25/2295/ - Some useful Asserts for view-
related macroses.

And here goes the last part of StrRdOptions removal patch, where StdRdOptions
is replaced with HeapOptions and ToastOptions.

What did I do here.

- Added HeapOptions and ToastOptions structues
- Moved options building tab for autovacuum into AUTOVACUUM_RELOPTIONS macro,
so it can be used in relopt_parse_elt tab both for heap and toast
- Changed everywhere in the code, where old heap_reloptions is used, to use
new heap_reloptions or toast_reloptions
- Changed heap & toast option fetching macros to use HeapOptions and
ToastOptions
- Added Asserts to heap and toast options macros. Now we finally can do it.

What I did not do

- I've split fillfactor related macros into heap and toast like
RelationGetFillFactor will become HeapGetFillFactor and ToastGetFillFactor. I
have to do it, because now they handle different structure.
but there are heap only option macros like RelationGetParallelWorkers that
should be better called HeapGetParallelWorkers, as it is heap related. But I
did not change the name, as somebody from core team (I think it was Alvaro, it
was a while ago) asked me not to change macros names unless in is inavoidable.
So I kept the names, though I still think that naming them with Heap prefix
will make code more clear.

- vacuum_index_cleanup and vacuum_truncate options were added recently. They
were added into StdRdOptions. I think their place is inside AutoVacOpts not in
StdRdOptions, but did not dare to change it. If you see it the same way as I
see, please let me know, I will move it to a proper place.

--
Software Developer: https://www.upwork.com/freelancers/~014a87e140ff02c0da
Body-oriented Therapist: https://vk.com/nataraj_rebalancing  (Russian)

StdRdOptions_to_HeapOptions_and_ToastOptions_v1.diff (24K) Download Attachment
signature.asc (499 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

Michael Paquier-2
On Fri, Dec 06, 2019 at 06:18:48PM +0300, Nikolay Shaplov wrote:
> In the thread
> https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m
> I've suggested to split one big StdRdOption that is used for options storage
> into into Options structures individual for each relkind and each relam
>
> And here goes the last part of StrRdOptions removal patch, where StdRdOptions
> is replaced with HeapOptions and ToastOptions.

-typedef struct StdRdOptions
+/*
+ * HeapOptions
+ *     Binary representation of relation options for Heap relations.
+ */
+typedef struct HeapOptions

I think that it makes sense to split relation options dedicated to
heap into their own parsing structure, because those options are
actually related to the table AM heap.  However, I think that this
patch is not ambitious enough in the work which is done and that
things could move into a more generic direction.  At the end of the
day, I'd like to think that we should have something like:
- Heap-related reloptions are built as part of its AM handler in
heapam_handler.c, with reloptions.c holding no more references to
heap.  At all.
- The table AM option parsing follows a model close to what is done
for indexes in terms of option parsing, moving the responsibility to
define relation options to each table AM.
- Toast is an interesting case, as table AMs may want to use toast
tables.  Or not.  Robert may be able to comment more on that as he has
worked in this area for bd12499.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

Nikolay Shaplov
В письме от понедельник, 9 декабря 2019 г. 12:11:17 MSK пользователь Michael
Paquier написал:

> On Fri, Dec 06, 2019 at 06:18:48PM +0300, Nikolay Shaplov wrote:
> > In the thread
> > https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m
> > I've suggested to split one big StdRdOption that is used for options
> > storage into into Options structures individual for each relkind and each
> > relam
> >
> > And here goes the last part of StrRdOptions removal patch, where
> > StdRdOptions is replaced with HeapOptions and ToastOptions.
>
> -typedef struct StdRdOptions
> +/*
> + * HeapOptions
> + *     Binary representation of relation options for Heap relations.
> + */
> +typedef struct HeapOptions
>
> I think that it makes sense to split relation options dedicated to
> heap into their own parsing structure, because those options are
> actually related to the table AM heap.  However, I think that this
> patch is not ambitious enough in the work which is done and that
> things could move into a more generic direction.  At the end of the
> day, I'd like to think that we should have something like:
> - Heap-related reloptions are built as part of its AM handler in
> heapam_handler.c, with reloptions.c holding no more references to
> heap.  At all.
> - The table AM option parsing follows a model close to what is done
> for indexes in terms of option parsing, moving the responsibility to
> define relation options to each table AM.
> - Toast is an interesting case, as table AMs may want to use toast
> tables.  Or not.  Robert may be able to comment more on that as he has
> worked in this area for bd12499.
Oh, yeah, I forget that relations now also have AM :-)

But the truth is that my goal is to move all code that defines all option
names, min/max values etc, move it inside am code. To move data from
boolRelOpts, intRelOpts, realRelOpts, enumRelOpts, enumRelOpts from
reloptions.c into the code that implement AMs that uses these options.

I did it for indexes in patch I've offered several years ago. Now we have also
relaion AM.

But I would prefer to fix index AM relioptions first, and then copy that
solution for relations.

Because if I frist copy AM solution from indexes to relation, then I will have
to fix it in two places.

So I would prefer to keep reloptions for relations in relations.c, only split
them into HeapOptions and ToastOptions, then change AM for indexes moving
option definition into AM's and then clone the solution for relations.

This seems to be most simple and most logical way.

PS. I've checked the patch against current master. No changes were needed, but
I am attaching a diff made against current master, just in case.

StdRdOptions_to_HeapOptions_and_ToastOptions_v1a.diff (24K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

Michael Paquier-2
On Wed, Mar 04, 2020 at 10:58:31PM +0300, Nikolay Shaplov wrote:

> But the truth is that my goal is to move all code that defines all option
> names, min/max values etc, move it inside am code. To move data from
> boolRelOpts, intRelOpts, realRelOpts, enumRelOpts, enumRelOpts from
> reloptions.c into the code that implement AMs that uses these options.
>
> I did it for indexes in patch I've offered several years ago. Now we have also
> relaion AM.
>
> But I would prefer to fix index AM relioptions first, and then copy that
> solution for relations.
How do you think that this part should be changed then, if this needs
any changes?  It seems to me that we have a rather correct layer for
index AMs by requiring each one to define the available option set
using indoptions through their handler, with option fetching macros
located within each AM.

> Because if I first copy AM solution from indexes to relation, then I will have
> to fix it in two places.
>
> So I would prefer to keep reloptions for relations in relations.c, only split
> them into HeapOptions and ToastOptions, then change AM for indexes moving
> option definition into AM's and then clone the solution for relations.

Then, for table AMs, it seems to me that you are right for long-term
perspective to have the toast-related options in reloptions.c, or
perhaps directly located within more toast-related file (?) as table
AMs interact with toast using heapam_relation_needs_toast_table and
such callbacks.  So for heap, moving the option handling to roughly
heapam_handler.c is a natural move, though this requires a redesign of
the existing structure to use option handling closer to what
indoptions does, but for tables.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

Nikolay Shaplov
В письме от суббота, 7 марта 2020 г. 10:03:40 MSK пользователь Michael Paquier
написал:

> On Wed, Mar 04, 2020 at 10:58:31PM +0300, Nikolay Shaplov wrote:
> > But the truth is that my goal is to move all code that defines all option
> > names, min/max values etc, move it inside am code. To move data from
> > boolRelOpts, intRelOpts, realRelOpts, enumRelOpts, enumRelOpts from
> > reloptions.c into the code that implement AMs that uses these options.
> >
> > I did it for indexes in patch I've offered several years ago. Now we have
> > also relaion AM.
> >
> > But I would prefer to fix index AM relioptions first, and then copy that
> > solution for relations.
>
> How do you think that this part should be changed then, if this needs
> any changes?  It seems to me that we have a rather correct layer for
> index AMs by requiring each one to define the available option set
> using indoptions through their handler, with option fetching macros
> located within each AM.

My idea is like this:

Now information about what reloption AM has and how they should be parsed are
stored in two places. One is relioptions.c and boolRelOpts, intRelOpts,
realRelOpts, enumRelOpts, enumRelOpts arrays, another is
static const relopt_parse_elt tab[] inside  amoptions_function amoptions;
function of each AM.

My suggestion is to merge all this data into one structure. Like
option_definition


/* generic struct to hold shared data */                                      
typedef struct option_definition_basic                                        
{                                                                              
   const char *name;           /* must be first (used as list termination      
                                * marker) */                                  
   const char *desc;                                                          
   LOCKMODE    lockmode;                                                      
   option_definition_flags flags;                                              
   option_type type;                                                          
   int         struct_offset;  /* offset of the value in Bytea representation
*/
}  option_definition_basic;

typedef struct option_definition_bool                                          
{                                                                              
   option_definition_basic base;                                              
   bool        default_val;                                                    
}  option_definition_bool;                                                    
                                                                               
typedef struct option_definition_int                                          
{                                                                              
   option_definition_basic base;                                              
   int         default_val;                                                    
   int         min;                                                            
   int         max;                                                            
}  option_definition_int;                                                      
                                                                               
typedef struct option_definition_real                                          
{                                                                              
   option_definition_basic base;                                              
   double      default_val;                                                    
   double      min;                                                            
   double      max;                                                            
}  option_definition_real;                                                    
                                                                               
typedef struct option_definition_enum                                          
{                                                                              
   option_definition_basic base;                                              
   const char **allowed_values;/* Null terminated array of allowed values for  
                                * the option */                                
   int         default_val;    /* Number of item of allowed_values array */    
}  option_definition_enum;                                                    

This example from my old code, I guess I should add a union here, this will
make code more readable... But the idea is the same, we have one structure
that describes how this option should be parsed.

Then we gather all options definitions for one object (for example for index)
into a structure called OptionsDefSet

typedef struct OptionDefSet
{                                                                              
   option_definition_basic **definitions;                                      
   int         num;            /* Number of catalog items in use */            
   int         num_allocated;  /* Number of catalog items allocated */        
   bool        forbid_realloc; /* If number of items of the catalog were      
                                * strictly set to certain value do no allow    
                                * adding more idems */                        
   Size        struct_size;    /* Size of a structure for options in binary    
                                * representation */                            
   postprocess_bytea_options_function postprocess_fun; /* This function is    
                                                        * called after options
                                                        * were converted in    
                                                        * Bytea represenation.
                                                       * Can be used for extra
                                                     * validation and so on */
   char       *namespace;      /* Catalog is used for options from this        
                                * namespase */                                
}  OptionDefSet;


This DefSet will have all we need to parse options for certain object.

This all will be stored in the AM code.

Then we replace amoptions_function amoptions; function of the AM with
amoptions_def_set_function amoptions_def_set; function. This function will
return OptionDefSet to whoever called it.

So whenever we need an option in index_reloptions from reloption.c we get this
DefSet, and then call a function that parses options using this data.


Why do we need this.

1. To have all options related data in one place. Now when a developer wants
to add an option, he needs to find all places in the code this option should
be added. It is not good thing. It brings troubles and errors.

2. Now we have two different options storages for in-core AM options and for
contib AM options. They are boolRelOpts, intRelOpts, realRelOpts, enumRelOpts,
enumRelOpts arrays for in-core AMm's and static relopt_gen **custom_options
for contrib AM. It if better ho have same tool for both. My idea will allow to
have same code both in contrib and in-core AM.

3. Then we will be able to have reloptions-like options just anywhere, just
define an OptionDefSet, and put code that provides option values from SQL and
pg_catalog. I came to the idea of rewriting option code when I have been
working on the task of adding opclass parameters, Nikita Glukhov works on it
now. It is very simulator to options, but now you can not use reloptions code
for that. When we will have OptionDefSet I've described, it can be used for
opclass parameters too.

The example of how it can be done can be found at https://
commitfest.postgresql.org/15/992/
The code is quite outdated, and has a lot of extra code that already commited
or is not really needed. But you can see the idea.

I will try to provide new version of it soon, with no extra code, that can be
applied to current master, but it would be good to apply it to the master
branch where there is no StdRdOptions. This will make patch more readable and
understandable.

So, can I please have it in the code :-)

> > Because if I first copy AM solution from indexes to relation, then I will
> > have to fix it in two places.
> >
> > So I would prefer to keep reloptions for relations in relations.c, only
> > split them into HeapOptions and ToastOptions, then change AM for indexes
> > moving option definition into AM's and then clone the solution for
> > relations.
> Then, for table AMs, it seems to me that you are right for long-term
> perspective to have the toast-related options in reloptions.c, or
> perhaps directly located within more toast-related file (?) as table
> AMs interact with toast using heapam_relation_needs_toast_table and
> such callbacks.  So for heap, moving the option handling to roughly
> heapam_handler.c is a natural move, though this requires a redesign of
> the existing structure to use option handling closer to what
> indoptions does, but for tables.
We also have view reloptions and attribute options, as well as toast options.
We can keep them in reloption.c or find better place for them. It is not a big
problem I think. And as for heap options, yes, as heap now has AM, they should
be moved inside AM. I can do it when we are finished with index options.




Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

Nikolay Shaplov
In reply to this post by Nikolay Shaplov
A new version of the patch.
Autovacuum options were extended in b07642db

So I added that options to the current patch.

StdRdOptions_to_HeapOptions_and_ToastOptions_v1b.diff (24K) Download Attachment