Custom compression methods


Re: Custom compression methods

Dilip Kumar-2
On Thu, Oct 22, 2020 at 10:41 AM Dilip Kumar <[hidden email]> wrote:

>
> On Thu, Oct 22, 2020 at 2:11 AM Tomas Vondra
> <[hidden email]> wrote:
> >
> > On Wed, Oct 21, 2020 at 01:59:50PM +0530, Dilip Kumar wrote:
> > >On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <[hidden email]> wrote:
> > >>
> > >> On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <[hidden email]> wrote:
> > >> >
> > >> > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra
> > >> > <[hidden email]> wrote:
> > >> > >
> > >> > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote:
> > >> > > >
> > >> > > >> ...
> > >> > > >
> > >> > > >I have worked on this patch; as discussed, I am now maintaining the
> > >> > > >preserved compression methods using dependencies.  PRESERVE ALL
> > >> > > >syntax is still not supported; I will work on that part.
> > >> > > >
> > >> > >
> > >> > > Cool, I'll take a look. What's your opinion on doing it this way? Do you
> > >> > > think it's cleaner / more elegant, or is it something contrary to what
> > >> > > the dependencies are meant to do?
> > >> >
> > >> > I think this looks much cleaner.  Moreover, once we start supporting
> > >> > custom compression methods we will have to maintain the dependency
> > >> > anyway, so using it to find the preserved compression methods is a
> > >> > good option.
> > >>
> > >> I have also implemented the next set of patches.
> > >> 0004 -> Provide a way to create custom compression methods
> > >> 0005 -> Extension to implement lz4 as a custom compression method.
> > >
> > >In the updated version I have worked on some of the listed items
> > >> A pending list of items:
> > >> 1. Provide support for handling the compression option
> > >> - As discussed upthread, I will store the compression option of the
> > >> latest compression method in a new field in the pg_attribute table.
> > >> 2. As of now I have kept zlib as the second built-in option and lz4 as
> > >> a custom compression extension.  In an offlist discussion, Robert
> > >> suggested that we keep lz4 as the built-in method and move zlib to an
> > >> extension, since lz4 is faster than zlib and so is the better
> > >> built-in choice.  So in the next version I will change that.  Any
> > >> different opinion on this?
> > >
> > >Done
> > >
> > >> 3. Improve the documentation, especially for create_compression_method.
> > >> 4. By default, support the table's compression method for indexes.
> > >
> > >Done
> > >
> > >> 5. Support the PRESERVE ALL option so that we can preserve all
> > >> existing compression methods without providing the whole list.
> > >
> > >Points 1, 3, and 5 are still pending.
> > >
> >
> > Thanks. I took a quick look at the patches and they seem fine. I
> > have one question, though - toast_compress_datum contains this code:
> >
> >
> >      /* Call the actual compression function */
> >      tmp = cmroutine->cmcompress((const struct varlena *) value);
> >      if (!tmp)
> >          return PointerGetDatum(NULL);
> >
> >
> > Shouldn't this really throw an error instead? I mean, if the compression
> > library returns NULL, isn't that an error?
>
> I don't think that we can throw an error here, because pglz_compress
> might return -1 if it finds that it cannot reduce the size of the
> data; we treat such data as "incompressible" and return NULL.  In
> that case the caller will try to compress another attribute of the
> tuple.  I think we can handle such cases in the specific handler
> functions.
I have added a compression failure error in lz4.c; please refer to
lz4_cmcompress in the v9-0001 patch.  Apart from that, I have added
support for the PRESERVE ALL syntax to preserve all the existing
compression methods, and I have rebased the patch on the current
head.
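
For illustration, the incompressible-data convention discussed above
could look like this in a pglz handler (a minimal sketch; the
TOAST_COMPRESS_* names and the exact hook signature are assumptions,
not necessarily the patch's actual code):

/*
 * Sketch only: compress with pglz, and report incompressible data by
 * returning NULL instead of raising an error, matching
 * toast_compress_datum's convention.
 */
static struct varlena *
pglz_cmcompress(const struct varlena *value)
{
    int32           valsize = VARSIZE_ANY_EXHDR(value);
    int32           len;
    struct varlena *tmp;

    /* allocate the worst-case output buffer, plus header space */
    tmp = (struct varlena *) palloc(PGLZ_MAX_OUTPUT(valsize) +
                                    TOAST_COMPRESS_HDRSZ);

    len = pglz_compress(VARDATA_ANY(value), valsize,
                        TOAST_COMPRESS_RAWDATA(tmp),
                        PGLZ_strategy_default);
    if (len < 0)
    {
        /* pglz could not shrink the datum: treat it as incompressible */
        pfree(tmp);
        return NULL;
    }

    SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
    return tmp;
}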

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v9-0001-Built-in-compression-method.patch (275K)
v9-0002-alter-table-set-compression.patch (17K)
v9-0003-Add-support-for-PRESERVE.patch (55K)
v9-0004-Create-custom-compression-methods.patch (53K)
v9-0005-new-compression-method-extension-for-zlib.patch (13K)

Re: Custom compression methods

Dilip Kumar-2
On Thu, Oct 22, 2020 at 5:56 PM Dilip Kumar <[hidden email]> wrote:

> I have added a compression failure error in lz4.c; please refer to
> lz4_cmcompress in the v9-0001 patch.  Apart from that, I have added
> support for the PRESERVE ALL syntax to preserve all the existing
> compression methods, and I have rebased the patch on the current
> head.
I have added the next patch, which supports compression options.  I am
storing the compression options only for the latest compression
method.  Basically, with this design we can only support options that
are needed for compression, not for decompression.  As of now, the
compression option infrastructure is added, along with compression
options for the built-in method pglz and the external method zlib.
Next, I will work on adding options for the lz4 method.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v10-0001-Built-in-compression-method.patch (274K)
v10-0002-alter-table-set-compression.patch (17K)
v10-0003-Add-support-for-PRESERVE.patch (55K)
v10-0004-Create-custom-compression-methods.patch (53K)
v10-0005-new-compression-method-extension-for-zlib.patch (13K)
v10-0006-Support-compression-methods-options.patch (56K)

Re: Custom compression methods

Dilip Kumar-2
On Tue, Oct 27, 2020 at 10:54 AM Dilip Kumar <[hidden email]> wrote:

> I have added the next patch, which supports compression options.  I am
> storing the compression options only for the latest compression
> method.  Basically, with this design we can only support options that
> are needed for compression, not for decompression.  As of now, the
> compression option infrastructure is added, along with compression
> options for the built-in method pglz and the external method zlib.
> Next, I will work on adding options for the lz4 method.
In the attached patch set I have also included compression option
support for lz4.  As of now, I have only supported the acceleration
option of LZ4_compress_fast.  lz4 also supports dictionary-based
compression, but if we supported that we would need the dictionary for
decompression as well.  Since we only keep the options of the current
compression method, we cannot support dictionary-based options as of
now.
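
For reference, the underlying library call looks like this (a
self-contained sketch, not the patch's code): LZ4_compress_fast() with
acceleration = 1 behaves like LZ4_compress_default(), and larger values
trade compression ratio for speed.

#include <stdlib.h>
#include <lz4.h>

/* sketch: compress a buffer with lz4's "acceleration" option */
static char *
compress_with_acceleration(const char *src, int srclen,
                           int acceleration, int *dstlen)
{
    int     bound = LZ4_compressBound(srclen);
    char   *dst = malloc(bound);

    if (dst == NULL)
        return NULL;

    *dstlen = LZ4_compress_fast(src, dst, srclen, bound, acceleration);
    if (*dstlen <= 0)
    {
        free(dst);              /* compression failed */
        return NULL;
    }
    return dst;
}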


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v11-0001-Built-in-compression-method.patch (274K)
v11-0002-alter-table-set-compression.patch (17K)
v11-0003-Add-support-for-PRESERVE.patch (55K)
v11-0004-Create-custom-compression-methods.patch (53K)
v11-0005-new-compression-method-extension-for-zlib.patch (13K)
v11-0006-Support-compression-methods-options.patch (59K)

Re: Custom compression methods

Tomas Vondra-4
On Wed, Oct 28, 2020 at 01:16:31PM +0530, Dilip Kumar wrote:

>
> In the attached patch set I have also included compression option
> support for lz4.  As of now, I have only supported the acceleration
> option of LZ4_compress_fast.  lz4 also supports dictionary-based
> compression, but if we supported that we would need the dictionary for
> decompression as well.  Since we only keep the options of the current
> compression method, we cannot support dictionary-based options as of
> now.
>

OK, thanks. Do you have any other plans to improve this patch series? I
plan to do some testing and review, but if you're likely to post another
version soon then I'd wait a bit.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Custom compression methods

Dilip Kumar-2
On Thu, Oct 29, 2020 at 12:31 AM Tomas Vondra
<[hidden email]> wrote:

> OK, thanks. Do you have any other plans to improve this patch series? I
> plan to do some testing and review, but if you're likely to post another
> version soon then I'd wait a bit.
There were some issues in create_compression_method.sgml, and
drop_compression_method.sgml was missing.  I have fixed that in the
attached patches.  I am not planning to change anything soon, so you
can go ahead and review.  Thanks.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v12-0001-Built-in-compression-method.patch (274K)
v12-0002-alter-table-set-compression.patch (17K)
v12-0003-Add-support-for-PRESERVE.patch (55K)
v12-0004-Create-custom-compression-methods.patch (61K)
v12-0005-new-compression-method-extension-for-zlib.patch (13K)
v12-0006-Support-compression-methods-options.patch (59K)

Re: Custom compression methods

Dilip Kumar-2
On Thu, Oct 29, 2020 at 12:07 PM Dilip Kumar <[hidden email]> wrote:

> There were some issues in create_compression_method.sgml, and
> drop_compression_method.sgml was missing.  I have fixed that in the
> attached patches.  I am not planning to change anything soon, so you
> can go ahead and review.  Thanks.
The patches no longer applied on the current head, so I have rebased them.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v13-0001-Built-in-compression-method.patch (274K)
v13-0002-alter-table-set-compression.patch (17K)
v13-0003-Add-support-for-PRESERVE.patch (55K)
v13-0004-Create-custom-compression-methods.patch (61K)
v13-0005-new-compression-method-extension-for-zlib.patch (13K)
v13-0006-Support-compression-methods-options.patch (60K)

Re: Custom compression methods

Dilip Kumar-2
On Sun, Nov 8, 2020 at 4:29 PM Dilip Kumar <[hidden email]> wrote:

> The patches no longer applied on the current head, so I have rebased them.
There were a few problems in this rebased version: basically, the
compression options were not passed when compressing values from
brin_form_tuple.  I have fixed this.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v14-0001-Built-in-compression-method.patch (274K)
v14-0002-alter-table-set-compression.patch (17K)
v14-0003-Add-support-for-PRESERVE.patch (55K)
v14-0004-Create-custom-compression-methods.patch (61K)
v14-0005-new-compression-method-extension-for-zlib.patch (13K)
v14-0006-Support-compression-methods-options.patch (67K)

Re: Custom compression methods

Robert Haas
On Wed, Nov 11, 2020 at 9:39 AM Dilip Kumar <[hidden email]> wrote:
> There were a few problems in this rebased version: basically, the
> compression options were not passed when compressing values from
> brin_form_tuple.  I have fixed this.

Since the authorship history of this patch is complicated, it would be
nice if you would include authorship information and relevant
"Discussion" links in the patches.

Design-level considerations and overall notes:

configure is autogenerated from configure.in, so the patch shouldn't
include changes only to the former.

Looking over the changes to src/include:

+       PGLZ_COMPRESSION_ID,
+       LZ4_COMPRESSION_ID

I think that it would be good to assign values to these explicitly.

+/* compresion handler routines */

Spelling.

+       /* compression routine for the compression method */
+       cmcompress_function cmcompress;
+
+       /* decompression routine for the compression method */
+       cmcompress_function cmdecompress;

Don't reuse cmcompress_function; that's confusing. Just have a typedef
per structure member, even if they end up being the same.

 #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \
-       (((toast_compress_header *) (ptr))->rawsize = (len))
+do { \
+       Assert(len > 0 && len <= RAWSIZEMASK); \
+       ((toast_compress_header *) (ptr))->info = (len); \
+} while (0)

Indentation.

+#define TOAST_COMPRESS_SET_COMPRESSION_METHOD(ptr, cm_method) \
+       ((toast_compress_header *) (ptr))->info |= ((cm_method) << 30);

What about making TOAST_COMPRESS_SET_RAWSIZE() take another argument?
And possibly also rename it to TOAST_COMPRESS_SET_SIZE_AND_METHOD() or
something? It seems not great to have separate functions each setting
part of a 4-byte quantity. Too much chance of failing to set both
parts. I guess you've got a function called
toast_set_compressed_datum_info() for that, but it's just a wrapper
around two macros that could just be combined, which would reduce
complexity overall.
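
Something along these lines, perhaps (a sketch assembled from the
macros quoted above; the Assert on the method id assumes the two-bit
field implied by the "<< 30" shift):

/*
 * Sketch of a combined setter: store the raw size and the 2-bit
 * compression method id in the single 4-byte "info" word in one step,
 * so neither part can be forgotten.
 */
#define TOAST_COMPRESS_SET_SIZE_AND_METHOD(ptr, len, cm_method) \
do { \
    Assert((len) > 0 && (len) <= RAWSIZEMASK); \
    Assert((cm_method) >= 0 && (cm_method) <= 3); \
    ((toast_compress_header *) (ptr))->info = \
        ((uint32) (len)) | ((uint32) (cm_method) << 30); \
} while (0)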

+       T_CompressionRoutine,           /* in access/compressionapi.h */

This looks misplaced. I guess it should go just after these:

    T_FdwRoutine,               /* in foreign/fdwapi.h */
    T_IndexAmRoutine,           /* in access/amapi.h */
    T_TableAmRoutine,           /* in access/tableam.h */

Looking over the regression test changes:

The tests at the top of create_cm.out that just test that we can
create tables with various storage types seem unrelated to the purpose
of the patch. And the file doesn't test creating a compression method
either, as the file name would suggest, so either the file name needs
to be changed (compression, compression_method?) or the tests don't go
here.

+-- check data is okdd

I guess whoever is responsible for this comment prefers vi to emacs.

I don't quite understand the purpose of all of these tests, and there
are some things that I feel like ought to be tested that seemingly
aren't. Like, you seem to test using an UPDATE to move a datum from a
table to another table with the same compression method, but not one
with a different compression method. Testing the former is nice and
everything, but that's the easy case: I think we also need to test the
latter. I think it would be good to verify not only that the data is
readable but that it's compressed the way we expect. I think it would
be a great idea to add a pg_column_compression() function in a similar
spirit to pg_column_size(). Perhaps it could return NULL when
compression is not in use or the data type is not varlena, and the
name of the compression method otherwise. That would allow for better
testing of this feature, and it would also be useful to users who are
switching methods, to see what data they still have that's using the
old method. It could be useful for debugging problems on customer
systems, too.
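
A rough sketch of what such a function might look like (entirely
hypothetical, since the patch does not contain it; it also ignores
externally stored values, which would need to be detoasted first):

/*
 * Hypothetical pg_column_compression(): return the compression method
 * name for a compressed varlena datum, or NULL otherwise.
 */
Datum
pg_column_compression(PG_FUNCTION_ARGS)
{
    struct varlena *value = (struct varlena *) PG_GETARG_POINTER(0);
    const char *result;

    if (!VARATT_IS_COMPRESSED(value))
        PG_RETURN_NULL();

    switch (TOAST_COMPRESS_METHOD(value))   /* macro from the patch */
    {
        case PGLZ_COMPRESSION_ID:
            result = "pglz";
            break;
        case LZ4_COMPRESSION_ID:
            result = "lz4";
            break;
        default:
            elog(ERROR, "invalid compression method id");
            result = NULL;      /* keep compiler quiet */
    }

    PG_RETURN_TEXT_P(cstring_to_text(result));
}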

I wonder if we need a test that moves data between tables through an
intermediary. For instance, suppose a plpgsql function or DO block
fetches some data and stores it in a plpgsql variable and then uses
the variable to insert into another table. Hmm, maybe that would force
de-TOASTing. But perhaps there are other cases. Maybe a more general
way to approach the problem is: have you tried running a coverage
report and checked which parts of your code are getting exercised by
the existing tests and which parts are not? The stuff that isn't, we
should try to add more tests. It's easy to get corner cases wrong with
this kind of thing.

I notice that LIKE INCLUDING COMPRESSION doesn't seem to be tested, at
least not by 0001, which reinforces my feeling that the tests here are
not as thorough as they could be.

+NOTICE:  pg_compression contains unpinned initdb-created object(s)

This seems wrong to me - why is it OK?

-       result = (struct varlena *)
-               palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);
-       SET_VARSIZE(result, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);
+       cmoid = GetCompressionOidFromCompressionId(TOAST_COMPRESS_METHOD(attr));

-       if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr),
-                                               TOAST_COMPRESS_SIZE(attr),
-                                               VARDATA(result),
-                                               TOAST_COMPRESS_RAWSIZE(attr), true) < 0)
-               elog(ERROR, "compressed data is corrupted");
+       /* get compression method handler routines */
+       cmroutine = GetCompressionRoutine(cmoid);

-       return result;
+       return cmroutine->cmdecompress(attr);

I'm worried about how expensive this might be, and I think we could
make it cheaper. The reason why I think this might be expensive is:
currently, for every datum, you have a single direct function call.
Now, with this, you first have a direct function call to
GetCompressionOidFromCompressionId(). Then you have a call to
GetCompressionRoutine(), which does a syscache lookup and calls a
handler function, which is quite a lot more expensive than a single
function call. And the handler isn't even returning a statically
allocated structure, but is allocating new memory every time, which
involves more function calls and maybe memory leaks. Then you use the
results of all that to make an indirect function call.

I'm not sure exactly what combination of things we could use to make
this better, but it seems like there are a few possibilities:

(1) The handler function could return a pointer to the same
CompressionRoutine every time instead of constructing a new one every
time.
(2) The CompressionRoutine to which the handler function returns a
pointer could be statically allocated instead of being built at
runtime.
(3) GetCompressionRoutine could have an OID -> handler cache instead
of relying on syscache + calling the handler function all over again.
(4) For the compression types that have dedicated bit patterns in the
high bits of the compressed TOAST size, toast_compress_datum() could
just have hard-coded logic to use the correct handlers instead of
translating the bit pattern into an OID and then looking it up over
again.
(5) Going even further than #4 we could skip the handler layer
entirely for such methods, and just call the right function directly.

I think we should definitely do (1), and also (2) unless there's some
reason it's hard. (3) doesn't need to be part of this patch, but might
be something to consider later in the series. It's possible that it
doesn't have enough benefit to be worth the work, though. Also, I
think we should do either (4) or (5). I have a mild preference for (5)
unless it looks too ugly.

Note that I'm not talking about hard-coding a fast path for a
hard-coded list of OIDs - which would seem a little bit unprincipled -
but hard-coding a fast path for the bit patterns that are themselves
hard-coded. I don't think we lose anything in terms of extensibility
or even-handedness there; it's just avoiding a bunch of rigamarole
that doesn't really buy us anything.

All these points apply equally to toast_decompress_datum_slice() and
toast_compress_datum().

+       /* Fallback to default compression method, if not specified */
+       if (!OidIsValid(cmoid))
+               cmoid = DefaultCompressionOid;

I think that the caller should be required to specify a legal value,
and this should be an elog(ERROR) or an Assert().

The change to equalTupleDescs() makes me wonder. Like, can we specify
the compression method for a function parameter, or a function return
value? I would think not. But then how are the tuple descriptors set
up in that case? Under what circumstances do we actually need the
tuple descriptors to compare unequal?

lz4.c's header comment calls it cm_lz4.c, and the pathname is wrong too.

I wonder if we should try to adopt a convention for the names of these
files that isn't just the compression method name, like cmlz4 or
compress_lz4. I kind of like the latter one. I am a little worried
that just calling it lz4.c will result in name collisions later - not
in this directory, of course, but elsewhere in the system. It's not a
disaster if that happens, but for example verbose error reports print
the file name, so it's nice if it's unambiguous.

+               if (!IsBinaryUpgrade &&
+                       (relkind == RELKIND_RELATION ||
+                        relkind == RELKIND_PARTITIONED_TABLE))
+                       attr->attcompression =
+                               GetAttributeCompressionMethod(attr, colDef->compression);
+               else
+                       attr->attcompression = InvalidOid;

Storing InvalidOid in the IsBinaryUpgrade case looks wrong. If
upgrading from pre-v14, we need to store PGLZ_COMPRESSION_OID.
Otherwise, we need to preserve whatever value was present in the old
version. Or am I confused here?

I think there should be tests for the way this interacts with
partitioning, and I think the intended interaction should be
documented. Perhaps it should behave like TABLESPACE, where the parent
property has no effect on what gets stored because the parent has no
storage, but is inherited by each new child.

I wonder in passing about TOAST tables and materialized views, which
are the other things that have storage. What gets stored for
attcompression? For a TOAST table it probably doesn't matter much
since TOAST table entries shouldn't ever be toasted themselves, so
anything that doesn't crash is fine (but maybe we should test that
trying to alter the compression properties of a TOAST table doesn't
crash, for example). For a materialized view it seems reasonable to
want to set column properties, but I'm not quite sure how that works
today for things like STORAGE anyway. If we do allow setting STORAGE
or COMPRESSION for materialized view columns then dump-and-reload
needs to preserve the values.

+       /*
+        * Use default compression method if the existing compression method is
+        * invalid but the new storage type is non plain storage.
+        */
+       if (!OidIsValid(attrtuple->attcompression) &&
+               (newstorage != TYPSTORAGE_PLAIN))
+               attrtuple->attcompression = DefaultCompressionOid;

You have a few too many parens in there.

I don't see a particularly good reason to treat plain and external
differently. More generally, I think there's a question here about
when we need an attribute to have a valid compression type and when we
don't. If typstorage is plain or external, then there's no point in
ever having a compression type and maybe we should even reject
attempts to set one (but I'm not sure). However, the attstorage is a
different case. Suppose the column is created with extended storage
and then later it's changed to plain. That's only a hint, so there may
still be toasted values in that column, so the compression setting
must endure. At any rate, we need to make sure we have clear and
sensible rules for when attcompression (a) must be valid, (b) may be
valid, and (c) must be invalid. And those rules need to at least be
documented in the comments, and maybe in the SGML docs.

I'm out of time for today, so I'll have to look at this more another
day. Hope this helps for a start.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Custom compression methods

Dilip Kumar-2
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <[hidden email]> wrote:

Thanks for the review, Robert.  I will work on these comments and
provide my analysis along with an updated patch in a couple of days.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Custom compression methods

Dilip Kumar-2
In reply to this post by Robert Haas
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <[hidden email]> wrote:

Most of the comments look fine to me, but I have a slightly different
opinion on one of them, so I am replying only to that one.

> I'm worried about how expensive this might be, and I think we could
> make it cheaper. The reason why I think this might be expensive is:
> currently, for every datum, you have a single direct function call.
> Now, with this, you first have a direct function call to
> GetCompressionOidFromCompressionId(). Then you have a call to
> GetCompressionRoutine(), which does a syscache lookup and calls a
> handler function, which is quite a lot more expensive than a single
> function call. And the handler isn't even returning a statically
> allocated structure, but is allocating new memory every time, which
> involves more function calls and maybe memory leaks. Then you use the
> results of all that to make an indirect function call.
>
> I'm not sure exactly what combination of things we could use to make
> this better, but it seems like there are a few possibilities:
>
> (1) The handler function could return a pointer to the same
> CompressionRoutine every time instead of constructing a new one every
> time.
> (2) The CompressionRoutine to which the handler function returns a
> pointer could be statically allocated instead of being built at
> runtime.
> (3) GetCompressionRoutine could have an OID -> handler cache instead
> of relying on syscache + calling the handler function all over again.
> (4) For the compression types that have dedicated bit patterns in the
> high bits of the compressed TOAST size, toast_compress_datum() could
> just have hard-coded logic to use the correct handlers instead of
> translating the bit pattern into an OID and then looking it up over
> again.
> (5) Going even further than #4 we could skip the handler layer
> entirely for such methods, and just call the right function directly.
>
> I think we should definitely do (1), and also (2) unless there's some
> reason it's hard. (3) doesn't need to be part of this patch, but might
> be something to consider later in the series. It's possible that it
> doesn't have enough benefit to be worth the work, though. Also, I
> think we should do either (4) or (5). I have a mild preference for (5)
> unless it looks too ugly.
>
> Note that I'm not talking about hard-coding a fast path for a
> hard-coded list of OIDs - which would seem a little bit unprincipled -
> but hard-coding a fast path for the bit patterns that are themselves
> hard-coded. I don't think we lose anything in terms of extensibility
> or even-handedness there; it's just avoiding a bunch of rigamarole
> that doesn't really buy us anything.
>
> All these points apply equally to toast_decompress_datum_slice() and
> toast_compress_datum().

I agree that we should definitely do (1) and (2) as part of the first
patch, and that we might do (3) in later patches.  Between (4) and (5)
I am more inclined to do (4), for a couple of reasons:
a) If we bypass the handler function and call the compression and
decompression routines directly, we need to check whether the current
executable is compiled with that particular compression library.  For
example, 'lz4handler' contains the check below; without the handler
function we would need to put this check either in each
compression/decompression function or in each caller.
Datum
lz4handler(PG_FUNCTION_ARGS)
{
#ifndef HAVE_LIBLZ4
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("not built with lz4 support")));
#else

b) Another reason is that once we start supporting compression options
(0006-Support-compression-methods-options.patch), we also need to call
'cminitstate_function' to parse the compression options before calling
the compression function, so we would need to hard-code multiple
function calls.

I think b) is still okay, but because of a) I am more inclined to do
(4).  What is your opinion on this?

About (4), one option is to call the correct handler function for the
built-in types directly from the toast_(de)compress(_slice) functions,
but in that case we duplicate the code.  Another option is to keep
GetCompressionRoutine() as a common function and, for the built-in
types, call the corresponding handler function directly from there to
get the routine.  To avoid duplication in the decompression routine we
then need to convert the CompressionId to an Oid before calling
GetCompressionRoutine(), but we can now avoid the syscache lookup for
the built-in types.
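
To make the second option concrete, here is a sketch (the OID
constants and the custom-method lookup helper are assumed names):

CompressionRoutine *
GetCompressionRoutine(Oid cmoid)
{
    /* fast path: built-in methods, no syscache lookup required */
    if (cmoid == PGLZ_COMPRESSION_OID)
        return (CompressionRoutine *)
            DatumGetPointer(DirectFunctionCall1(pglzhandler,
                                                ObjectIdGetDatum(cmoid)));
    if (cmoid == LZ4_COMPRESSION_OID)
        return (CompressionRoutine *)
            DatumGetPointer(DirectFunctionCall1(lz4handler,
                                                ObjectIdGetDatum(cmoid)));

    /* custom methods: syscache lookup, then call the handler */
    return lookup_compression_handler(cmoid);   /* assumed helper */
}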

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Custom compression methods

Robert Haas
On Tue, Nov 24, 2020 at 7:11 AM Dilip Kumar <[hidden email]> wrote:

> About (4), one option is to call the correct handler function for the
> built-in types directly from the toast_(de)compress(_slice) functions,
> but in that case we duplicate the code.  Another option is to keep
> GetCompressionRoutine() as a common function and, for the built-in
> types, call the corresponding handler function directly from there to
> get the routine.  To avoid duplication in the decompression routine we
> then need to convert the CompressionId to an Oid before calling
> GetCompressionRoutine(), but we can now avoid the syscache lookup for
> the built-in types.

Suppose that we have a variable lz4_compress_methods (like
heapam_methods) that is always defined, whether or not lz4 support is
present. It's defined like this:

const CompressionAmRoutine lz4_compress_methods = {
    .datum_compress = lz4_datum_compress,
    .datum_decompress = lz4_datum_decompress,
    .datum_decompress_slice = lz4_datum_decompress_slice
};

(It would be good, I think, to actually name things something like
this - in particular why would we have TableAmRoutine and
IndexAmRoutine but not include "Am" in the one for compression? In
general I think tableam is a good pattern to adhere to and we should
try to make this patch hew closely to it.)

Then those functions are contingent on #ifdef HAVE_LIBLZ4: they either
do their thing, or complain that lz4 compression is not supported.
Then in this function you can just say, well, if we have the 01 bit
pattern, handler = &lz4_compress_methods and proceed from there.
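
For example, one member function might look like this (a sketch under
the naming scheme above, not actual patch code):

/*
 * Sketch of one member of lz4_compress_methods: always defined, but
 * only functional when built with liblz4.
 */
static struct varlena *
lz4_datum_compress(const struct varlena *value)
{
#ifndef HAVE_LIBLZ4
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("not built with lz4 support")));
    return NULL;                /* keep compiler quiet */
#else
    int32       valsize = VARSIZE_ANY_EXHDR(value);
    int32       maxsize = LZ4_compressBound(valsize);
    struct varlena *tmp;
    int32       len;

    tmp = (struct varlena *) palloc(maxsize + TOAST_COMPRESS_HDRSZ);
    len = LZ4_compress_default(VARDATA_ANY(value),
                               TOAST_COMPRESS_RAWDATA(tmp),
                               valsize, maxsize);
    if (len <= 0)
        elog(ERROR, "lz4 compression failed");

    SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
    return tmp;
#endif
}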

BTW, I think the "not supported" message should probably use the 'by
this build' language we use in some places i.e.

[rhaas pgsql]$ git grep errmsg.*'this build' | grep -vF .po:
contrib/pg_prewarm/pg_prewarm.c: errmsg("prefetch is not supported by this build")));
src/backend/libpq/be-secure-openssl.c: (errmsg("\"%s\" setting \"%s\" not supported by this build",
src/backend/libpq/be-secure-openssl.c: (errmsg("\"%s\" setting \"%s\" not supported by this build",
src/backend/libpq/hba.c: errmsg("local connections are not supported by this build"),
src/backend/libpq/hba.c: errmsg("hostssl record cannot match because SSL is not supported by this build"),
src/backend/libpq/hba.c: errmsg("hostgssenc record cannot match because GSSAPI is not supported by this build"),
src/backend/libpq/hba.c: errmsg("invalid authentication method \"%s\": not supported by this build",
src/backend/utils/adt/pg_locale.c: errmsg("ICU is not supported in this build"), \
src/backend/utils/misc/guc.c: GUC_check_errmsg("Bonjour is not supported by this build");
src/backend/utils/misc/guc.c: GUC_check_errmsg("SSL is not supported by this build");

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Custom compression methods

Dilip Kumar-2
On Tue, Nov 24, 2020 at 7:14 PM Robert Haas <[hidden email]> wrote:

>
> Suppose that we have a variable lz4_compress_methods (like
> heapam_methods) that is always defined, whether or not lz4 support is
> present. It's defined like this:
>
> const CompressionAmRoutine lz4_compress_methods = {
>     .datum_compress = lz4_datum_compress,
>     .datum_decompress = lz4_datum_decompress,
>     .datum_decompress_slice = lz4_datum_decompress_slice
> };

Yeah, this makes sense.

>
> (It would be good, I think, to actually name things something like
> this - in particular why would we have TableAmRoutine and
> IndexAmRoutine but not include "Am" in the one for compression? In
> general I think tableam is a good pattern to adhere to and we should
> try to make this patch hew closely to it.)

For the compression routine name, I did not include "Am" because
currently we are storing the compression methods in the new catalog
"pg_compression", not in pg_am.  So are you suggesting that we should
store the compression methods in pg_am instead of creating a new
catalog?  IMHO, storing the compression methods in a new catalog is a
better option than storing them in pg_am, because the compression
methods are not the same as heap or index AMs; they are not actually
access methods.  Am I missing something?

> Then those functions are contingent on #ifdef HAVE_LIBLZ4: they either
> do their thing, or complain that lz4 compression is not supported.
> Then in this function you can just say, well, if we have the 01 bit
> pattern, handler = &lz4_compress_methods and proceed from there.

Okay

> BTW, I think the "not supported" message should probably use the 'by
> this build' language we use in some places i.e.
>

Okay

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Custom compression methods

Robert Haas
On Tue, Nov 24, 2020 at 10:47 AM Dilip Kumar <[hidden email]> wrote:
> For the compression routine name, I did not include "Am" because
> currently we are storing the compression methods in the new catalog
> "pg_compression", not in pg_am.  So are you suggesting that we should
> store the compression methods in pg_am instead of creating a new
> catalog?  IMHO, storing the compression methods in a new catalog is a
> better option than storing them in pg_am, because the compression
> methods are not the same as heap or index AMs; they are not actually
> access methods.  Am I missing something?

Oh, I thought it had been suggested in previous discussions that these
should be treated as access methods rather than inventing a whole new
concept just for this, and it seemed like a good idea to me. I guess I
missed the fact that the patch wasn't doing it that way. Hmm.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Custom compression methods

Tom Lane-2
Robert Haas <[hidden email]> writes:

> On Tue, Nov 24, 2020 at 10:47 AM Dilip Kumar <[hidden email]> wrote:
>> For the compression routine name, I did not include "Am" because
>> currently we are storing the compression methods in the new catalog
>> "pg_compression", not in pg_am.  So are you suggesting that we should
>> store the compression methods in pg_am instead of creating a new
>> catalog?  IMHO, storing the compression methods in a new catalog is a
>> better option than storing them in pg_am, because the compression
>> methods are not the same as heap or index AMs; they are not actually
>> access methods.  Am I missing something?

> Oh, I thought it had been suggested in previous discussions that these
> should be treated as access methods rather than inventing a whole new
> concept just for this, and it seemed like a good idea to me. I guess I
> missed the fact that the patch wasn't doing it that way. Hmm.

FWIW, I kind of agree with Robert's take on this.  Heap and index AMs
are pretty fundamentally different animals, yet we don't have a problem
sticking them in the same catalog.  I think anything that is related to
storage access could reasonably go into that catalog, rather than
inventing a new one.

                        regards, tom lane



Re: Custom compression methods

Robert Haas
On Tue, Nov 24, 2020 at 1:21 PM Tom Lane <[hidden email]> wrote:
> FWIW, I kind of agree with Robert's take on this.  Heap and index AMs
> are pretty fundamentally different animals, yet we don't have a problem
> sticking them in the same catalog.  I think anything that is related to
> storage access could reasonably go into that catalog, rather than
> inventing a new one.

It's good to have your opinion on this since I wasn't totally sure
what was best, but for the record, I can't take credit. Looks like it
was Álvaro's suggestion originally:

http://postgr.es/m/20171130205155.7mgq2cuqv6zxi25a@...

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Custom compression methods

Álvaro Herrera
In reply to this post by Tom Lane-2
On 2020-Nov-24, Tom Lane wrote:

> Robert Haas <[hidden email]> writes:

> > Oh, I thought it had been suggested in previous discussions that these
> > should be treated as access methods rather than inventing a whole new
> > concept just for this, and it seemed like a good idea to me. I guess I
> > missed the fact that the patch wasn't doing it that way. Hmm.
>
> FWIW, I kind of agree with Robert's take on this.  Heap and index AMs
> are pretty fundamentally different animals, yet we don't have a problem
> sticking them in the same catalog.  I think anything that is related to
> storage access could reasonably go into that catalog, rather than
> inventing a new one.

Right -- Something like amname=lz4, amhandler=lz4handler, amtype=c.
The core code must of course know how to instantiate an AM of type
'c' and what to use it for.

https://postgr.es/m/20171213151818.75a20259@...
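
Under that scheme, the amhandler itself could be as small as this (a
sketch; it just hands back the statically allocated struct shown
upthread):

/*
 * Sketch: with amtype = 'c', the lz4 amhandler simply returns the
 * statically allocated routine struct.
 */
Datum
lz4handler(PG_FUNCTION_ARGS)
{
    PG_RETURN_POINTER(&lz4_compress_methods);
}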

