Custom compression methods

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Custom compression methods

Ildus Kurbangaliev
Hello hackers!

I've attached a patch that implements custom compression
methods. This patch is based on Nikita Glukhov's code (which he hasn't
publish in mailing lists) for jsonb compression. This is early but
working version of the patch, and there are still few fixes and features
that should be implemented (like pg_dump support and support of
compression options for types), and it requires more testing. But I'd
like to get some feedback at the current stage first.

There's been a proposal [1] of Alexander Korotkov and some discussion
about custom compression methods before. This is an implementation of
per-datum compression. Syntax is similar to the one in proposal but not
the same.

Syntax:

CREATE COMPRESSION METHOD <cmname> HANDLER <compression_handler>;
DROP COMPRESSION METHOD <cmname>;

Compression handler is a function that returns a structure containing
compression routines:

- configure - function called when the compression method applied to
  an attribute
- drop - called when the compression method is removed from an attribute
- compress - compress function
- decompress - decompress function

User can create compressed columns with the commands below:

CREATE TABLE t(a tsvector COMPRESSED <cmname> WITH <options>);
ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH <options>;
ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED;

Also there is syntax of binding compression methods to types:

ALTER TYPE <type> SET COMPRESSED <cmname>;
ALTER TYPE <type> SET NOT COMPRESSED;

There are two new tables in the catalog, pg_compression and
pg_compression_opt. pg_compression is used as storage of compression
methods, and pg_compression_opt is used to store specific compression
options for particular column.

When user binds a compression method to some column a new record in
pg_compression_opt is created and all further attribute values will
contain compression options Oid while old values will remain unchanged.
And when we alter a compression method for
the attribute it won't change previous record in pg_compression_opt.
Instead it'll create a new one and new values will be stored
with new Oid. That way there is no need of recompression of the old
tuples. And also tuples containing compressed datums can be copied to
other tables so records in pg_compression_opt shouldn't be removed. In
the current patch they can be removed with DROP COMPRESSION METHOD
CASCADE, but after that decompression won't be possible on compressed
tuples. Maybe CASCADE should keep compression options.

I haven't changed the base logic of working with compressed datums. It
means that custom compressed datums behave exactly the same as current
LZ compressed datums, and the logic differs only in toast_compress_datum
and toast_decompress_datum.

This patch doesn't break backward compability and should work seamlessly
with older version of database. I used one of two free bits in
`va_rawsize` from `varattrib_4b->va_compressed` as flag of custom
compressed datums. Also I renamed it to `va_info` since it contains not
only rawsize now.

The patch also includes custom compression method for tsvector which is
used in tests.

[1]
https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com
--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

custom_compression_methods_v1.patch (293K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Ildus Kurbangaliev
On Thu, 7 Sep 2017 19:42:36 +0300
Ildus Kurbangaliev <[hidden email]> wrote:

> Hello hackers!
>
> I've attached a patch that implements custom compression
> methods. This patch is based on Nikita Glukhov's code (which he hasn't
> publish in mailing lists) for jsonb compression. This is early but
> working version of the patch, and there are still few fixes and
> features that should be implemented (like pg_dump support and support
> of compression options for types), and it requires more testing. But
> I'd like to get some feedback at the current stage first.
>
> There's been a proposal [1] of Alexander Korotkov and some discussion
> about custom compression methods before. This is an implementation of
> per-datum compression. Syntax is similar to the one in proposal but
> not the same.
>
> Syntax:
>
> CREATE COMPRESSION METHOD <cmname> HANDLER <compression_handler>;
> DROP COMPRESSION METHOD <cmname>;
>
> Compression handler is a function that returns a structure containing
> compression routines:
>
> - configure - function called when the compression method applied to
>   an attribute
> - drop - called when the compression method is removed from an
> attribute
> - compress - compress function
> - decompress - decompress function
>
> User can create compressed columns with the commands below:
>
> CREATE TABLE t(a tsvector COMPRESSED <cmname> WITH <options>);
> ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH <options>;
> ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED;
>
> Also there is syntax of binding compression methods to types:
>
> ALTER TYPE <type> SET COMPRESSED <cmname>;
> ALTER TYPE <type> SET NOT COMPRESSED;
>
> There are two new tables in the catalog, pg_compression and
> pg_compression_opt. pg_compression is used as storage of compression
> methods, and pg_compression_opt is used to store specific compression
> options for particular column.
>
> When user binds a compression method to some column a new record in
> pg_compression_opt is created and all further attribute values will
> contain compression options Oid while old values will remain
> unchanged. And when we alter a compression method for
> the attribute it won't change previous record in pg_compression_opt.
> Instead it'll create a new one and new values will be stored
> with new Oid. That way there is no need of recompression of the old
> tuples. And also tuples containing compressed datums can be copied to
> other tables so records in pg_compression_opt shouldn't be removed. In
> the current patch they can be removed with DROP COMPRESSION METHOD
> CASCADE, but after that decompression won't be possible on compressed
> tuples. Maybe CASCADE should keep compression options.
>
> I haven't changed the base logic of working with compressed datums. It
> means that custom compressed datums behave exactly the same as current
> LZ compressed datums, and the logic differs only in
> toast_compress_datum and toast_decompress_datum.
>
> This patch doesn't break backward compability and should work
> seamlessly with older version of database. I used one of two free
> bits in `va_rawsize` from `varattrib_4b->va_compressed` as flag of
> custom compressed datums. Also I renamed it to `va_info` since it
> contains not only rawsize now.
>
> The patch also includes custom compression method for tsvector which
> is used in tests.
>
> [1]
> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com
Attached rebased version of the patch. Added support of pg_dump, the
code was simplified, and a separate cache for compression options was
added.

--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

custom_compression_methods_v2.patch (320K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Peter Eisentraut-6
On 9/12/17 10:55, Ildus Kurbangaliev wrote:
>> The patch also includes custom compression method for tsvector which
>> is used in tests.
>>
>> [1]
>> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com
> Attached rebased version of the patch. Added support of pg_dump, the
> code was simplified, and a separate cache for compression options was
> added.

I would like to see some more examples of how this would be used, so we
can see how it should all fit together.

So far, it's not clear to me that we need a compression method as a
standalone top-level object.  It would make sense, perhaps, to have a
compression function attached to a type, so a type can provide a
compression function that is suitable for its specific storage.

The proposal here is very general: You can use any of the eligible
compression methods for any attribute.  That seems very complicated to
manage.  Any attribute could be compressed using either a choice of
general compression methods or a type-specific compression method, or
perhaps another type-specific compression method.  That's a lot.  Is
this about packing certain types better, or trying out different
compression algorithms, or about changing the TOAST thresholds, and so on?

Ideally, we would like something that just works, with minimal
configuration and nudging.  Let's see a list of problems to be solved
and then we can discuss what the right set of primitives might be to
address them.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Ildus Kurbangaliev
On Wed, 1 Nov 2017 17:05:58 -0400
Peter Eisentraut <[hidden email]> wrote:

> On 9/12/17 10:55, Ildus Kurbangaliev wrote:
> >> The patch also includes custom compression method for tsvector
> >> which is used in tests.
> >>
> >> [1]
> >> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com 
> > Attached rebased version of the patch. Added support of pg_dump, the
> > code was simplified, and a separate cache for compression options
> > was added.  
>
> I would like to see some more examples of how this would be used, so
> we can see how it should all fit together.
>
> So far, it's not clear to me that we need a compression method as a
> standalone top-level object.  It would make sense, perhaps, to have a
> compression function attached to a type, so a type can provide a
> compression function that is suitable for its specific storage.

In this patch compression methods is suitable for MAIN and EXTENDED
storages like in current implementation in postgres. Just instead only
of LZ4 you can specify any other compression method.

Idea is not to change compression for some types, but give the user and
extension developers opportunity to change how data in some attribute
will be compressed because they know about it more than database itself.

>
> The proposal here is very general: You can use any of the eligible
> compression methods for any attribute.  That seems very complicated to
> manage.  Any attribute could be compressed using either a choice of
> general compression methods or a type-specific compression method, or
> perhaps another type-specific compression method.  That's a lot.  Is
> this about packing certain types better, or trying out different
> compression algorithms, or about changing the TOAST thresholds, and
> so on?

It is about extensibility of postgres, for example if you
need to store a lot of time series data you can create an extension that
stores array of timestamps in more optimized way, using delta encoding
or something else. I'm not sure that such specialized things should be
in core.

In case of array of timestamps in could look like this:

CREATE EXTENSION timeseries; -- some extension that provides compression
                                method

Extension installs a compression method:

CREATE OR REPLACE FUNCTION timestamps_compression_handler(INTERNAL)
RETURNS COMPRESSION_HANDLER AS 'MODULE_PATHNAME',
'timestamps_compression_handler' LANGUAGE C STRICT;

CREATE COMPRESSION METHOD cm1 HANDLER timestamps_compression_handler;

And user can specify it in his table:

CREATE TABLE t1 (
        time_series_data timestamp[] COMPRESSED cm1;
)

I think generalization of some method to a type is not a good idea. For
some attribute you could be happy with builtin LZ4, for other you can
need more compressibility and so on.

--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Ildus Kurbangaliev
In reply to this post by Ildus Kurbangaliev
On Tue, 12 Sep 2017 17:55:05 +0300
Ildus Kurbangaliev <[hidden email]> wrote:

>
> Attached rebased version of the patch. Added support of pg_dump, the
> code was simplified, and a separate cache for compression options was
> added.
>

Attached version 3 of the patch. Rebased to the current master, removed
ALTER TYPE .. SET COMPRESSED syntax, fixed bug in compression options
cache.

--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

custom_compression_methods_v3.patch (246K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Craig Ringer-3
In reply to this post by Ildus Kurbangaliev
On 2 November 2017 at 17:41, Ildus Kurbangaliev
<[hidden email]> wrote:

> In this patch compression methods is suitable for MAIN and EXTENDED
> storages like in current implementation in postgres. Just instead only
> of LZ4 you can specify any other compression method.

We've had this discussion before.

Please read the "pluggable compression support" thread. See you in a
few days ;) sorry, it's kinda long.

https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.GA12425@...

IIRC there were some concerns about what happened with pg_upgrade,
with consuming precious toast bits, and a few other things.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Oleg Bartunov-2
On Thu, Nov 2, 2017 at 6:02 PM, Craig Ringer <[hidden email]> wrote:

> On 2 November 2017 at 17:41, Ildus Kurbangaliev
> <[hidden email]> wrote:
>
>> In this patch compression methods is suitable for MAIN and EXTENDED
>> storages like in current implementation in postgres. Just instead only
>> of LZ4 you can specify any other compression method.
>
> We've had this discussion before.
>
> Please read the "pluggable compression support" thread. See you in a
> few days ;) sorry, it's kinda long.
>
> https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.GA12425@...
>

the proposed patch provides "pluggable" compression and let's user
decide by their own which algorithm to use.
The postgres core doesn't responsible for any patent problem.


> IIRC there were some concerns about what happened with pg_upgrade,
> with consuming precious toast bits, and a few other things.

yes, pg_upgrade may be a problem.

>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list ([hidden email])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Robert Haas
On Sun, Nov 5, 2017 at 2:22 PM, Oleg Bartunov <[hidden email]> wrote:
>> IIRC there were some concerns about what happened with pg_upgrade,
>> with consuming precious toast bits, and a few other things.
>
> yes, pg_upgrade may be a problem.

A basic problem here is that, as proposed, DROP COMPRESSION METHOD may
break your database irretrievably.  If there's no data compressed
using the compression method you dropped, everything is cool -
otherwise everything is broken and there's no way to recover.  The
only obvious alternative is to disallow DROP altogether (or make it
not really DROP).

Both of those alternatives sound fairly unpleasant to me, but I'm not
exactly sure what to recommend in terms of how to make it better.
Ideally anything we expose as an SQL command should have a DROP
command that undoes whatever CREATE did and leaves the database in an
intact state, but that seems hard to achieve in this case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Adam Brusselback
> If there's no data compressed
> using the compression method you dropped, everything is cool -
> otherwise everything is broken and there's no way to recover.
> The only obvious alternative is to disallow DROP altogether (or make it
> not really DROP).

Wouldn't whatever was using the compression method have something
marking which method was used? If so, couldn't we just scan if there is
any data using it, and if so disallow the drop, or possibly an option to allow
the drop and rewrite the table either uncompressed, or with the default
compression method?


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Tom Lane-2
In reply to this post by Robert Haas
Robert Haas <[hidden email]> writes:
> A basic problem here is that, as proposed, DROP COMPRESSION METHOD may
> break your database irretrievably.  If there's no data compressed
> using the compression method you dropped, everything is cool -
> otherwise everything is broken and there's no way to recover.  The
> only obvious alternative is to disallow DROP altogether (or make it
> not really DROP).

> Both of those alternatives sound fairly unpleasant to me, but I'm not
> exactly sure what to recommend in terms of how to make it better.
> Ideally anything we expose as an SQL command should have a DROP
> command that undoes whatever CREATE did and leaves the database in an
> intact state, but that seems hard to achieve in this case.

If the use of a compression method is tied to specific data types and/or
columns, then each of those could have a dependency on the compression
method, forcing a type or column drop if you did DROP COMPRESSION METHOD.
That would leave no reachable data using the removed compression method.
So that part doesn't seem unworkable on its face.

IIRC, the bigger concerns in the last discussion had to do with
replication, ie, can downstream servers make sense of the data.
Maybe that's not any worse than the issues you get with non-core
index AMs, but I'm not sure.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Ildus Kurbangaliev
In reply to this post by Craig Ringer-3
On Thu, 2 Nov 2017 23:02:34 +0800
Craig Ringer <[hidden email]> wrote:

> On 2 November 2017 at 17:41, Ildus Kurbangaliev
> <[hidden email]> wrote:
>
> > In this patch compression methods is suitable for MAIN and EXTENDED
> > storages like in current implementation in postgres. Just instead
> > only of LZ4 you can specify any other compression method.  
>
> We've had this discussion before.
>
> Please read the "pluggable compression support" thread. See you in a
> few days ;) sorry, it's kinda long.
>
> https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.GA12425@...
>
> IIRC there were some concerns about what happened with pg_upgrade,
> with consuming precious toast bits, and a few other things.
>

Thank you for the link, I didn't see that thread when I looked over
mailing lists. I read it briefly, and I can address few things
relating to my patch.

Most concerns have been related with legal issues.
Actually that was the reason I did not include any new compression
algorithms to my patch. Unlike that patch mine only provides syntax
and is just a way to give the users use their own compression algorithms
and deal with any legal issues themselves.

I use only one unused bit in header (there's still one free ;), that's
enough to determine that data is compressed or not.

I did found out that pg_upgrade doesn't work properly with my patch,
soon I will send fix for it.

--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Ildus Kurbangaliev
In reply to this post by Ildus Kurbangaliev
On Thu, 2 Nov 2017 15:28:36 +0300
Ildus Kurbangaliev <[hidden email]> wrote:

> On Tue, 12 Sep 2017 17:55:05 +0300
> Ildus Kurbangaliev <[hidden email]> wrote:
>
> >
> > Attached rebased version of the patch. Added support of pg_dump, the
> > code was simplified, and a separate cache for compression options
> > was added.
> >  
>
> Attached version 3 of the patch. Rebased to the current master,
> removed ALTER TYPE .. SET COMPRESSED syntax, fixed bug in compression
> options cache.
>
Attached version 4 of the patch. Fixed pg_upgrade and few other bugs.

--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

custom_compression_methods_v4.patch (250K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Ildus Kurbangaliev
In reply to this post by Robert Haas
On Sun, 5 Nov 2017 17:34:23 -0500
Robert Haas <[hidden email]> wrote:

> On Sun, Nov 5, 2017 at 2:22 PM, Oleg Bartunov <[hidden email]>
> wrote:
> >> IIRC there were some concerns about what happened with pg_upgrade,
> >> with consuming precious toast bits, and a few other things.  
> >
> > yes, pg_upgrade may be a problem.  
>
> A basic problem here is that, as proposed, DROP COMPRESSION METHOD may
> break your database irretrievably.  If there's no data compressed
> using the compression method you dropped, everything is cool -
> otherwise everything is broken and there's no way to recover.  The
> only obvious alternative is to disallow DROP altogether (or make it
> not really DROP).

In the patch I use separate table for compresssion options (because
each attribute can have additional options for compression). So basicly
compressed attribute linked to compression options, not the compression
method and this method can be safely dropped.

So in the next version of the patch I can just unlink the options from
compression methods and dropping compression method will not affect
already compressed tuples. They still could be decompressed.

--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Robert Haas
On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev
<[hidden email]> wrote:
> So in the next version of the patch I can just unlink the options from
> compression methods and dropping compression method will not affect
> already compressed tuples. They still could be decompressed.

I guess I don't understand how that can work.  I mean, if somebody
removes a compression method - i.e. uninstalls the library - and you
don't have a way to make sure there are no tuples that can only be
uncompressed by that library - then you've broken the database.
Ideally, there should be a way to add a new compression method via an
extension ... and then get rid of it and all dependencies thereupon.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Ildar Musin
In reply to this post by Ildus Kurbangaliev
Hi Ildus,

On 14.11.2017 16:23, Ildus Kurbangaliev wrote:

> On Thu, 2 Nov 2017 15:28:36 +0300 Ildus Kurbangaliev
> <[hidden email]> wrote:
>
>> On Tue, 12 Sep 2017 17:55:05 +0300 Ildus Kurbangaliev
>> <[hidden email]> wrote:
>>
>>>
>>> Attached rebased version of the patch. Added support of pg_dump,
>>> the code was simplified, and a separate cache for compression
>>> options was added.
>>>
>>
>> Attached version 3 of the patch. Rebased to the current master,
>> removed ALTER TYPE .. SET COMPRESSED syntax, fixed bug in
>> compression options cache.
>>
>
> Attached version 4 of the patch. Fixed pg_upgrade and few other
> bugs.
>

I've started to review your code. And even though it's fine overall I
have few questions and comments (aside from DROP COMPRESSION METHOD
discussion).

1. I'm not sure about proposed syntax for ALTER TABLE command:

>> ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH
>> (<options>); ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED;

ISTM it is more common for Postgres to use syntax like SET/DROP for
column options (SET/DROP NOT NULL, DEFAULT etc). My suggestion would be:

ALTER TABLE t ALTER COLUMN a SET COMPRESSED USING <compression_method>
WITH (<options>);
ALTER TABLE t ALTER COLUMN a DROP COMPRESSED;

(keyword USING here is similar to "CREATE INDEX ... USING <method>" syntax)

2. The way you changed DefineRelation() implies that caller is
responsible for creation of compression options. Probably it would be
better to create them within DefineRelation().

3. Few minor issues which seem like obsolete code:

Function freeRelOptions() is defined but never used.

Function getBaseTypeTuple() has been extracted from
getBaseTypeAndTypmod() but never used separately.

In toast_flatten_tuple_to_datum() there is untoasted_value variable
which is only used for meaningless assignment.

(Should I send a patch for that kind of issues?)

--
Ildar Musin
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Tomas Vondra-4
In reply to this post by Ildus Kurbangaliev
Hi,

On 11/14/2017 02:23 PM, Ildus Kurbangaliev wrote:
>
> ...
>
> Attached version 4 of the patch. Fixed pg_upgrade and few other bugs.
>

I did a review of this today, and I think there are some things that
need improvement / fixing.

Firstly, some basic comments from just eye-balling the diff, then some
bugs I discovered after writing an extension adding lz4.

1) formatRelOptions/freeRelOptions are no longer needed (I see Ildar
already pointer that out)

2) There's unnecessary whitespace (extra newlines) on a couple of
places, which is needlessly increasing the size of the patch. Small
difference, but annoying.

3) tuptoaster.c

Why do you change 'info' from int32 to uint32? Seems unnecessary.

Adding new 'att' variable in toast_insert_or_update is confusing, as
there already is 'att' in the very next loop. Technically it's correct,
but I'd bet it'll lead to some WTF?! moments later. I propose to just
use TupleDescAttr(tupleDesc,i) on the two places where it matters,
around line 808.

There are no comments for init_compression_options_htab and
get_compression_options_info, so that needs to be fixed. Moreover, the
names are confusing because what we really get is not just 'options' but
the compression routines too.

4) gen_db_file_maps probably shouldn't do the fprints, right?

5) not sure why you modify src/tools/pgindent/exclude_file_patterns

6) I'm rather confused by AttributeCompression vs. ColumnCompression. I
mean, attribute==column, right? Of course, one is for data from parser,
the other one is for internal info. But can we make the naming clearer?

7) The docs in general are somewhat unsatisfactory, TBH. For example the
ColumnCompression has no comments, unlike everything else in parsenodes.
Similarly for the SGML docs - I suggest to expand them to resemble FDW
docs (https://www.postgresql.org/docs/10/static/fdwhandler.html) which
also follows the handler/routines pattern.

8) One of the unclear things if why we even need 'drop' routing. It
seems that if it's defined DropAttributeCompression does something. But
what should it do? I suppose dropping the options should be done using
dependencies (just like we drop columns in this case).

BTW why does DropAttributeCompression mess with att->attisdropped in
this way? That seems a bit odd.

9) configure routines that only check if (options != NIL) and then error
out (like tsvector_configure) seem a bit unnecessary. Just allow it to
be NULL in CompressionMethodRoutine, and throw an error if options is
not NIL for such compression method.

10) toast_compress_datum still does this:

    if (!ac && (valsize < PGLZ_strategy_default->min_input_size ||
                valsize > PGLZ_strategy_default->max_input_size))

which seems rather pglz-specific (the naming is a hint). Why shouldn't
this be specific to compression, exposed either as min/max constants, or
wrapped in another routine - size_is_valid() or something like that?

11) The comments in toast_compress_datum probably need updating, as it
still references to pglz specifically. I guess the new compression
methods do matter too.

12) get_compression_options_info organizes the compression info into a
hash table by OID. The hash table implementation assumes the hash key is
at the beginning of the entry, but AttributeCompression is defined like
this:

    typedef struct
    {
        CompressionMethodRoutine *routine;
        List  *options;
        Oid cmoptoid;
    } AttributeCompression;

Which means get_compression_options_info is busted, will never lookup
anything, and the hash table will grow by adding more and more entries
into the same bucket. Of course, this has extremely negative impact on
performance (pretty much arbitrarily bad, depending on how many entries
you've already added to the hash table).

Moving the OID to the beginning of the struct fixes the issue.

13) When writing the experimental extension, I was extremely confused
about the regular varlena headers, custom compression headers, etc. In
the end I stole the code from tsvector.c and whacked it a bit until it
worked, but I wouldn't dare to claim I understand how it works.

This needs to be documented somewhere. For example postgres.h has a
bunch of paragraphs about varlena headers, so perhaps it should be
there? I see the patch tweaks some of the constants, but does not update
the comment at all.

Perhaps it would be useful to provide some additional macros making
access to custom-compressed varlena values easier. Or perhaps the
VARSIZE_ANY / VARSIZE_ANY_EXHDR / VARDATA_ANY already support that? This
part is not very clear to me.



regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pg_lz4.tgz (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Tomas Vondra-4
In reply to this post by Robert Haas


On 11/15/2017 02:13 PM, Robert Haas wrote:

> On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev
> <[hidden email]> wrote:
>> So in the next version of the patch I can just unlink the options from
>> compression methods and dropping compression method will not affect
>> already compressed tuples. They still could be decompressed.
>
> I guess I don't understand how that can work.  I mean, if somebody
> removes a compression method - i.e. uninstalls the library - and you
> don't have a way to make sure there are no tuples that can only be
> uncompressed by that library - then you've broken the database.
> Ideally, there should be a way to add a new compression method via an
> extension ... and then get rid of it and all dependencies thereupon.
>

I share your confusion. Once you do DROP COMPRESSION METHOD, there must
be no remaining data compressed with it. But that's what the patch is
doing already - it enforces this using dependencies, as usual.

Ildus, can you explain what you meant? How could the data still be
decompressed after DROP COMPRESSION METHOD, and possibly after removing
the .so library?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Ildus Kurbangaliev
On Mon, 20 Nov 2017 00:23:23 +0100
Tomas Vondra <[hidden email]> wrote:

> On 11/15/2017 02:13 PM, Robert Haas wrote:
> > On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev
> > <[hidden email]> wrote:  
> >> So in the next version of the patch I can just unlink the options
> >> from compression methods and dropping compression method will not
> >> affect already compressed tuples. They still could be
> >> decompressed.  
> >
> > I guess I don't understand how that can work.  I mean, if somebody
> > removes a compression method - i.e. uninstalls the library - and you
> > don't have a way to make sure there are no tuples that can only be
> > uncompressed by that library - then you've broken the database.
> > Ideally, there should be a way to add a new compression method via
> > an extension ... and then get rid of it and all dependencies
> > thereupon.
>
> I share your confusion. Once you do DROP COMPRESSION METHOD, there
> must be no remaining data compressed with it. But that's what the
> patch is doing already - it enforces this using dependencies, as
> usual.
>
> Ildus, can you explain what you meant? How could the data still be
> decompressed after DROP COMPRESSION METHOD, and possibly after
> removing the .so library?

The removal of the .so library will broke all compressed tuples. I
don't see a way to avoid it. I meant that DROP COMPRESSION METHOD could
remove the record from 'pg_compression' table, but actually the
compressed tuple needs only a record from 'pg_compression_opt' where
its options are located. And there is dependency between an extension
and the options so you can't just remove the extension without CASCADE,
postgres will complain.

Still it's a problem if the user used for example `SELECT
<compressed_column> INTO * FROM *` because postgres will copy compressed
tuples, and there will not be any dependencies between destination and
the options.

Also thank you for review. I will look into it today.


--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Tomas Vondra-4


On 11/20/2017 10:44 AM, Ildus Kurbangaliev wrote:

> On Mon, 20 Nov 2017 00:23:23 +0100
> Tomas Vondra <[hidden email]> wrote:
>
>> On 11/15/2017 02:13 PM, Robert Haas wrote:
>>> On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev
>>> <[hidden email]> wrote:  
>>>> So in the next version of the patch I can just unlink the options
>>>> from compression methods and dropping compression method will not
>>>> affect already compressed tuples. They still could be
>>>> decompressed.  
>>>
>>> I guess I don't understand how that can work.  I mean, if somebody
>>> removes a compression method - i.e. uninstalls the library - and you
>>> don't have a way to make sure there are no tuples that can only be
>>> uncompressed by that library - then you've broken the database.
>>> Ideally, there should be a way to add a new compression method via
>>> an extension ... and then get rid of it and all dependencies
>>> thereupon.
>>
>> I share your confusion. Once you do DROP COMPRESSION METHOD, there
>> must be no remaining data compressed with it. But that's what the
>> patch is doing already - it enforces this using dependencies, as
>> usual.
>>
>> Ildus, can you explain what you meant? How could the data still be
>> decompressed after DROP COMPRESSION METHOD, and possibly after
>> removing the .so library?
>
> The removal of the .so library will broke all compressed tuples. I
> don't see a way to avoid it. I meant that DROP COMPRESSION METHOD could
> remove the record from 'pg_compression' table, but actually the
> compressed tuple needs only a record from 'pg_compression_opt' where
> its options are located. And there is dependency between an extension
> and the options so you can't just remove the extension without CASCADE,
> postgres will complain.
>

I don't think we need to do anything smart here - it should behave just
like dropping a data type, for example. That is, error out if there are
columns using the compression method (without CASCADE), and drop all the
columns (with CASCADE).

Leaving around the pg_compression_opt is not a solution. Not only it's
confusing and I'm not aware about any extension  because the user is
likely to remove the .so file (perhaps not directly, but e.g. by
removing the rpm package providing it).

> Still it's a problem if the user used for example `SELECT
> <compressed_column> INTO * FROM *` because postgres will copy compressed
> tuples, and there will not be any dependencies between destination and
> the options.
>

This seems like a rather fatal design flaw, though. I'd say we need to
force recompression of the data, in such cases. Otherwise all the
dependency tracking is rather pointless.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Custom compression methods

Evgeniy Shishkin


On Nov 20, 2017, at 18:18, Tomas Vondra <[hidden email]> wrote:


I don't think we need to do anything smart here - it should behave just
like dropping a data type, for example. That is, error out if there are
columns using the compression method (without CASCADE), and drop all the
columns (with CASCADE).

What about instead of dropping column we leave data uncompressed?

12
Previous Thread Next Thread