Hello hackers!
I've attached a patch that implements custom compression methods. The patch is based on Nikita Glukhov's code for jsonb compression (which he hasn't published on the mailing lists). This is an early but working version of the patch; a few fixes and features are still missing (such as pg_dump support and support of compression options for types), and it needs more testing. But I'd like to get some feedback at the current stage first.

There's been a proposal [1] by Alexander Korotkov and some discussion about custom compression methods before. This is an implementation of per-datum compression. The syntax is similar to the one in the proposal, but not the same.

Syntax:

    CREATE COMPRESSION METHOD <cmname> HANDLER <compression_handler>;
    DROP COMPRESSION METHOD <cmname>;

A compression handler is a function that returns a structure containing the compression routines:

- configure - called when the compression method is applied to an attribute
- drop - called when the compression method is removed from an attribute
- compress - compress function
- decompress - decompress function

Users can create compressed columns with the commands below:

    CREATE TABLE t(a tsvector COMPRESSED <cmname> WITH <options>);
    ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH <options>;
    ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED;

There is also syntax for binding compression methods to types:

    ALTER TYPE <type> SET COMPRESSED <cmname>;
    ALTER TYPE <type> SET NOT COMPRESSED;

There are two new catalog tables, pg_compression and pg_compression_opt. pg_compression stores the compression methods, and pg_compression_opt stores the specific compression options for a particular column. When a user binds a compression method to a column, a new record is created in pg_compression_opt, and all further attribute values will carry the Oid of the compression options while old values remain unchanged.

When we alter the compression method for an attribute, it won't change the previous record in pg_compression_opt. Instead it creates a new one, and new values are stored with the new Oid. That way there is no need to recompress the old tuples. Also, tuples containing compressed datums can be copied to other tables, so records in pg_compression_opt shouldn't be removed. In the current patch they can be removed with DROP COMPRESSION METHOD ... CASCADE, but after that decompression of the already-compressed tuples is no longer possible. Maybe CASCADE should keep the compression options.

I haven't changed the base logic of working with compressed datums: custom compressed datums behave exactly like the current LZ compressed datums, and the logic differs only in toast_compress_datum and toast_decompress_datum.

The patch doesn't break backward compatibility and should work seamlessly with older versions of the database. I used one of the two free bits in `va_rawsize` from `varattrib_4b->va_compressed` as a flag for custom compressed datums. I also renamed it to `va_info`, since it now contains more than just the rawsize.

The patch also includes a custom compression method for tsvector, which is used in the tests.

[1] https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, 7 Sep 2017 19:42:36 +0300
Ildus Kurbangaliev <[hidden email]> wrote:
> I've attached a patch that implements custom compression methods. [...]
>
> The patch also includes custom compression method for tsvector which
> is used in tests.
>
> [1]
> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com

Attached rebased version of the patch. Added support of pg_dump, the code was simplified, and a separate cache for compression options was added.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 9/12/17 10:55, Ildus Kurbangaliev wrote:
>> The patch also includes custom compression method for tsvector which
>> is used in tests.
>>
>> [1]
>> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com
> Attached rebased version of the patch. Added support of pg_dump, the
> code was simplified, and a separate cache for compression options was
> added.

I would like to see some more examples of how this would be used, so we can see how it should all fit together.

So far, it's not clear to me that we need a compression method as a standalone top-level object. It would make sense, perhaps, to have a compression function attached to a type, so a type can provide a compression function that is suitable for its specific storage.

The proposal here is very general: You can use any of the eligible compression methods for any attribute. That seems very complicated to manage. Any attribute could be compressed using either a choice of general compression methods or a type-specific compression method, or perhaps another type-specific compression method. That's a lot. Is this about packing certain types better, or trying out different compression algorithms, or about changing the TOAST thresholds, and so on?

Ideally, we would like something that just works, with minimal configuration and nudging. Let's see a list of problems to be solved and then we can discuss what the right set of primitives might be to address them.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 1 Nov 2017 17:05:58 -0400
Peter Eisentraut <[hidden email]> wrote:
> So far, it's not clear to me that we need a compression method as a
> standalone top-level object. It would make sense, perhaps, to have a
> compression function attached to a type, so a type can provide a
> compression function that is suitable for its specific storage.

In this patch compression methods are suitable for the MAIN and EXTENDED storage modes, as in the current implementation in postgres; you can just specify any other compression method instead of the built-in one. The idea is not to change compression for some types, but to give users and extension developers the opportunity to change how the data in an attribute is compressed, because they know more about it than the database itself.

> The proposal here is very general: You can use any of the eligible
> compression methods for any attribute. That seems very complicated to
> manage. [...] Is this about packing certain types better, or trying
> out different compression algorithms, or about changing the TOAST
> thresholds, and so on?

It is about the extensibility of postgres. For example, if you need to store a lot of time series data, you can create an extension that stores arrays of timestamps in a more optimized way, using delta encoding or something else. I'm not sure that such specialized things should be in core.

In the case of an array of timestamps it could look like this:

    CREATE EXTENSION timeseries; -- some extension that provides a compression method

The extension installs a compression method:

    CREATE OR REPLACE FUNCTION timestamps_compression_handler(INTERNAL)
    RETURNS COMPRESSION_HANDLER
    AS 'MODULE_PATHNAME', 'timestamps_compression_handler'
    LANGUAGE C STRICT;

    CREATE COMPRESSION METHOD cm1 HANDLER timestamps_compression_handler;

And the user can specify it in his table:

    CREATE TABLE t1 (
        time_series_data timestamp[] COMPRESSED cm1
    );

I think generalizing a method to a type is not a good idea. For one attribute you could be happy with the builtin compression; for another you may need better compression ratios, and so on.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On Tue, 12 Sep 2017 17:55:05 +0300
Ildus Kurbangaliev <[hidden email]> wrote:
> Attached rebased version of the patch. Added support of pg_dump, the
> code was simplified, and a separate cache for compression options was
> added.

Attached version 3 of the patch. Rebased to the current master, removed the ALTER TYPE .. SET COMPRESSED syntax, and fixed a bug in the compression options cache.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 2 November 2017 at 17:41, Ildus Kurbangaliev
<[hidden email]> wrote:
> In this patch compression methods is suitable for MAIN and EXTENDED
> storages like in current implementation in postgres. Just instead only
> of LZ4 you can specify any other compression method.

We've had this discussion before.

Please read the "pluggable compression support" thread. See you in a few days ;) sorry, it's kinda long.

https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.GA12425@...

IIRC there were some concerns about what happened with pg_upgrade, with consuming precious toast bits, and a few other things.

-- 
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 2, 2017 at 6:02 PM, Craig Ringer <[hidden email]> wrote:
> On 2 November 2017 at 17:41, Ildus Kurbangaliev
> <[hidden email]> wrote:
>> In this patch compression methods is suitable for MAIN and EXTENDED
>> storages like in current implementation in postgres. Just instead only
>> of LZ4 you can specify any other compression method.
>
> We've had this discussion before.
>
> Please read the "pluggable compression support" thread. See you in a
> few days ;) sorry, it's kinda long.
>
> https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.GA12425@...

The proposed patch provides "pluggable" compression and lets users decide on their own which algorithm to use. The postgres core isn't responsible for any patent problems.

> IIRC there were some concerns about what happened with pg_upgrade,
> with consuming precious toast bits, and a few other things.

Yes, pg_upgrade may be a problem.
On Sun, Nov 5, 2017 at 2:22 PM, Oleg Bartunov <[hidden email]> wrote:
>> IIRC there were some concerns about what happened with pg_upgrade,
>> with consuming precious toast bits, and a few other things.
>
> yes, pg_upgrade may be a problem.

A basic problem here is that, as proposed, DROP COMPRESSION METHOD may break your database irretrievably. If there's no data compressed using the compression method you dropped, everything is cool - otherwise everything is broken and there's no way to recover. The only obvious alternative is to disallow DROP altogether (or make it not really DROP).

Both of those alternatives sound fairly unpleasant to me, but I'm not exactly sure what to recommend in terms of how to make it better. Ideally anything we expose as an SQL command should have a DROP command that undoes whatever CREATE did and leaves the database in an intact state, but that seems hard to achieve in this case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> If there's no data compressed
> using the compression method you dropped, everything is cool -
> otherwise everything is broken and there's no way to recover.
> The only obvious alternative is to disallow DROP altogether (or make it
> not really DROP).

Wouldn't whatever was using the compression method have something marking which method was used? If so, couldn't we just scan for any data using it, and if there is some, disallow the drop - or possibly offer an option to allow the drop and rewrite the table either uncompressed or with the default compression method?
Robert Haas <[hidden email]> writes:
> A basic problem here is that, as proposed, DROP COMPRESSION METHOD may
> break your database irretrievably. If there's no data compressed
> using the compression method you dropped, everything is cool -
> otherwise everything is broken and there's no way to recover. The
> only obvious alternative is to disallow DROP altogether (or make it
> not really DROP).
> Both of those alternatives sound fairly unpleasant to me, but I'm not
> exactly sure what to recommend in terms of how to make it better.

If the use of a compression method is tied to specific data types and/or columns, then each of those could have a dependency on the compression method, forcing a type or column drop if you did DROP COMPRESSION METHOD. That would leave no reachable data using the removed compression method. So that part doesn't seem unworkable on its face.

IIRC, the bigger concerns in the last discussion had to do with replication, ie, can downstream servers make sense of the data. Maybe that's not any worse than the issues you get with non-core index AMs, but I'm not sure.

			regards, tom lane
On Thu, 2 Nov 2017 23:02:34 +0800
Craig Ringer <[hidden email]> wrote:
> We've had this discussion before.
>
> Please read the "pluggable compression support" thread. See you in a
> few days ;) sorry, it's kinda long.
>
> https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.GA12425@...
>
> IIRC there were some concerns about what happened with pg_upgrade,
> with consuming precious toast bits, and a few other things.

Thank you for the link, I didn't see that thread when I looked over the mailing lists. I read it briefly, and I can address a few things related to my patch.

Most of the concerns were related to legal issues. That is actually the reason I did not include any new compression algorithms in my patch. Unlike that patch, mine only provides the syntax and is just a way to let users plug in their own compression algorithms and deal with any legal issues themselves.

I use only one unused bit in the header (there's still one free ;), which is enough to determine whether the data is custom compressed or not.

I did find out that pg_upgrade doesn't work properly with my patch; I will send a fix for it soon.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On Thu, 2 Nov 2017 15:28:36 +0300
Ildus Kurbangaliev <[hidden email]> wrote:
> Attached version 3 of the patch. Rebased to the current master,
> removed ALTER TYPE .. SET COMPRESSED syntax, fixed bug in compression
> options cache.

Attached version 4 of the patch. Fixed pg_upgrade and a few other bugs.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On Sun, 5 Nov 2017 17:34:23 -0500
Robert Haas <[hidden email]> wrote:
> A basic problem here is that, as proposed, DROP COMPRESSION METHOD may
> break your database irretrievably. If there's no data compressed
> using the compression method you dropped, everything is cool -
> otherwise everything is broken and there's no way to recover. The
> only obvious alternative is to disallow DROP altogether (or make it
> not really DROP).

In the patch I use a separate table for compression options (because each attribute can have additional options for compression). So basically a compressed attribute is linked to its compression options, not to the compression method, and the method can be safely dropped.

So in the next version of the patch I can just unlink the options from the compression methods, and dropping a compression method will not affect already-compressed tuples. They can still be decompressed.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev
<[hidden email]> wrote:
> So in the next version of the patch I can just unlink the options from
> compression methods and dropping compression method will not affect
> already compressed tuples. They still could be decompressed.

I guess I don't understand how that can work. I mean, if somebody removes a compression method - i.e. uninstalls the library - and you don't have a way to make sure there are no tuples that can only be uncompressed by that library - then you've broken the database. Ideally, there should be a way to add a new compression method via an extension ... and then get rid of it and all dependencies thereupon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Ildus,
On 14.11.2017 16:23, Ildus Kurbangaliev wrote:
> Attached version 4 of the patch. Fixed pg_upgrade and few other
> bugs.

I've started to review your code. Even though it's fine overall, I have a few questions and comments (aside from the DROP COMPRESSION METHOD discussion).

1. I'm not sure about the proposed syntax for the ALTER TABLE command:

>> ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH (<options>);
>> ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED;

ISTM it is more common for Postgres to use SET/DROP syntax for column options (SET/DROP NOT NULL, DEFAULT etc). My suggestion would be:

    ALTER TABLE t ALTER COLUMN a SET COMPRESSED USING <compression_method> WITH (<options>);
    ALTER TABLE t ALTER COLUMN a DROP COMPRESSED;

(The keyword USING here is similar to the "CREATE INDEX ... USING <method>" syntax.)

2. The way you changed DefineRelation() implies that the caller is responsible for the creation of compression options. Probably it would be better to create them within DefineRelation().

3. A few minor issues which look like obsolete code:

- Function freeRelOptions() is defined but never used.
- Function getBaseTypeTuple() has been extracted from getBaseTypeAndTypmod() but is never used separately.
- In toast_flatten_tuple_to_datum() there is an untoasted_value variable which is only used for a meaningless assignment.

(Should I send a patch for that kind of issue?)

-- 
Ildar Musin
[hidden email]
Hi,
On 11/14/2017 02:23 PM, Ildus Kurbangaliev wrote:
> Attached version 4 of the patch. Fixed pg_upgrade and few other bugs.

I did a review of this today, and I think there are some things that need improvement / fixing.

Firstly, some basic comments from just eye-balling the diff, then some bugs I discovered after writing an extension adding lz4.

1) formatRelOptions/freeRelOptions are no longer needed (I see Ildar already pointed that out).

2) There's unnecessary whitespace (extra newlines) in a couple of places, which is needlessly increasing the size of the patch. A small difference, but annoying.

3) tuptoaster.c

Why do you change 'info' from int32 to uint32? Seems unnecessary.

Adding the new 'att' variable in toast_insert_or_update is confusing, as there already is an 'att' in the very next loop. Technically it's correct, but I'd bet it'll lead to some WTF?! moments later. I propose to just use TupleDescAttr(tupleDesc,i) in the two places where it matters, around line 808.

There are no comments for init_compression_options_htab and get_compression_options_info, so that needs to be fixed. Moreover, the names are confusing, because what we really get is not just the 'options' but the compression routines too.

4) gen_db_file_maps probably shouldn't do the fprintfs, right?

5) Not sure why you modify src/tools/pgindent/exclude_file_patterns.

6) I'm rather confused by AttributeCompression vs. ColumnCompression. I mean, attribute == column, right? Of course, one is for data from the parser, the other one is for internal info. But can we make the naming clearer?

7) The docs in general are somewhat unsatisfactory, TBH. For example ColumnCompression has no comments, unlike everything else in parsenodes. Similarly for the SGML docs - I suggest expanding them to resemble the FDW docs (https://www.postgresql.org/docs/10/static/fdwhandler.html), which also follow the handler/routines pattern.

8) One of the unclear things is why we even need a 'drop' routine. It seems that if it's defined, DropAttributeCompression does something. But what should it do? I suppose dropping the options should be done using dependencies (just like we drop columns in this case). BTW why does DropAttributeCompression mess with att->attisdropped in this way? That seems a bit odd.

9) configure routines that only check if (options != NIL) and then error out (like tsvector_configure) seem a bit unnecessary. Just allow the routine to be NULL in CompressionMethodRoutine, and throw an error if options is not NIL for such a compression method.

10) toast_compress_datum still does this:

    if (!ac && (valsize < PGLZ_strategy_default->min_input_size ||
                valsize > PGLZ_strategy_default->max_input_size))

which seems rather pglz-specific (the naming is a hint). Why shouldn't this be specific to the compression method, exposed either as min/max constants, or wrapped in another routine - size_is_valid() or something like that?

11) The comments in toast_compress_datum probably need updating, as they still reference pglz specifically. I guess the new compression methods matter too.

12) get_compression_options_info organizes the compression info into a hash table by OID. The hash table implementation assumes the hash key is at the beginning of the entry, but AttributeCompression is defined like this:

    typedef struct
    {
        CompressionMethodRoutine *routine;
        List     *options;
        Oid       cmoptoid;
    } AttributeCompression;

This means get_compression_options_info is busted, will never look anything up, and the hash table will grow by adding more and more entries into the same bucket. Of course, this has an extremely negative impact on performance (pretty much arbitrarily bad, depending on how many entries you've already added to the hash table). Moving the OID to the beginning of the struct fixes the issue.

13) When writing the experimental extension, I was extremely confused about the regular varlena headers, custom compression headers, etc. In the end I stole the code from tsvector.c and whacked it a bit until it worked, but I wouldn't dare to claim I understand how it works.

This needs to be documented somewhere. For example postgres.h has a bunch of paragraphs about varlena headers, so perhaps it should be there? I see the patch tweaks some of the constants, but does not update the comment at all.

Perhaps it would be useful to provide some additional macros making access to custom-compressed varlena values easier. Or perhaps VARSIZE_ANY / VARSIZE_ANY_EXHDR / VARDATA_ANY already support that? This part is not very clear to me.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/15/2017 02:13 PM, Robert Haas wrote:
> On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev
> <[hidden email]> wrote:
>> So in the next version of the patch I can just unlink the options from
>> compression methods and dropping compression method will not affect
>> already compressed tuples. They still could be decompressed.
>
> I guess I don't understand how that can work. I mean, if somebody
> removes a compression method - i.e. uninstalls the library - and you
> don't have a way to make sure there are no tuples that can only be
> uncompressed by that library - then you've broken the database.
> Ideally, there should be a way to add a new compression method via an
> extension ... and then get rid of it and all dependencies thereupon.

I share your confusion. Once you do DROP COMPRESSION METHOD, there must be no remaining data compressed with it. But that's what the patch is doing already - it enforces this using dependencies, as usual.

Ildus, can you explain what you meant? How could the data still be decompressed after DROP COMPRESSION METHOD, and possibly after removing the .so library?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 20 Nov 2017 00:23:23 +0100
Tomas Vondra <[hidden email]> wrote:
> I share your confusion. Once you do DROP COMPRESSION METHOD, there
> must be no remaining data compressed with it. But that's what the
> patch is doing already - it enforces this using dependencies, as
> usual.
>
> Ildus, can you explain what you meant? How could the data still be
> decompressed after DROP COMPRESSION METHOD, and possibly after
> removing the .so library?

The removal of the .so library will break all compressed tuples; I don't see a way to avoid that. I meant that DROP COMPRESSION METHOD could remove the record from the 'pg_compression' table, but the compressed tuple actually only needs a record from 'pg_compression_opt', where its options are located. And there is a dependency between an extension and the options, so you can't just remove the extension without CASCADE - postgres will complain.

It is still a problem if the user used, for example, `SELECT <compressed_column> INTO * FROM *`, because postgres will copy the compressed tuples, and there will not be any dependencies between the destination and the options.

Also, thank you for the review. I will look into it today.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 11/20/2017 10:44 AM, Ildus Kurbangaliev wrote:
> The removal of the .so library will broke all compressed tuples. I
> don't see a way to avoid it. I meant that DROP COMPRESSION METHOD could
> remove the record from 'pg_compression' table, but actually the
> compressed tuple needs only a record from 'pg_compression_opt' where
> its options are located. And there is dependency between an extension
> and the options so you can't just remove the extension without CASCADE,
> postgres will complain.

I don't think we need to do anything smart here - it should behave just like dropping a data type, for example. That is, error out if there are columns using the compression method (without CASCADE), and drop all the columns (with CASCADE).

Leaving the pg_compression_opt records around is not a solution. Not only is it confusing, it also breaks as soon as the .so file goes away - and the user is likely to remove it (perhaps not directly, but e.g. by removing the rpm package providing it).

> Still it's a problem if the user used for example `SELECT
> <compressed_column> INTO * FROM *` because postgres will copy compressed
> tuples, and there will not be any dependencies between destination and
> the options.

This seems like a rather fatal design flaw, though. I'd say we need to force recompression of the data in such cases. Otherwise all the dependency tracking is rather pointless.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
What about, instead of dropping the column, we leave the data uncompressed?