libpq compression


libpq compression

konstantin knizhnik
Hi hackers,

One of our customers managed to improve speed about 10 times by using SSL compression for a system where the client and servers are located in different geographical regions
and the query results are very large because of JSON columns. They actually do not need encryption, just compression.
I expect this is not the only case where compression of the libpq protocol can be useful. Please note that Postgres replication also uses the libpq protocol.

Taking into account that a vulnerability was found in SSL compression, and that SSL compression is therefore considered deprecated and insecure (http://www.postgresql-archive.org/disable-SSL-compression-td6010072.html), it would be nice to have some alternative mechanism for reducing libpq traffic.

I have implemented a prototype of it (patch is attached).
To use zstd compression, Postgres should be configured with --with-zstd; otherwise compression will use zlib, unless it is disabled by the --without-zlib option.
I have added a compression=on/off parameter to the connection string and a -Z option to the psql and pgbench utilities.
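
(For illustration: a minimal client-side sketch of using the new connection parameter through standard libpq calls. The host, database, table, and column names here are made up; only the compression=on keyword comes from the patch as described above.)

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* "compression=on" is the connection-string parameter added by the patch */
        PGconn *conn = PQconnectdb("host=db.example.com dbname=test compression=on");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* large (e.g. JSON) result sets now travel compressed on the wire */
        PGresult *res = PQexec(conn, "SELECT payload FROM events");
        printf("%d rows\n", PQntuples(res));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }
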
Below are some results:

Compression ratio (raw->compressed):

                        libz (level=1)                       libzstd (level=1)
pgbench -i -s 10        16997209->2536330                    16997209->268077
pgbench -t 100000 -S    6289036->1523862, 6600338<-900293    6288933->1777400, 6600338<-1000318

There is no mistyping: libzstd compresses the COPY data about 10 times better than libz, with a remarkable compression ratio of 63 (16997209/268077 ≈ 63).

The influence on execution time is minimal (I tested a local configuration where client and server are on the same host):


                        no compression    libz (level=1)    libzstd (level=1)
pgbench -i -s 10        1.552             1.572             1.611
pgbench -t 100000 -S    4.482             4.926             4.877

    
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Attachment: libpq-compression-2.patch (35K)

Re: libpq compression

Dmitry Dolgov
> On 30 March 2018 at 14:53, Konstantin Knizhnik <[hidden email]> wrote:
> Hi hackers,
> One of our customers managed to improve speed about 10 times by using SSL compression for a system where the client and servers are located in different geographical regions
> and the query results are very large because of JSON columns. They actually do not need encryption, just compression.
> I expect this is not the only case where compression of the libpq protocol can be useful. Please note that Postgres replication also uses the libpq protocol.
>
> Taking into account that a vulnerability was found in SSL compression, and that SSL compression is therefore considered deprecated and insecure (http://www.postgresql-archive.org/disable-SSL-compression-td6010072.html), it would be nice to have some alternative mechanism for reducing libpq traffic.
>
> I have implemented a prototype of it (patch is attached).
> To use zstd compression, Postgres should be configured with --with-zstd; otherwise compression will use zlib, unless it is disabled by the --without-zlib option.
> I have added a compression=on/off parameter to the connection string and a -Z option to the psql and pgbench utilities.

I'm a bit confused about why there was no reply to this. I mean, it wasn't sent on
April 1st, the patch can still be applied on top of the master branch, and it looks
like it even works.

I assume the main concern here is that it's implemented in a rather non-extensible
way. Also, if I understand correctly, it compresses the data stream in both
directions (server <-> client), and I'm not sure there is any value in compressing
what a client sends to a server. But I'm still wondering why it didn't at least
start a discussion about how it can be implemented. Am I missing something?

Re: libpq compression

konstantin knizhnik


On 15.05.2018 13:23, Dmitry Dolgov wrote:
> On 30 March 2018 at 14:53, Konstantin Knizhnik <[hidden email]> wrote:
> Hi hackers,
> One of our customers managed to improve speed about 10 times by using SSL compression for a system where the client and servers are located in different geographical regions
> and the query results are very large because of JSON columns. They actually do not need encryption, just compression.
> I expect this is not the only case where compression of the libpq protocol can be useful. Please note that Postgres replication also uses the libpq protocol.
>
> Taking into account that a vulnerability was found in SSL compression, and that SSL compression is therefore considered deprecated and insecure (http://www.postgresql-archive.org/disable-SSL-compression-td6010072.html), it would be nice to have some alternative mechanism for reducing libpq traffic.
>
> I have implemented a prototype of it (patch is attached).
> To use zstd compression, Postgres should be configured with --with-zstd; otherwise compression will use zlib, unless it is disabled by the --without-zlib option.
> I have added a compression=on/off parameter to the connection string and a -Z option to the psql and pgbench utilities.

I'm a bit confused about why there was no reply to this. I mean, it wasn't sent on
April 1st, the patch can still be applied on top of the master branch, and it looks
like it even works.

I assume the main concern here is that it's implemented in a rather non-extensible
way. Also, if I understand correctly, it compresses the data stream in both
directions (server <-> client), and I'm not sure there is any value in compressing
what a client sends to a server. But I'm still wondering why it didn't at least
start a discussion about how it can be implemented. Am I missing something?

Implementation of libpq compression will be included in the next release of PgProEE.
It looks like the community is not very interested in this patch. Frankly speaking, I do not understand why.
Compression of libpq traffic can significantly increase the speed of:
1. COPY
2. Replication (both streaming and logical)
3. Queries returning large result sets (for example JSON) over slow connections.

It is possible to compress libpq traffic using SSL compression, but SSL compression is unsafe and a deprecated feature.

Yes, this patch is not extensible: it can use either zlib or zstd. Unfortunately, the internal Postgres compressor, pglz, doesn't provide a streaming API.
Maybe it is a good idea to combine it with Ildus's patch (custom compression methods): https://commitfest.postgresql.org/18/1294/
In that case it would be possible to use any custom compression algorithm, but we would need to design and implement a streaming API for pglz and other compressors.
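
(For context, a sketch of what such a streaming interface looks like in this patch. The declarations below are reconstructed from the zpq_* names that appear later in this thread; only zpq_create's and zpq_read's shapes are actually visible in the quoted material, the rest are guesses.)

    #include <stddef.h>
    #include <sys/types.h>

    /* Callbacks supplying/consuming raw bytes, e.g. secure_write/secure_read */
    typedef ssize_t (*zpq_tx_func) (void *arg, void const *data, size_t size);
    typedef ssize_t (*zpq_rx_func) (void *arg, void *data, size_t size);

    typedef struct ZpqStream ZpqStream;

    extern ZpqStream *zpq_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg);
    extern ssize_t zpq_read(ZpqStream *zs, void *buf, size_t size, size_t *processed);
    extern ssize_t zpq_write(ZpqStream *zs, void const *buf, size_t size);
    extern size_t zpq_buffered(ZpqStream *zs);   /* decompressed data not yet consumed */
    extern char *zpq_error(ZpqStream *zs);       /* last (de)compression error message */
    extern void zpq_free(ZpqStream *zs);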


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: libpq compression

Andrew Dunstan-8


On 05/15/2018 08:53 AM, Konstantin Knizhnik wrote:

>
>
> On 15.05.2018 13:23, Dmitry Dolgov wrote:
>> > On 30 March 2018 at 14:53, Konstantin Knizhnik
>> <[hidden email]> wrote:
>> > Hi hackers,
>> > One of our customers managed to improve speed about 10 times by
>> using SSL compression for a system where the client and servers are
>> located in different geographical regions
>> > and the query results are very large because of JSON columns. They
>> actually do not need encryption, just compression.
>> > I expect this is not the only case where compression of the libpq
>> protocol can be useful. Please note that Postgres replication also
>> uses the libpq protocol.
>> >
>> > Taking into account that a vulnerability was found in SSL compression,
>> and that SSL compression is therefore considered deprecated and insecure
>> (http://www.postgresql-archive.org/disable-SSL-compression-td6010072.html),
>> it would be nice to have some alternative mechanism for reducing libpq
>> traffic.
>> >
>> > I have implemented a prototype of it (patch is
>> attached).
>> > To use zstd compression, Postgres should be configured with
>> --with-zstd; otherwise compression will use zlib, unless it is
>> disabled by the --without-zlib option.
>> > I have added a compression=on/off parameter to the connection string and
>> a -Z option to the psql and pgbench utilities.
>>
>> I'm a bit confused about why there was no reply to this. I mean, it wasn't
>> sent on April 1st, the patch can still be applied on top of the master
>> branch, and it looks like it even works.
>>
>> I assume the main concern here is that it's implemented in a rather
>> non-extensible way. Also, if I understand correctly, it compresses the
>> data stream in both directions (server <-> client), and I'm not sure
>> there is any value in compressing what a client sends to a server. But
>> I'm still wondering why it didn't at least start a discussion about how
>> it can be implemented. Am I missing something?
>
> Implementation of libpq compression will be included in the next release
> of PgProEE.
> It looks like the community is not very interested in this patch. Frankly
> speaking, I do not understand why.
> Compression of libpq traffic can significantly increase the speed of:
> 1. COPY
> 2. Replication (both streaming and logical)
> 3. Queries returning large result sets (for example JSON) over slow
> connections.
>
> It is possible to compress libpq traffic using SSL compression, but
> SSL compression is unsafe and a deprecated feature.
>
> Yes, this patch is not extensible: it can use either zlib or zstd.
> Unfortunately, the internal Postgres compressor, pglz, doesn't provide
> a streaming API.
> Maybe it is a good idea to combine it with Ildus's patch (custom
> compression methods): https://commitfest.postgresql.org/18/1294/
> In that case it would be possible to use any custom compression
> algorithm, but we would need to design and implement a streaming API
> for pglz and other compressors.
>
>


I'm sure there is plenty of interest in this. However, you guys need to
understand where we are in the development cycle. We're trying to wrap
up Postgres 11, which was feature frozen before this patch ever landed.
So it's material for Postgres 12. That means it will probably need to
wait a little while before it gets attention. It doesn't mean nobody is
interested.

I disagree with Dmitry about compressing in both directions - I can
think of plenty of good cases where we would want to compress traffic
from the client.

cheers

andrew


--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: libpq compression

Craig Ringer-3
On 15 May 2018 at 21:36, Andrew Dunstan <[hidden email]> wrote:


> To use zstd compression, Postgres should be configured with --with-zstd; otherwise compression will use zlib, unless it is disabled by the --without-zlib option.
> I have added a compression=on/off parameter to the connection string and a -Z option to the psql and pgbench utilities.

I'm a bit confused about why there was no reply to this. I mean, it wasn't sent on
April 1st, the patch can still be applied on top of the master branch, and it looks
like it even works.

I assume the main concern here is that it's implemented in a rather non-extensible
way. Also, if I understand correctly, it compresses the data stream in both
directions (server <-> client), and I'm not sure there is any value in compressing
what a client sends to a server. But I'm still wondering why it didn't at least
start a discussion about how it can be implemented. Am I missing something?

Implementation of libpq compression will be included in the next release of PgProEE.
It looks like the community is not very interested in this patch. Frankly speaking, I do not understand why.

I'm definitely very interested, and simply missed the post.

I'll talk with some team mates as we're doing some PG12 planning now.
 
Yes, this patch is not extensible: it can use either zlib or zstd. Unfortunately, the internal Postgres compressor, pglz, doesn't provide a streaming API.
Maybe it is a good idea to combine it with Ildus's patch (custom compression methods): https://commitfest.postgresql.org/18/1294/

Given the history of issues with attempting custom/pluggable compression for toast etc, I really wouldn't want to couple those up.

pglz wouldn't make much sense for protocol compression anyway, except maybe for fast local links where it was worth a slight compression overhead but not the CPU needed for gzip. I don't think it's too exciting. zlib/gzip is likely the sweet spot for the foreseeable future for protocol compression, or at least a heck of a lot better than what we have now, anyway.

We should make sure the protocol part is extensible, but the implementation doesn't need to be pluggable.

In that case it would be possible to use any custom compression algorithm, but we would need to design and implement a streaming API for pglz and other compressors.

I'm sure there is plenty of interest in this. However, you guys need to understand where we are in the development cycle. We're trying to wrap up Postgres 11, which was feature frozen before this patch ever landed. So it's material for Postgres 12. That means it will probably need to wait a little while before it gets attention. It doesn't mean nobody is interested.

I disagree with Dmitry about compressing in both directions - I can think of plenty of good cases where we would want to compress traffic from the client.

Agreed. The most obvious case being COPY, but there's also big bytea values, etc. 


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: libpq compression

Euler Taveira
In reply to this post by konstantin knizhnik
2018-05-15 9:53 GMT-03:00 Konstantin Knizhnik <[hidden email]>:
> It looks like the community is not very interested in this patch. Frankly
> speaking, I do not understand why.
>
AFAICS the lack of replies is due to feature freeze. I'm pretty sure
people are interested in this topic (at least I am). Did you review a
previous discussion [1] about this?

I did a prototype a few years ago. I haven't looked at your patch yet;
I'll do so in a few weeks. Please add your patch to the next CF [2].


[1] https://www.postgresql.org/message-id/4FD9698F.2090407%40timbira.com
[2] https://commitfest.postgresql.org/18/


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: libpq compression

Grigory Smolkin
In reply to this post by konstantin knizhnik

Hello!
I have noticed that psql --help lacks the -Z|--compression option.
Also, it would be nice to have an option like --compression-level in psql and pgbench.


On 03/30/2018 03:53 PM, Konstantin Knizhnik wrote:
Hi hackers,

One of our customers managed to improve speed about 10 times by using SSL compression for a system where the client and servers are located in different geographical regions
and the query results are very large because of JSON columns. They actually do not need encryption, just compression.
I expect this is not the only case where compression of the libpq protocol can be useful. Please note that Postgres replication also uses the libpq protocol.

Taking into account that a vulnerability was found in SSL compression, and that SSL compression is therefore considered deprecated and insecure (http://www.postgresql-archive.org/disable-SSL-compression-td6010072.html), it would be nice to have some alternative mechanism for reducing libpq traffic.

I have implemented a prototype of it (patch is attached).
To use zstd compression, Postgres should be configured with --with-zstd; otherwise compression will use zlib, unless it is disabled by the --without-zlib option.
I have added a compression=on/off parameter to the connection string and a -Z option to the psql and pgbench utilities.
Below are some results:

Compression ratio (raw->compressed):

                        libz (level=1)                       libzstd (level=1)
pgbench -i -s 10        16997209->2536330                    16997209->268077
pgbench -t 100000 -S    6289036->1523862, 6600338<-900293    6288933->1777400, 6600338<-1000318

There is no mistyping: libzstd compresses the COPY data about 10 times better than libz, with a remarkable compression ratio of 63 (16997209/268077 ≈ 63).

The influence on execution time is minimal (I tested a local configuration where client and server are on the same host):


                        no compression    libz (level=1)    libzstd (level=1)
pgbench -i -s 10        1.552             1.572             1.611
pgbench -t 100000 -S    4.482             4.926             4.877
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: libpq compression

konstantin knizhnik


On 16.05.2018 18:09, Grigory Smolkin wrote:
>
> Hello!
> I have noticed that psql --help lacks the -Z|--compression option.
> Also, it would be nice to have an option like --compression-level in psql
> and pgbench.
>
Thank you for this notice.
Updated and rebased patch is attached.
Concerning specification of the compression level: I have run many
experiments with different data sets and both zlib/zstd, and in both
cases using a compression level higher than the default doesn't give any
noticeable increase in compression ratio, but quite significantly reduces
speed. Moreover, for "pgbench -i", zstd provides a better compression ratio
(63 times!) with compression level 1 than with the largest recommended
compression level, 22! This is why I decided not to allow the user to
choose the compression level.
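
(For reference, "level 1" is the fastest setting of each library. A hedged sketch of the corresponding initialization calls follows; this is not necessarily how the patch spells them.)

    #include <string.h>
    #include <zlib.h>
    #include <zstd.h>

    static void
    init_compressors_at_level_1(void)
    {
        /* zlib: level 1 is Z_BEST_SPEED */
        z_stream    tx;

        memset(&tx, 0, sizeof(tx));
        deflateInit(&tx, Z_BEST_SPEED);

        /* zstd: streaming compression context, also at level 1 */
        ZSTD_CStream *cstream = ZSTD_createCStream();

        ZSTD_initCStream(cstream, 1);
    }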

Attachment: libpq-compression-3.patch (36K)

Re: libpq compression

Thomas Munro-3
On Thu, May 17, 2018 at 3:54 AM, Konstantin Knizhnik
<[hidden email]> wrote:
> Thank you for this notice.
> Updated and rebased patch is attached.

Hi Konstantin,

Seems very useful.  +1.

+ rc = inflate(&zs->rx, Z_SYNC_FLUSH);
+ if (rc != Z_OK)
+ {
+ return ZPQ_DECOMPRESS_ERROR;
+ }

Does this actually guarantee that zs->rx.msg is set to a string?  I
looked at some documentation here:

https://www.zlib.net/manual.html

It looks like return value Z_DATA_ERROR means that msg is set, but for
the other error codes Z_STREAM_ERROR, Z_BUF_ERROR, Z_MEM_ERROR it
doesn't explicitly say that.  From a casual glance at
https://github.com/madler/zlib/blob/master/inflate.c I think it might
be set to Z_NULL and then never set to a string except in the mode =
BAD paths that produce the Z_DATA_ERROR return code.  That's
interesting because later we do this:

+ if (r == ZPQ_DECOMPRESS_ERROR)
+ {
+ ereport(COMMERROR,
+ (errcode_for_socket_access(),
+ errmsg("Failed to decompress data: %s", zpq_error(PqStream))));
+ return EOF;

... where zpq_error() returns zs->rx.msg.  That might crash or show
"(null)" depending on libc.

Also, message style: s/F/f/
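
(A small defensive sketch of one way to guard against that missing msg, passing the inflate z_stream that the patch keeps as zs->rx; this is not code from the patch:)

    #include <zlib.h>

    /* Fall back to a generic message when zlib did not set msg */
    static const char *
    zpq_decompress_error(z_stream *rx)
    {
        if (rx->msg != Z_NULL)
            return rx->msg;
        return "decompression failed (no detail reported by zlib)";
    }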

+ssize_t zpq_read(ZpqStream* zs, void* buf, size_t size, size_t* processed)

Code style:  We write "Type *foo", not "Type* var".  We put the return
type of a function definition on its own line.

It looks like there is at least one place where zpq_stream.o's symbols
are needed but it isn't being linked in, so the build fails in some
ecpg stuff reached by make check-world:

gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2 -pthread -D_REENTRANT
-D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS test1.o
-L../../../../../src/port -L../../../../../src/common -L../../ecpglib
-lecpg -L../../pgtypeslib -lpgtypes
-L../../../../../src/interfaces/libpq -lpq   -Wl,--as-needed
-Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  -lpgcommon
-lpgport -lpthread -lz -lreadline -lrt -lcrypt -ldl -lm   -o test1
../../../../../src/interfaces/libpq/libpq.so: undefined reference to `zpq_free'
../../../../../src/interfaces/libpq/libpq.so: undefined reference to `zpq_error'
../../../../../src/interfaces/libpq/libpq.so: undefined reference to `zpq_read'
../../../../../src/interfaces/libpq/libpq.so: undefined reference to
`zpq_buffered'
../../../../../src/interfaces/libpq/libpq.so: undefined reference to
`zpq_create'
../../../../../src/interfaces/libpq/libpq.so: undefined reference to `zpq_write'

--
Thomas Munro
http://www.enterprisedb.com


Re: libpq compression

Thomas Munro-3
In reply to this post by konstantin knizhnik
On Thu, May 17, 2018 at 3:54 AM, Konstantin Knizhnik
<[hidden email]> wrote:
> Concerning specification of the compression level: I have run many experiments
> with different data sets and both zlib/zstd, and in both cases using a
> compression level higher than the default doesn't give any noticeable increase
> in compression ratio, but quite significantly reduces speed. Moreover, for
> "pgbench -i", zstd provides a better compression ratio (63 times!) with
> compression level 1 than with the largest recommended compression level, 22!
> This is why I decided not to allow the user to choose the compression level.

Speaking of configuration, are you planning to support multiple
compression libraries at the same time?  It looks like the current
patch implicitly requires client and server to use the same configure
option, without any attempt to detect or negotiate.  Do I guess
correctly that a library mismatch would produce an incomprehensible
corrupt stream message?

--
Thomas Munro
http://www.enterprisedb.com


Re: libpq compression

Michael Paquier-2
On Tue, Jun 05, 2018 at 06:04:21PM +1200, Thomas Munro wrote:
> On Thu, May 17, 2018 at 3:54 AM, Konstantin Knizhnik
> Speaking of configuration, are you planning to support multiple
> compression libraries at the same time?  It looks like the current
> patch implicitly requires client and server to use the same configure
> option, without any attempt to detect or negotiate.  Do I guess
> correctly that a library mismatch would produce an incomprehensible
> corrupt stream message?

I just had a quick look at this patch, lured by the smell of your latest
messages...  And it seems to me that this patch needs a heavy amount of
work as presented.  There are a couple of things which are not really
nice, like forcing the presentation of the compression option in the
startup packet to begin with.  The hijacking around secure_read() is
not nice either, as it is aimed at being a rather high-level API on top
of the method used with the backend.  On top of adding some
documentation, I think that you could get some inspiration from the
recent GSSAPI encryption patch which has been submitted again for the
v12 cycle, which has spent a large amount of time designing its set of
options.
--
Michael


Re: libpq compression

konstantin knizhnik
In reply to this post by Thomas Munro-3
On 05.06.2018 08:26, Thomas Munro wrote:

> On Thu, May 17, 2018 at 3:54 AM, Konstantin Knizhnik
> <[hidden email]> wrote:
>> Thank you for this notice.
>> Updated and rebased patch is attached.
> Hi Konstantin,
>
> Seems very useful.  +1.
>
> + rc = inflate(&zs->rx, Z_SYNC_FLUSH);
> + if (rc != Z_OK)
> + {
> + return ZPQ_DECOMPRESS_ERROR;
> + }
>
> Does this actually guarantee that zs->rx.msg is set to a string?  I
> looked at some documentation here:
>
> https://www.zlib.net/manual.html
>
> It looks like return value Z_DATA_ERROR means that msg is set, but for
> the other error codes Z_STREAM_ERROR, Z_BUF_ERROR, Z_MEM_ERROR it
> doesn't explicitly say that.  From a casual glance at
> https://github.com/madler/zlib/blob/master/inflate.c I think it might
> be set to Z_NULL and then never set to a string except in the mode =
> BAD paths that produce the Z_DATA_ERROR return code.  That's
> interesting because later we do this:
>
> + if (r == ZPQ_DECOMPRESS_ERROR)
> + {
> + ereport(COMMERROR,
> + (errcode_for_socket_access(),
> + errmsg("Failed to decompress data: %s", zpq_error(PqStream))));
> + return EOF;
>
> ... where zpq_error() returns zs->rx.msg.  That might crash or show
> "(null)" depending on libc.
>
> Also, message style: s/F/f/
>
> +ssize_t zpq_read(ZpqStream* zs, void* buf, size_t size, size_t* processed)
>
> Code style:  We write "Type *foo", not "Type* var".  We put the return
> type of a function definition on its own line.
>
> It looks like there is at least one place where zpq_stream.o's symbols
> are needed but it isn't being linked in, so the build fails in some
> ecpg stuff reached by make check-world:
>
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -fexcess-precision=standard -g -O2 -pthread -D_REENTRANT
> -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS test1.o
> -L../../../../../src/port -L../../../../../src/common -L../../ecpglib
> -lecpg -L../../pgtypeslib -lpgtypes
> -L../../../../../src/interfaces/libpq -lpq   -Wl,--as-needed
> -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  -lpgcommon
> -lpgport -lpthread -lz -lreadline -lrt -lcrypt -ldl -lm   -o test1
> ../../../../../src/interfaces/libpq/libpq.so: undefined reference to `zpq_free'
> ../../../../../src/interfaces/libpq/libpq.so: undefined reference to `zpq_error'
> ../../../../../src/interfaces/libpq/libpq.so: undefined reference to `zpq_read'
> ../../../../../src/interfaces/libpq/libpq.so: undefined reference to
> `zpq_buffered'
> ../../../../../src/interfaces/libpq/libpq.so: undefined reference to
> `zpq_create'
> ../../../../../src/interfaces/libpq/libpq.so: undefined reference to `zpq_write'
>
Hi Thomas,
Thank you for the review. An updated version of the patch fixing all reported
problems is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: libpq-compression-4.patch (38K)

Re: libpq compression

konstantin knizhnik
In reply to this post by Thomas Munro-3


On 05.06.2018 09:04, Thomas Munro wrote:

> On Thu, May 17, 2018 at 3:54 AM, Konstantin Knizhnik
> <[hidden email]> wrote:
>> Concerning specification of the compression level: I have run many experiments
>> with different data sets and both zlib/zstd, and in both cases using a
>> compression level higher than the default doesn't give any noticeable increase
>> in compression ratio, but quite significantly reduces speed. Moreover, for
>> "pgbench -i", zstd provides a better compression ratio (63 times!) with
>> compression level 1 than with the largest recommended compression level, 22!
>> This is why I decided not to allow the user to choose the compression level.
> Speaking of configuration, are you planning to support multiple
> compression libraries at the same time?  It looks like the current
> patch implicitly requires client and server to use the same configure
> option, without any attempt to detect or negotiate.  Do I guess
> correctly that a library mismatch would produce an incomprehensible
> corrupt stream message?
>
Frankly speaking, I am not sure that support for multiple compression
libraries at the same time is actually needed.
If we build Postgres from sources, then both the frontend and backend
libraries will use the same compression library.
zlib is available almost everywhere, and Postgres uses it in any case.
zstd is faster and provides a better compression ratio. So in principle it
may be useful to try zstd first and, if it is not available, fall back to
zlib.
That would require dynamic loading of these libraries.

libpq stream compression is not the only place where compression is used
in Postgres. So I think that the problem of choosing a compression algorithm
and supporting custom compression methods should be solved at some upper
level. There is a patch for custom compression methods in the commitfest;
maybe it should be combined with this one.

Right now, if the client and server libpq libraries were built with different
compression libraries, a decompression error will be reported.
Supporting multiple compression methods will require a more sophisticated
handshake protocol, so that the client and server can choose a compression
method which is supported by both of them.
But certainly it can be done.
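
(A purely hypothetical sketch of such a handshake, just to illustrate the idea: the client advertises the algorithms it was built with, and the server picks the first one it also supports. None of these names exist in the patch.)

    #include <stddef.h>
    #include <string.h>

    /* Algorithms this server build supports, in order of preference (hypothetical) */
    static const char *const server_algorithms[] = {"zstd", "zlib", NULL};

    /*
     * Pick the first server-supported algorithm from the client's
     * comma-separated list (e.g. "zstd,zlib"); NULL means no match,
     * i.e. fall back to an uncompressed connection.
     */
    static const char *
    choose_compression(const char *client_list)
    {
        for (int i = 0; server_algorithms[i] != NULL; i++)
        {
            const char *alg = server_algorithms[i];
            const char *p = client_list;

            while (p != NULL && *p != '\0')
            {
                size_t  len = strcspn(p, ",");

                if (len == strlen(alg) && strncmp(p, alg, len) == 0)
                    return alg;
                p += len;
                if (*p == ',')
                    p++;
            }
        }
        return NULL;
    }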

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: libpq compression

konstantin knizhnik
In reply to this post by Michael Paquier-2


On 05.06.2018 10:09, Michael Paquier wrote:

> On Tue, Jun 05, 2018 at 06:04:21PM +1200, Thomas Munro wrote:
>> On Thu, May 17, 2018 at 3:54 AM, Konstantin Knizhnik
>> Speaking of configuration, are you planning to support multiple
>> compression libraries at the same time?  It looks like the current
>> patch implicitly requires client and server to use the same configure
>> option, without any attempt to detect or negotiate.  Do I guess
>> correctly that a library mismatch would produce an incomprehensible
>> corrupt stream message?
> I just had a quick look at this patch, lured by the smell of your latest
> messages...  And it seems to me that this patch needs a heavy amount of
> work as presented.  There are a couple of things which are not really
> nice, like forcing the presentation of the compression option in the
> startup packet to begin with.  The hijacking around secure_read() is
> not nice either, as it is aimed at being a rather high-level API on top
> of the method used with the backend.  On top of adding some
> documentation, I think that you could get some inspiration from the
> recent GSSAPI encryption patch which has been submitted again for the
> v12 cycle, which has spent a large amount of time designing its set of
> options.
> --
> Michael
Thank you for the feedback.
I have considered this patch mostly as a prototype to estimate the efficiency
of libpq protocol compression and compare it with SSL compression.
So I agree with you that there are a lot of things which should be improved.

But can you please clarify what is wrong with "forcing the presentation
of the compression option in the startup packet to begin with"?
Do you mean that it would be better to be able to switch compression on/off
during a session?

Also, I do not completely understand what you mean by "hijacking
around secure_read()".
I looked at the GSSAPI patch. It does the injection in secure_read:

+#ifdef ENABLE_GSS
+    if (port->gss->enc)
+    {
+        n = be_gssapi_read(port, ptr, len);
+        waitfor = WL_SOCKET_READABLE;
+    }
+    else

But the main difference between encryption and compression is that
encryption does not change the data size, while compression does.
To be able to use streaming compression, I need to specify some function
for reading data from the stream. I am using secure_read for this purpose:

        PqStream = zpq_create((zpq_tx_func)secure_write,
(zpq_rx_func)secure_read, MyProcPort);

Can you please explain what is the problem with it?
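
(To illustrate why a read callback is needed once the data size changes: a simplified sketch of what the read side has to do. The ZpqStream members below are invented for this sketch; only zpq_read's signature and the use of inflate() come from the patch fragments quoted earlier in the thread.)

    #include <sys/types.h>
    #include <zlib.h>

    typedef ssize_t (*zpq_rx_func) (void *arg, void *data, size_t size);

    typedef struct
    {
        z_stream    rx;              /* inflate state */
        zpq_rx_func rx_func;         /* e.g. secure_read, registered via zpq_create() */
        void       *arg;             /* e.g. MyProcPort */
        char        rx_buf[8192];    /* raw, still-compressed input */
    } ZpqStream;

    static ssize_t
    zpq_read_sketch(ZpqStream *zs, void *buf, size_t size, size_t *processed)
    {
        zs->rx.next_out = (Bytef *) buf;
        zs->rx.avail_out = size;

        while (zs->rx.avail_out == size)    /* nothing decompressed yet */
        {
            if (zs->rx.avail_in == 0)
            {
                /* compressed input exhausted: pull more raw bytes from the transport */
                ssize_t n = zs->rx_func(zs->arg, zs->rx_buf, sizeof(zs->rx_buf));

                if (n <= 0)
                    return n;           /* error or EOF from the underlying read */
                zs->rx.next_in = (Bytef *) zs->rx_buf;
                zs->rx.avail_in = n;
            }
            if (inflate(&zs->rx, Z_SYNC_FLUSH) != Z_OK)
                return -1;              /* ZPQ_DECOMPRESS_ERROR in the patch */
        }
        *processed = size - zs->rx.avail_out;
        return (ssize_t) *processed;
    }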

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: libpq compression

Peter Eisentraut-6
In reply to this post by Michael Paquier-2
On 6/5/18 03:09, Michael Paquier wrote:
> I just had a quick look at this patch, lured by the smell of your latest
> messages...  And it seems to me that this patch needs a heavy amount of
> work as presented.  There are a couple of things which are not really
> nice, like forcing the presentation of the compression option in the
> startup packet to begin with.

Yeah, at this point we will probably need a discussion and explanation
of the protocol behavior this is adding, such as how to negotiate
different compression settings.

Unrelatedly, I suggest skipping the addition of -Z options to various
client-side tools.  This is unnecessary, since generic connection
options can already be specified via -d typically, and it creates
confusion because -Z is already used to specify output compression by
some programs.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: libpq compression

Dave Cramer-8

On 5 June 2018 at 13:06, Peter Eisentraut <[hidden email]> wrote:
On 6/5/18 03:09, Michael Paquier wrote:
> I just had a quick look at this patch, lured by the smell of your latest
> messages...  And it seems to me that this patch needs a heavy amount of
> work as presented.  There are a couple of things which are not really
> nice, like forcing the presentation of the compression option in the
> startup packet to begin with.

Yeah, at this point we will probably need a discussion and explanation
of the protocol behavior this is adding, such as how to negotiate
different compression settings.

Unrelatedly, I suggest skipping the addition of -Z options to various
client-side tools.  This is unnecessary, since generic connection
options can already be specified via -d typically, and it creates
confusion because -Z is already used to specify output compression by
some programs.


As the maintainer of the JDBC driver, I would think we would like to leverage this as well.

There are a few other drivers that implement the protocol, and I'm sure they would want in as well.
I haven't looked at the patch, but if we get to the point of negotiating compression, please let me know.

Thanks,



Re: libpq compression

Thomas Munro-3
In reply to this post by konstantin knizhnik
On Wed, Jun 6, 2018 at 2:06 AM, Konstantin Knizhnik
<[hidden email]> wrote:
> Thank you for the review. An updated version of the patch fixing all reported
> problems is attached.

Small problem on Windows[1]:

  C:\projects\postgresql\src\include\common/zpq_stream.h(17): error
C2143: syntax error : missing ')' before '*'
[C:\projects\postgresql\libpq.vcxproj]
2395

You used ssize_t in zpq_stream.h, but Windows doesn't have that type.
We have our own typedef in win32_port.h.  Perhaps zpq_stream.c should
include postgres.h/postgres_fe.h (depending on FRONTEND) like the
other .c files in src/common, before it includes zpq_stream.h?
Instead of "c.h".
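
(i.e. the usual pattern at the top of frontend/backend-shared files in src/common, sketched here:)

    /* src/common/zpq_stream.c */
    #ifndef FRONTEND
    #include "postgres.h"
    #else
    #include "postgres_fe.h"
    #endif

    #include "common/zpq_stream.h"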

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.1106

--
Thomas Munro
http://www.enterprisedb.com


Re: libpq compression

Michael Paquier-2
In reply to this post by konstantin knizhnik
On Tue, Jun 05, 2018 at 06:58:42PM +0300, Konstantin Knizhnik wrote:
> I have considered this patch mostly as a prototype to estimate the efficiency
> of libpq protocol compression and compare it with SSL compression.
> So I agree with you that there are a lot of things which should be
> improved.

Cool.  It seems that there is some value in such a feature for
environments with spare CPU and network limitations.

> But can you please clarify what is wrong with "forcing the presentation of
> the compression option in the startup packet to begin with"?

Sure, I am referring to that in your v4:
    if (conn->replication && conn->replication[0])
       ADD_STARTUP_OPTION("replication", conn->replication);
+   if (conn->compression && conn->compression[0])
+       ADD_STARTUP_OPTION("compression", conn->compression);
There is no point in adding that as a mandatory field of the startup
packet.

> Do you mean that it will be better to be able switch on/off compression
> during session?

Not really, I get that this should be defined when the session is
established and remain until the session finishes.  You have a couple of
restrictions like what to do with the first set of messages exchanged
but that could be delayed until the negotiation is done.

> But the main difference between encryption and compression is that
>> encryption does not change the data size, while compression does.
> To be able to use streaming compression, I need to specify some function for
> reading data from the stream. I am using secure_read for this purpose:
>
>        PqStream = zpq_create((zpq_tx_func)secure_write,
> (zpq_rx_func)secure_read, MyProcPort);
>
> Can you please explain what is the problem with it?

Likely I have not looked at your patch sufficiently, but the point I am
trying to make is that secure_read or pqsecure_read are entry points
which switch method depending on the connection details.  The GSSAPI
encryption patch does that.  Yours does not with stuff like that:

 retry4:
-   nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
-                         conn->inBufSize - conn->inEnd);

This makes the whole error handling more consistent, and the new option
layer as well more consistent with what happens in SSL, except that you
want to be able to combine SSL and compression as well so you need an
extra process which decompresses/compresses the data after doing a "raw"
or "ssl" read/write.  I have not actually looked much at your patch, but
I am wondering if it could not be possible to make the whole footprint
less invasive which really worries me as now shaped.  As you need to
register functions with say zpq_create(), it would be instinctively
nicer to do the handling directly in secure_read() and such.

Just my 2c.
--
Michael


Re: libpq compression

konstantin knizhnik


On 06.06.2018 10:53, Michael Paquier wrote:

> On Tue, Jun 05, 2018 at 06:58:42PM +0300, Konstantin Knizhnik wrote:
>> I have considered this patch mostly as a prototype to estimate the efficiency of
>> libpq protocol compression and compare it with SSL compression.
>> So I agree with you that there are a lot of things which should be
>> improved.
> Cool.  It seems that there is some meaning for such a feature with
> environments with spare CPU and network limitations.
>
>> But can you please clarify what is wrong with "forcing the presentation of
>> the compression option in the startup packet to begin with"?
> Sure, I am referring to that in your v4:
>      if (conn->replication && conn->replication[0])
>         ADD_STARTUP_OPTION("replication", conn->replication);
> +   if (conn->compression && conn->compression[0])
> +       ADD_STARTUP_OPTION("compression", conn->compression);
> There is no point in adding that as a mandatory field of the startup
> packet.

Sorry, but ADD_STARTUP_OPTION does not add a mandatory field to the startup
packet. This option can be omitted.
There are a lot of other options registered using ADD_STARTUP_OPTION,
for example all environment-driven GUCs:

     /* Add any environment-driven GUC settings needed */
     for (next_eo = options; next_eo->envName; next_eo++)
     {
         if ((val = getenv(next_eo->envName)) != NULL)
         {
             if (pg_strcasecmp(val, "default") != 0)
                 ADD_STARTUP_OPTION(next_eo->pgName, val);
         }
     }


So I do not understand what is wrong with registering "compression" as an
option of the startup packet, and what the alternative for it would be...

>> Do you mean that it will be better to be able switch on/off compression
>> during session?
> Not really, I get that this should be defined when the session is
> established and remain until the session finishes.  You have a couple of
> restrictions like what to do with the first set of messages exchanged
> but that could be delayed until the negotiation is done.
>
>> But the main difference between encryption and compression is that
>> encryption does not change the data size, while compression does.
>> To be able to use streaming compression, I need to specify some function for
>> reading data from the stream. I am using secure_read for this purpose:
>>
>>         PqStream = zpq_create((zpq_tx_func)secure_write,
>> (zpq_rx_func)secure_read, MyProcPort);
>>
>> Can you please explain what is the problem with it?
> Likely I have not looked at your patch sufficiently, but the point I am
> trying to make is that secure_read or pqsecure_read are entry points
> which switch method depending on the connection details.  The GSSAPI
> encryption patch does that.  Yours does not with stuff like that:
>
>   retry4:
> -   nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
> -                         conn->inBufSize - conn->inEnd);
>
> This makes the whole error handling more consistent, and the new option
> layer as well more consistent with what happens in SSL, except that you
> want to be able to combine SSL and compression as well so you need an
> extra process which decompresses/compresses the data after doing a "raw"
> or "ssl" read/write.  I have not actually looked much at your patch, but
> I am wondering if it could not be possible to make the whole footprint
> less invasive which really worries me as now shaped.  As you need to
> register functions with say zpq_create(), it would be instinctively
> nicer to do the handling directly in secure_read() and such.

Once again sorry, but I still do not understand the problem here.
If compression is enabled, then I am using zpq_read instead of
secure_read/pqsecure_read. But zpq_read in turn calls
secure_read/pqsecure_read
to fetch more raw data. So if "secure_read or pqsecure_read are entry
points which switch method depending on the connection details", then
compression does not prevent them from making this choice. Compression
should be done prior to encryption, otherwise compression makes no sense.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: libpq compression

konstantin knizhnik
In reply to this post by Thomas Munro-3


On 06.06.2018 02:03, Thomas Munro wrote:

> On Wed, Jun 6, 2018 at 2:06 AM, Konstantin Knizhnik
> <[hidden email]> wrote:
>> Thank you for the review. An updated version of the patch fixing all reported
>> problems is attached.
> Small problem on Windows[1]:
>
>    C:\projects\postgresql\src\include\common/zpq_stream.h(17): error
> C2143: syntax error : missing ')' before '*'
> [C:\projects\postgresql\libpq.vcxproj]
> 2395
>
> You used ssize_t in zpq_stream.h, but Windows doesn't have that type.
> We have our own typedef in win32_port.h.  Perhaps zpq_stream.c should
> include postgres.h/postgres_fe.h (depending on FRONTEND) like the
> other .c files in src/common, before it includes zpq_stream.h?
> Instead of "c.h".
>
> [1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.1106
>
Thank you very much for reporting the problem.
I have attached a new patch with an include of postgres_fe.h added to zpq_stream.c.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: libpq-compression-5.patch (38K)