pglz compression performance, take two

pglz compression performance, take two

Andrey Borodin-2
Hi hackers!

A year ago Vladimir Leskov proposed a patch to speed up pglz compression [0]. PFA the patch with some edits by me.
I saw some reports of bottlenecking in pglz WAL compression [1].

Hopefully we will soon have compression codecs developed by compression specialists. That work is going on in a nearby thread about custom compression methods.
Is it worthwhile to work on pglz optimisation? The patched code is about 1.4x faster. Or should we rely on future use of lz4/zstd and others?

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/169163A8-C96F-4DBE-A062-7D1CECBE9E5D@...
[1] https://smalldatum.blogspot.com/2020/12/tuning-for-insert-benchmark-postgres_4.html


Attachment: 0001-Reorganize-pglz-compression-code.patch (33K)

Re: pglz compression performance, take two

Andrey Borodin-2


> On 9 Dec 2020, at 12:44, Andrey Borodin <[hidden email]> wrote:
> PFA the patch with some edits by me.
> I saw some reports of bottlenecking in pglz WAL compression [1].

I've checked that on my machine a simple test
echo "wal_compression = on" >> $PGDATA/postgresql.conf
pgbench -i -s 20 && pgbench -T 30
shows a ~2-3% improvement, but the result is not very stable: the deviation is comparable to the gain. In fact, the bottleneck just shifts away from pglz, so the impact is hard to measure this way.
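
A minimal sketch of how a small gain can be weighed against run-to-run deviation when a pgbench test is repeated several times. The helper name `relative_gain` and the sample tps values are placeholders invented for illustration; they are not measurements from this thread.

```python
from statistics import mean, stdev


def relative_gain(baseline_tps, patched_tps):
    """Return (gain, noise) for two lists of per-run tps values.

    gain  : relative change of the mean tps, patched vs. baseline
    noise : the larger relative standard deviation of the two samples
    """
    base_mean = mean(baseline_tps)
    patched_mean = mean(patched_tps)
    gain = (patched_mean - base_mean) / base_mean
    noise = max(stdev(baseline_tps) / base_mean,
                stdev(patched_tps) / patched_mean)
    return gain, noise


# Placeholder numbers for illustration only, NOT results from this thread.
baseline = [1000.0, 1030.0, 980.0, 1010.0]
patched = [1025.0, 1060.0, 1000.0, 1040.0]

gain, noise = relative_gain(baseline, patched)
print(f"gain={gain:.1%} noise={noise:.1%}")
```

When the two printed values are of similar size, as described above, a single 30-second run cannot confirm or refute the improvement.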

I've found out that the patch continues ideas from thread [0] and commit 031cc55 [1], but in a much more shotgun-surgery way.
Out of curiosity I've rerun the tests from that thread:
postgres=# with patched as (select testname, avg(seconds) patched from testresults0 group by testname),unpatched as (select testname, avg(seconds) unpatched from testresults group by testname) select * from unpatched join patched using (testname);
     testname      |       unpatched        |        patched        
-------------------+------------------------+------------------------
 512b random       |     4.5568015000000000 |     4.3512980000000000
 100k random       | 1.03342300000000000000 | 1.00326200000000000000
 100k of same byte |     2.1689715000000000 |     2.0958155000000000
 2k random         |     3.1613815000000000 |     3.1861350000000000
 512b text         |     5.7233600000000000 |     5.3602330000000000
 5k text           |     1.7044835000000000 |     1.8086770000000000
(6 rows)


Results of direct calls are somewhat clearer.
Unpatched:
     testname      |   auto    
-------------------+-----------
 5k text           |  1100.705
 512b text         |   240.585
 2k random         |   106.865
 100k random       |     2.663
 512b random       |   145.736
 100k of same byte | 13426.880
(6 rows)

Patched:
     testname      |   auto  
-------------------+----------
 5k text           |  767.535
 512b text         |  159.076
 2k random         |   77.126
 100k random       |    1.698
 512b random       |   95.768
 100k of same byte | 6035.159
(6 rows)
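
As a rough check of the "about 1.4x" figure, the per-test ratios can be computed from the two direct-call tables above. This sketch assumes the `auto` column is an elapsed time (lower is better); the thread does not state the unit. The function name `speedups` is only illustrative.

```python
# Values copied from the "Unpatched" and "Patched" direct-call tables.
unpatched = {
    "5k text": 1100.705,
    "512b text": 240.585,
    "2k random": 106.865,
    "100k random": 2.663,
    "512b random": 145.736,
    "100k of same byte": 13426.880,
}
patched = {
    "5k text": 767.535,
    "512b text": 159.076,
    "2k random": 77.126,
    "100k random": 1.698,
    "512b random": 95.768,
    "100k of same byte": 6035.159,
}


def speedups(before, after):
    """Ratio before/after per test; > 1 means the patched run took less time."""
    return {name: before[name] / after[name] for name in before}


for name, ratio in sorted(speedups(unpatched, patched).items()):
    print(f"{name:<18} {ratio:.2f}x")
```

Under that assumption every test improves, with ratios from roughly 1.4x up to a little over 2.2x for the "100k of same byte" case.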

Thanks!

Best regards, Andrey Borodin.


[0] https://www.postgresql.org/message-id/flat/5130C914.8080106%40vmware.com
[1] https://github.com/x4m/postgres_g/commit/031cc55bbea6b3a6b67c700498a78fb1d4399476


Re: pglz compression performance, take two

Andrey Borodin-2


> On 12 Dec 2020, at 22:47, Andrey Borodin <[hidden email]> wrote:
>
>

I've cleaned up comments, checked that the memory alignment stuff actually makes sense for 32-bit ARM (according to Godbolt), and did some more code cleanup. PFA v2 patch.

I'm still in doubt whether I should register this patch in the CF or not. I'm willing to work on this, but it's not clear whether it will make it into PGv14. And I hope that for PGv15 we will have lz4 or something better for WAL compression.

Best regards, Andrey Borodin.


Attachment: v2-0001-Reorganize-pglz-compression-code.patch (33K)

Re: pglz compression performance, take two

Tomas Vondra-6


On 12/26/20 8:06 AM, Andrey Borodin wrote:

>
>
>> On 12 Dec 2020, at 22:47, Andrey Borodin <[hidden email]> wrote:
>>
>>
>
> I've cleaned up comments, checked that the memory alignment stuff actually makes sense for 32-bit ARM (according to Godbolt), and did some more code cleanup. PFA v2 patch.
>
> I'm still in doubt whether I should register this patch in the CF or not. I'm willing to work on this, but it's not clear whether it will make it into PGv14. And I hope that for PGv15 we will have lz4 or something better for WAL compression.
>

I'd suggest registering it, otherwise people are much less likely to
give you feedback. I don't see why it couldn't land in PG14.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pglz compression performance, take two

Tom Lane-2
Tomas Vondra <[hidden email]> writes:
> On 12/26/20 8:06 AM, Andrey Borodin wrote:
>> I'm still in doubt whether I should register this patch in the CF or not. I'm willing to work on this, but it's not clear whether it will make it into PGv14. And I hope that for PGv15 we will have lz4 or something better for WAL compression.

> I'd suggest registering it, otherwise people are much less likely to
> give you feedback. I don't see why it couldn't land in PG14.

Even if lz4 or something else shows up, the existing code will remain
important for TOAST purposes.  It would be years before we lose interest
in it.

                        regards, tom lane



Re: pglz compression performance, take two

Justin Pryzby
In reply to this post by Andrey Borodin-2

On Sat, Dec 26, 2020 at 12:06:59PM +0500, Andrey Borodin wrote:
> > On 12 Dec 2020, at 22:47, Andrey Borodin <[hidden email]> wrote:
> I've cleaned up comments, checked that the memory alignment stuff actually makes sense for 32-bit ARM (according to Godbolt), and did some more code cleanup. PFA v2 patch.
>
> I'm still in doubt whether I should register this patch in the CF or not. I'm willing to work on this, but it's not clear whether it will make it into PGv14. And I hope that for PGv15 we will have lz4 or something better for WAL compression.

Thanks for registering it.

There are some typos in the current patch:

farer (further: but it's not your typo)
positiion
reduce a => reduce the
monotonicity what => monotonicity, which
lesser good => less good
allign: align

This comment I couldn't understand:
+        * As initial compare for short matches compares 4 bytes then for the end
+        * of stream length of match should be cut
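
One possible reading of the quoted comment, sketched in Python rather than taken from the patch: a candidate match is first tested with a fixed 4-byte comparison, and because fewer than 4 bytes may remain near the end of the input, the match length has to be clamped to the bytes that are actually left. The function `match_length` is hypothetical; the default limit of 273 mirrors pglz's maximum match length.

```python
def match_length(data: bytes, candidate: int, pos: int, max_len: int = 273) -> int:
    """Length of the match between data[candidate:] and data[pos:].

    Hypothetical illustration, not code from the patch. Assumes
    candidate < pos, i.e. the candidate is an earlier history position.
    """
    remaining = len(data) - pos
    limit = min(max_len, remaining)

    # Initial comparison: up to 4 bytes at once. Near the end of the
    # input the window shrinks, so the result never runs past the data.
    window = min(4, limit)
    if data[candidate:candidate + window] != data[pos:pos + window]:
        return 0

    # Extend the match byte by byte, never beyond the clamped limit.
    length = window
    while length < limit and data[candidate + length] == data[pos + length]:
        length += 1
    return length
```

For example, with only two bytes left in the input, the initial comparison covers just those two bytes and the reported length cannot exceed 2.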

--
Justin



Re: pglz compression performance, take two

Andrey Borodin-2
Thanks for looking into this, Justin!

> On 30 Dec 2020, at 09:39, Justin Pryzby <[hidden email]> wrote:
>
> There are some typos in the current patch:
>
> farer (further: but it's not your typo)
> positiion
> reduce a => reduce the
> monotonicity what => monotonicity, which
> lesser good => less good
> allign: align
Fixed.
>
> This comment I couldn't understand:
> +        * As initial compare for short matches compares 4 bytes then for the end
> +        * of stream length of match should be cut

I've reworded comments.

Best regards, Andrey Borodin.


Attachment: v3-0001-Reorganize-pglz-compression-code.patch (33K)

Re: pglz compression performance, take two

Justin Pryzby
@cfbot: rebased

Attachment: 0001-Reorganize-pglz-compression-code.patch (22K)

Re: pglz compression performance, take two

Andrey Borodin-2


> On 22 Jan 2021, at 07:48, Justin Pryzby <[hidden email]> wrote:
>
> @cfbot: rebased
> <0001-Reorganize-pglz-compression-code.patch>

Thanks!

I'm experimenting with TPC-C over PostgreSQL 13 on a production-like cluster in the cloud. Overall performance is IO-bound, but compression is burning a lot of CPU time too (according to perf top). The cluster consists of 3 nodes (HA only, no standby queries) with 32 vCPUs each, 128GB RAM, sync replication, 2000 warehouses, and 240GB PGDATA.

Samples: 1M of event 'cpu-clock', 4000 Hz, Event count (approx.): 177958545079
Overhead  Shared Object                                     Symbol
  18.36%  postgres                                          [.] pglz_compress
   3.88%  [kernel]                                          [k] _raw_spin_unlock_irqrestore
   3.39%  postgres                                          [.] hash_search_with_hash_value
   3.00%  [kernel]                                          [k] finish_task_switch
   2.03%  [kernel]                                          [k] copy_user_enhanced_fast_string
   1.14%  [kernel]                                          [k] filemap_map_pages
   1.02%  postgres                                          [.] AllocSetAlloc
   0.93%  postgres                                          [.] _bt_compare
   0.89%  postgres                                          [.] PinBuffer
   0.82%  postgres                                          [.] SearchCatCache1
   0.79%  postgres                                          [.] LWLockAttemptLock
   0.78%  postgres                                          [.] GetSnapshotData

Overall, the cluster runs 862 tps (52K tpmC, though only 26K tpmC is qualified on 2K warehouses).
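
The two throughput figures can be reproduced with simple arithmetic. This sketch assumes the tps number is converted directly to a per-minute rate (strictly, tpmC counts only New-Order transactions) and uses the TPC-C ceiling of 12.86 tpmC per warehouse. The helper names are illustrative.

```python
# TPC-C caps reportable throughput at about 12.86 tpmC per warehouse.
TPMC_PER_WAREHOUSE_CAP = 12.86


def per_minute(tps: float) -> float:
    """Convert transactions per second to transactions per minute."""
    return tps * 60


def qualified_tpmc_cap(warehouses: int) -> float:
    """Highest tpmC that qualifies for the given number of warehouses."""
    return warehouses * TPMC_PER_WAREHOUSE_CAP


print(f"{per_minute(862):.0f} per minute")           # about 52K
print(f"{qualified_tpmc_cap(2000):.0f} tpmC cap")    # about 26K
```

So 862 tps corresponds to roughly 52K transactions per minute, about twice what a 2000-warehouse configuration can qualify.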

Thanks!

Best regards, Andrey Borodin.