speed up unicode decomposition and recomposition


speed up unicode decomposition and recomposition

John Naylor-3
Having committed the optimization for unicode normalization quick check, Michael Paquier suggested I might do the same for decomposition as well. I wrote:

> I'll
> do some performance testing soon. Note that a 25kB increase in size could
> be present in frontend binaries as well in this case. While looking at
> decomposition, I noticed that recomposition does a linear search through
> all 6600+ entries, although it seems only about 800 are valid for that.
> That could be optimized as well now, since with hashing we have more
> flexibility in the ordering and can put the recomp-valid entries in front.
> I'm not yet sure if it's worth the additional complexity. I'll take a look
> and start a new thread.

The attached patch uses a perfect hash for codepoint decomposition, and for recomposing reduces the linear search from 6604 entries to 942.
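The lookup pattern with a perfect hash is: compute the hash to get a candidate slot, then verify the stored codepoint, since hashing a key outside the set yields a meaningless slot. Here is a toy C sketch of that pattern; the table contents, the stand-in hash function, and all names are contrived for illustration and are not the patch's generated code:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Toy decomposition entry: codepoint -> first codepoint of its canonical
 * decomposition (a real entry carries more fields, e.g. combining class
 * and the full decomposition).
 */
typedef struct
{
	uint32_t	codepoint;
	uint32_t	decomp;
} DecompEntry;

/* Tiny table; the real generated table has 6600+ entries. */
static const DecompEntry DecompMain[] = {
	{0x00C5, 0x0041},			/* A with ring above -> A + U+030A */
	{0x00E4, 0x0061},			/* a with diaeresis -> a + U+0308 */
	{0x1E18, 0x0045},			/* E with circumflex below -> E + U+032D */
};

/*
 * Stand-in for the generated perfect hash: a contrived mapping that
 * happens to be collision-free for this particular key set.  The real
 * function is emitted by the generator from the full key set.
 */
static int
toy_hash(uint32_t cp)
{
	return (int) (cp % 5) - 2;
}

/* Hash, then verify: the hash is only meaningful for keys in the set. */
static const DecompEntry *
get_decomp_entry(uint32_t cp)
{
	int			h = toy_hash(cp);

	if (h < 0 || h >= (int) (sizeof(DecompMain) / sizeof(DecompMain[0])))
		return NULL;
	if (DecompMain[h].codepoint != cp)
		return NULL;			/* not in the table */
	return &DecompMain[h];
}
```

The verification step is what lets a perfect hash replace a binary search without false positives for codepoints that have no decomposition.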

The performance is very nice, and if I'd known better I would have done this first, since the decomp array is as big as the two quick check arrays put together:

Normalize, decomp only

select count(normalize(t, NFD)) from (
select md5(i::text) as t from
generate_series(1,100000) as i
) s;

master   patch
887ms    231ms

select count(normalize(t, NFD)) from (
select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D', i % 3 + 1) as t from
generate_series(1,100000) as i
) s;

master   patch
1110ms   208ms


Normalize, decomp+recomp (note: 100x less data)

select count(normalize(t, NFC)) from (
select md5(i::text) as t from
generate_series(1,1000) as i
) s;

master   patch
194ms    50.6ms

select count(normalize(t, NFC)) from (
select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D', i % 3 + 1) as t from
generate_series(1,1000) as i
) s;

master   patch
137ms    39.4ms


Quick check is another 2x faster on top of previous gains, since it tests canonical class via the decomposition array:

-- all chars are quickcheck YES
select count(*) from (
select md5(i::text) as t from
generate_series(1,100000) as i
) s
where t is NFC normalized;

master   patch
296ms    131ms


Some other considerations:
- As I alluded above, this adds ~26kB to libpq because of SASLPrep. Since the decomp array was reordered to optimize linear search, it can no longer be used for binary search. It's possible to build two arrays, one for frontend and one for backend, but that's additional complexity. We could also force frontend to do a linear search all the time, but that seems foolish. I haven't checked if it's possible to exclude the hash from backend's libpq.
- I could split out the two approaches into separate patches, but it'd be rather messy.

I'll add a CF entry for this.

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

v1-0001-Optimize-unicode-decomposition-and-recomposition.patch (254K)

Re: speed up unicode decomposition and recomposition

Tom Lane-2
John Naylor <[hidden email]> writes:
> Some other considerations:
> - As I alluded above, this adds ~26kB to libpq because of SASLPrep. Since
> the decomp array was reordered to optimize linear search, it can no longer
> be used for binary search. It's possible to build two arrays, one for
> frontend and one for backend, but that's additional complexity. We could
> also force frontend to do a linear search all the time, but that seems
> foolish. I haven't checked if it's possible to exclude the hash from
> backend's libpq.

IIUC, the only place libpq uses this is to process a password-sized string
or two during connection establishment.  It seems quite silly to add
26kB in order to make that faster.  Seems like a nice speedup on the
backend side, but I'd vote for keeping the frontend as-is.

                        regards, tom lane



Re: speed up unicode decomposition and recomposition

Michael Paquier-2
On Wed, Oct 14, 2020 at 01:06:40PM -0400, Tom Lane wrote:

> John Naylor <[hidden email]> writes:
>> Some other considerations:
>> - As I alluded above, this adds ~26kB to libpq because of SASLPrep. Since
>> the decomp array was reordered to optimize linear search, it can no longer
>> be used for binary search.  It's possible to build two arrays, one for
>> frontend and one for backend, but that's additional complexity. We could
>> also force frontend to do a linear search all the time, but that seems
>> foolish. I haven't checked if it's possible to exclude the hash from
>> backend's libpq.
>
> IIUC, the only place libpq uses this is to process a password-sized string
> or two during connection establishment.  It seems quite silly to add
> 26kB in order to make that faster.  Seems like a nice speedup on the
> backend side, but I'd vote for keeping the frontend as-is.
Agreed.  Let's only use the perfect hash in the backend.  It would be
nice to avoid generating an extra copy of the decomposition table for
that, and a table ordered by codepoints is easier to look at.  How much
of a performance impact do you think there would be if the linear
search doesn't use the most-optimized decomposition table?
--
Michael


Re: speed up unicode decomposition and recomposition

John Naylor-3

On Wed, Oct 14, 2020 at 8:25 PM Michael Paquier <[hidden email]> wrote:
On Wed, Oct 14, 2020 at 01:06:40PM -0400, Tom Lane wrote:
> IIUC, the only place libpq uses this is to process a password-sized string
> or two during connection establishment.  It seems quite silly to add
> 26kB in order to make that faster.  Seems like a nice speedup on the
> backend side, but I'd vote for keeping the frontend as-is.

Agreed.  Let's only use the perfect hash in the backend.  It would be
nice to avoid generating an extra copy of the decomposition table for
that, and a table ordered by codepoints is easier to look at.  How much
of a performance impact do you think there would be if the linear
search doesn't use the most-optimized decomposition table?

With those points in mind and thinking more broadly, I'd like to try harder on recomposition. Even several times faster, recomposition is still orders of magnitude slower than ICU, as measured by Daniel Verite [1]. I only did it this way because I couldn't think of how to do the inverse lookup with a hash. But I think if we constructed the hash key like

pg_hton64((code1 << 32) | code2)

and on the Perl side do something like

pack('N',$code1) . pack('N',$code2)

that might work. Or something that looks more like the C side. And make sure to use the lowest codepoint for the result. That way, we can still keep the large decomp array ordered, making it easier to keep the current implementation in the front end, and hopefully getting even better performance in the backend.
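The intended equivalence can be checked directly: serializing the combined 64-bit key in network byte order on the C side must produce the same bytes as concatenating the two codepoints with Perl's pack('N', ...). A small self-contained sketch (store_be64 is a portable stand-in for pg_hton64 plus a memory store; it is not the patch's code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Serialize a 64-bit value big-endian (stand-in for pg_hton64 + store). */
static void
store_be64(uint8_t *out, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t) (v >> (8 * (7 - i)));
}

/* Equivalent of Perl's pack('N', $code): one 32-bit value, big-endian. */
static void
store_be32(uint8_t *out, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t) (v >> (8 * (3 - i)));
}
```

With code1 = U+0045 and code2 = U+032D (which compose to U+1E18), the eight bytes of `store_be64(((uint64_t) code1 << 32) | code2)` match `pack('N', code1) . pack('N', code2)` byte for byte, which is what lets the Perl generator and the C lookup agree on the key.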


--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

Re: speed up unicode decomposition and recomposition

Tom Lane-2
John Naylor <[hidden email]> writes:
> With those points in mind and thinking more broadly, I'd like to try harder
> on recomposition. Even several times faster, recomposition is still orders
> of magnitude slower than ICU, as measured by Daniel Verite [1].

Huh.  Has anyone looked into how they do it?

                        regards, tom lane



Re: speed up unicode decomposition and recomposition

Kyotaro Horiguchi-4
At Wed, 14 Oct 2020 23:06:28 -0400, Tom Lane <[hidden email]> wrote in
> John Naylor <[hidden email]> writes:
> > With those points in mind and thinking more broadly, I'd like to try harder
> > on recomposition. Even several times faster, recomposition is still orders
> > of magnitude slower than ICU, as measured by Daniel Verite [1].
>
> Huh.  Has anyone looked into how they do it?

I'm not sure this is exactly what they do, but it could be that it uses
separate tables for decomposition and composition, pointed to from a trie.

That table is consulted after trying algorithmic decomposition/composition
for, for example, Hangul.  I didn't look into it any further, but just for
information, icu4c/source/common/normalizer2impl.cpp seems to be doing that.
For example, icu4c/source/common/norm2_nfc_data.h defines the static data.

icu4c/source/common/normalizer2impl.h:244 points to a design document on
normalization:

http://site.icu-project.org/design/normalization/custom

> Old and New Implementation Details
>
> The old normalization data format (unorm.icu, ca. 2001..2009) uses
> three data structures for normalization: A trie for looking up 32-bit
> values for every code point, a 16-bit-unit array with decompositions
> and some other data, and a composition table (16-bit-unit array,
> linear search list per starter). The data is combined for all 4
> standard normalization forms: NFC, NFD, NFKC and NFKD.
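The trie described above amounts to a multi-stage table: an index array maps a block of codepoints to a data block, so a lookup is two array reads with no search at all, and blocks full of default values can be shared. A minimal two-stage sketch, entirely illustrative and not ICU's actual layout or data:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two-stage lookup over codepoints 0x000..0x1FF with a block size of 64:
 * stage1 maps (cp >> 6) to a block index, stage2 holds 64-entry blocks
 * of 16-bit normalization data.  All-default blocks share block 0,
 * which is what keeps the scheme compact for sparse data.
 */
#define BLOCK_SHIFT 6
#define BLOCK_SIZE  (1 << BLOCK_SHIFT)

static const uint8_t stage1[8] = {0, 0, 0, 1, 0, 0, 0, 0};

static const uint16_t stage2[2][BLOCK_SIZE] = {
	{0},						/* block 0: shared all-defaults block */
	{[0x28] = 0xBEEF},			/* block 1: covers 0x0C0..0x0FF; one
								 * made-up entry at codepoint 0x0E8 */
};

/* Two array reads, no searching. */
static uint16_t
trie_lookup(uint32_t cp)
{
	return stage2[stage1[cp >> BLOCK_SHIFT]][cp & (BLOCK_SIZE - 1)];
}
```

A real normalization trie adds a third stage and packs per-codepoint norm data (combining class, quick-check flags, decomposition offsets) into the 16-bit values, but the constant-time lookup shape is the same.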

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: speed up unicode decomposition and recomposition

John Naylor-3

On Thu, Oct 15, 2020 at 1:30 AM Kyotaro Horiguchi <[hidden email]> wrote:
At Wed, 14 Oct 2020 23:06:28 -0400, Tom Lane <[hidden email]> wrote in
> John Naylor <[hidden email]> writes:
> > With those points in mind and thinking more broadly, I'd like to try harder
> > on recomposition. Even several times faster, recomposition is still orders
> > of magnitude slower than ICU, as measured by Daniel Verite [1].
>
> Huh.  Has anyone looked into how they do it?

I'm not sure this is exactly what they do, but it could be that it uses
separate tables for decomposition and composition, pointed to from a trie.

I think I've seen a trie recommended somewhere, maybe the official website. That said, I was able to get the hash working for recomposition (split into a separate patch, and both of them now leave frontend alone), and I'm pleased to say it's 50-75x faster than linear search in simple tests. I'd be curious how it compares to ICU now. Perhaps Daniel Verite would be interested in testing again? (CC'd)

select count(normalize(t, NFC)) from (
select md5(i::text) as t from
generate_series(1,100000) as i
) s;

master     patch
18800ms    257ms

select count(normalize(t, NFC)) from (
select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D', i % 3 + 1) as t from
generate_series(1,100000) as i
) s;

master     patch
13000ms    254ms

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

v2-0001-Speed-up-unicode-decomposition.patch (161K)
v2-0002-Speed-up-unicode-recomposition.patch (73K)

Re: speed up unicode decomposition and recomposition

Michael Paquier-2
On Thu, Oct 15, 2020 at 01:59:38PM -0400, John Naylor wrote:
> I think I've seen a trie recommended somewhere, maybe the official website.
> That said, I was able to get the hash working for recomposition (split into
> a separate patch, and both of them now leave frontend alone), and I'm
> pleased to say it's 50-75x faster than linear search in simple tests. I'd
> be curious how it compares to ICU now. Perhaps Daniel Verite would be
> interested in testing again? (CC'd)

Yeah, that would be interesting to compare.  Now the gains proposed by
this patch are already a good step forward, so I don't think that this
should block the solution we have at hand, as the numbers speak for
themselves here.  If something better gets proposed, we could always
change the decomposition and recomposition logic as needed.

> select count(normalize(t, NFC)) from (
> select md5(i::text) as t from
> generate_series(1,100000) as i
> ) s;
>
> master     patch
> 18800ms    257ms

My environment was showing HEAD as being a bit faster with 15s, while
the patch gets "only" down to 290~300ms (compiled with -O2, as I guess
you did).  Nice.

+   # Then the second
+   return -1 if $a2 < $b2;
+   return 1 if $a2 > $b2;
Should say "second code point" here?

+       hashkey = pg_hton64(((uint64) start << 32) | (uint64) code);
+       h = recompinfo.hash(&hashkey);
This choice should be documented, and most likely we should have
comments on the perl and C sides to keep track of the relationship
between the two.

The binary sizes of libpgcommon_shlib.a and libpgcommon.a change
because Decomp_hash_func() gets included, impacting libpq.
Structurally, wouldn't it be better to move this part into its own,
backend-only header?  It would be possible to handle the difference
with some #ifdef FRONTEND of course, or just keep things as they are
because this can be useful for some out-of-core frontend tool.  But if
we keep that as a separate header, then any C file can decide whether
to include it, so frontend tools could also make this choice.
Note that we don't include unicode_normprops_table.h for frontends in
unicode_norm.c, but that is the case for unicode_norm_table.h.
--
Michael


Re: speed up unicode decomposition and recomposition

Daniel Verite
In reply to this post by John Naylor-3
        John Naylor wrote:

> I'd be curious how it compares to ICU now

I've made another run of the test in [1] with your v2 patches
from this thread against icu_ext built with ICU-67.1.
The results show the times in milliseconds to process
about 10 million short strings:

 operation  | unpatched | patched | icu_ext
------------+-----------+---------+---------
 nfc check  |      7968 |    5989 |    4076
 nfc conv   |    700894 |   15163 |    6808
 nfd check  |     16399 |    9852 |    3849
 nfd conv   |     17391 |   10916 |    6758
 nfkc check |      8259 |    6092 |    4301
 nfkc conv  |    700241 |   15354 |    7034
 nfkd check |     16585 |   10049 |    4038
 nfkd conv  |     17587 |   11109 |    7086

The ICU implementation still wins by a large margin, but
the improvements brought by the patch are gorgeous,
especially for the conversion to NFC/NFKC.
This test ran on a slower machine than what I used for [1], so
that's why all queries take longer.

For the two queries upthread, I get this:

1)
select count(normalize(t, NFC)) from (
select md5(i::text) as t from
generate_series(1,100000) as i
) s;
count  
--------
 100000
(1 row)

Time: 371.043 ms

VS ICU:

select count(icu_normalize(t, 'NFC')) from (
select md5(i::text) as t from
generate_series(1,100000) as i
) s;
 count
--------
 100000
(1 row)

Time: 125.809 ms


2)
select count(normalize(t, NFC)) from (
select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D', i % 3 + 1) as t from
generate_series(1,100000) as i
) s;
 count
--------
 100000
(1 row)
Time: 428.214 ms


VS ICU:

select count(icu_normalize(t, 'NFC')) from (
select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D', i % 3 + 1) as t from
generate_series(1,100000) as i
) s;
 count
--------
 100000
(1 row)

Time: 147.713 ms


[1]
https://www.postgresql.org/message-id/2c5e8df9-43b8-41fa-88e6-286e8634f00a%40manitou-mail.org


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



Re: speed up unicode decomposition and recomposition

John Naylor-3
In reply to this post by Michael Paquier-2


On Thu, Oct 15, 2020 at 11:32 PM Michael Paquier <[hidden email]> wrote:

The binary sizes of libpgcommon_shlib.a and libpgcommon.a change
because Decomp_hash_func() gets included, impacting libpq.

I don't see any difference on gcc/Linux in those two files, nor in unicode_norm_shlib.o -- I do see a difference in unicode_norm_srv.o as expected. Could it depend on the compiler?

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

Re: speed up unicode decomposition and recomposition

John Naylor-3
In reply to this post by Daniel Verite


On Fri, Oct 16, 2020 at 2:08 PM Daniel Verite <[hidden email]> wrote:
        John Naylor wrote:

> I'd be curious how it compares to ICU now

I've made another run of the test in [1] with your v2 patches
from this thread against icu_ext built with ICU-67.1.
The results show the times in milliseconds to process
about 10 million short strings:

Thanks for testing!

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

Re: speed up unicode decomposition and recomposition

Michael Paquier-2
In reply to this post by John Naylor-3
On Mon, Oct 19, 2020 at 10:34:33AM -0400, John Naylor wrote:
> I don't see any difference on gcc/Linux in those two files, nor in
> unicode_norm_shlib.o -- I do see a difference in unicode_norm_srv.o as
> expected. Could it depend on the compiler?

Hmm.  My guess is that you don't have --enable-debug in your set of
configure options?  It is not unusual to have this one enabled for GCC
even on production systems, and the size of the libs is impacted in
this case with your patch.
--
Michael


Re: speed up unicode decomposition and recomposition

John Naylor-3


On Tue, Oct 20, 2020 at 3:22 AM Michael Paquier <[hidden email]> wrote:
On Mon, Oct 19, 2020 at 10:34:33AM -0400, John Naylor wrote:
> I don't see any difference on gcc/Linux in those two files, nor in
> unicode_norm_shlib.o -- I do see a difference in unicode_norm_srv.o as
> expected. Could it depend on the compiler?

Hmm.  My guess is that you don't have --enable-debug in your set of
configure options?  It is not unusual to have this one enabled for GCC
even on production systems, and the size of the libs is impacted in
this case with your patch.

I've confirmed that. How about a new header unicode_norm_hashfunc.h, which would include unicode_norm_table.h at the top? In unicode_norm.c, we can include one of these depending on frontend or backend. 

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

Re: speed up unicode decomposition and recomposition

Michael Paquier-2
On Tue, Oct 20, 2020 at 08:03:12AM -0400, John Naylor wrote:
> I've confirmed that. How about a new header unicode_norm_hashfunc.h which
> would include unicode_norm_table.h at the top. In unicode.c, we can include
> one of these depending on frontend or backend.

Sounds good to me.  Looking at the code, I would just generate the
second file within generate-unicode_norm_table.pl.
--
Michael


Re: speed up unicode decomposition and recomposition

John Naylor-3
In reply to this post by Michael Paquier-2
Attached v3 addressing review points below:

On Thu, Oct 15, 2020 at 11:32 PM Michael Paquier <[hidden email]> wrote:
+   # Then the second
+   return -1 if $a2 < $b2;
+   return 1 if $a2 > $b2;
Should say "second code point" here?

Done. Also changed the tiebreaker to the composed codepoint. Beforehand, it was the index into DecompMain[], which is only equivalent if the list is in order (it is, but we don't want correctness to depend on that), and not very clear.
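Spelled out, the ordering rule for the recomposition pairs is: compare first codepoints, then second codepoints, and break any remaining tie on the composed codepoint. The patch does this in the Perl generator; a C sketch of the same comparator, with illustrative names and sample pairs, would be:

```c
#include <assert.h>
#include <stdint.h>

/* One canonical composition pair and its result. */
typedef struct
{
	uint32_t	first;			/* first codepoint of the pair */
	uint32_t	second;			/* second codepoint of the pair */
	uint32_t	composed;		/* resulting composed codepoint */
} RecompPair;

/*
 * Comparator: order by first codepoint, then second, with the composed
 * codepoint as the final tiebreaker so the sort never depends on the
 * input order of equal pairs.
 */
static int
recomp_compare(const RecompPair *a, const RecompPair *b)
{
	if (a->first != b->first)
		return a->first < b->first ? -1 : 1;
	if (a->second != b->second)
		return a->second < b->second ? -1 : 1;
	if (a->composed != b->composed)
		return a->composed < b->composed ? -1 : 1;
	return 0;
}
```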
 
+       hashkey = pg_hton64(((uint64) start << 32) | (uint64) code);
+       h = recompinfo.hash(&hashkey);
This choice should be documented, and most likely we should have
comments on the perl and C sides to keep track of the relationship
between the two.

Done.
 
<separate headers>

Done. 

Other cosmetic changes:
- format recomp array comments like /* U+0045+032D -> U+1E18 */
- make sure to comment #endif's that are far from the #if
- small whitespace fixes

Note: for the new header I simply adapted from unicode_norm_table.h the choice of "There is deliberately not an #ifndef PG_UNICODE_NORM_TABLE_H here", although I must confess I'm not sure what the purpose of that is, in this case.

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

v3-0001-Speed-up-unicode-decomposition.patch (146K)
v3-0002-Speed-up-unicode-recomposition.patch (73K)

Re: speed up unicode decomposition and recomposition

John Naylor-3
There was a mistake in v3 with pgindent/exclude_file_patterns, fixed in v4.

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

v4-0001-Speed-up-unicode-decomposition.patch (146K)
v4-0002-Speed-up-unicode-recomposition.patch (73K)

Re: speed up unicode decomposition and recomposition

Michael Paquier-2
On Wed, Oct 21, 2020 at 07:35:12PM -0400, John Naylor wrote:
> There was a mistake in v3 with pgindent/exclude_file_patterns, fixed in v4.

Thanks for the updated version, that was fast.  I have found a couple
of places that needed to be adjusted, like the comment at the top of
generate-unicode_norm_table.pl or some comments, an incorrect include
in the new headers and the indentation was not right in perl (we use
perltidy v20170521, see the README in src/tools/pgindent).

Except that, this looks good to me.  Attached is the updated version
with all my tweaks, that I would like to commit.  If there are any
comments, please feel free of course.
--
Michael

unicode-derecomp-v5.patch (162K)

Re: speed up unicode decomposition and recomposition

John Naylor-3
On Thu, Oct 22, 2020 at 12:34 AM Michael Paquier <[hidden email]> wrote:
Thanks for the updated version, that was fast.  I have found a couple
of places that needed to be adjusted, like the comment at the top of
generate-unicode_norm_table.pl or some comments, an incorrect include
in the new headers and the indentation was not right in perl (we use
perltidy v20170521, see the README in src/tools/pgindent).

Except that, this looks good to me.  Attached is the updated version
with all my tweaks, that I would like to commit.  If there are any
comments, please feel free of course.

Looks good to me.

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

Re: speed up unicode decomposition and recomposition

Michael Paquier-2
On Thu, Oct 22, 2020 at 05:50:52AM -0400, John Naylor wrote:
> Looks good to me.

Thanks.  Committed, then.  Great work!
--
Michael


Re: speed up unicode decomposition and recomposition

John Naylor-3

On Thu, Oct 22, 2020 at 10:11 PM Michael Paquier <[hidden email]> wrote:
On Thu, Oct 22, 2020 at 05:50:52AM -0400, John Naylor wrote:
> Looks good to me.

Thanks.  Committed, then.  Great work!

Thank you!

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 