BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows


PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15285
Logged by:          Roman Lytovchenko
Email address:      [hidden email]
PostgreSQL version: 10.4
Operating system:   fedora
Description:        

How to reproduce:
CREATE COLLATION digitslast (provider = icu, locale =
'en@colReorder=latn-digit');
CREATE TABLE t (b CHAR(4) NOT NULL COLLATE digitslast);
insert into t select '0000' from generate_series (0, 1000) as f(x);
insert into t select '0001' from generate_series (0, 1000) as f(x);
insert into t select 'ABCD' from generate_series (0, 1000) as f(x);

create index i on t(b);
select * from t where b = '0000' ;
-- 0 rows, and this is a bug

explain analyze select * from t where b = '0000' ;
Index Only Scan using i on t  (cost=0.28..41.80 rows=1001 width=5) (actual time=0.045..0.045 rows=0 loops=1)
  Index Cond: (b = '0000'::bpchar)
  Heap Fetches: 0
Planning time: 0.146 ms
Execution time: 0.080 ms

drop index i;
select * from t where b = '0000' ;
-- 1001 rows
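
The same comparison can probably be made without dropping the index, by disabling index-based plans for a single transaction. This is a minimal sketch, assuming table t and index i still exist as created above. The settings used are standard PostgreSQL planner parameters, and the column aliases are only illustrative.

```sql
-- Sketch: compare the answer from the default plan with a forced sequential scan.
-- Assumes table t and index i from the reproduction steps are still in place.
BEGIN;

-- Whatever plan the planner picks (an index scan in the report above).
SELECT count(*) AS via_default_plan FROM t WHERE b = '0000';

-- SET LOCAL only lasts until the end of this transaction.
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_bitmapscan = off;

SELECT count(*) AS via_seq_scan FROM t WHERE b = '0000';

ROLLBACK;
```

If the two counts differ, the index and the heap disagree about which rows match.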

So, select version();
PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.0.1 20180324
(Red Hat 8.0.1-0.20), 64-bit
$cat /proc/version
Linux version 3.10.0-514.10.2.el7.x86_64 ([hidden email])
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 3
00:04:05 UTC 2017


Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Peter Geoghegan-4
On Thu, Jul 19, 2018 at 9:03 AM, PG Bug reporting form
<[hidden email]> wrote:
> How to reproduce:
> CREATE COLLATION digitslast (provider = icu, locale =
> 'en@colReorder=latn-digit');
> CREATE TABLE t (b CHAR(4) NOT NULL COLLATE digitslast);
> insert into t select '0000' from generate_series (0, 1000) as f(x);
> insert into t select '0001' from generate_series (0, 1000) as f(x);
> insert into t select 'ABCD' from generate_series (0, 1000) as f(x);

I can confirm the bug on the master branch:

pg@~[25013]=# select bt_index_parent_check('i');
ERROR:  item order invariant violated for index "i"
DETAIL:  Lower index tid=(3,3) (points to index tid=(4,23)) higher
index tid=(3,4) (points to index tid=(5,98)) page lsn=0/169CD78.

It looks like a problem with char(n) abbreviated keys not agreeing
with B-Tree support function 1 for the same opclass. "ABCD" appears
before "0000" and "0001" in the index, which seems like the expected
behavior:

pg@~[25013]=# select * from bt_page_items('i', 3);
 itemoffset │   ctid   │ itemlen │ nulls │ vars │          data
────────────┼──────────┼─────────┼───────┼──────┼─────────────────────────
          1 │ (1,0)    │       8 │ f     │ f    │
          2 │ (2,109)  │      16 │ f     │ t    │ 0b 41 42 43 44 00 00 00
          3 │ (4,23)   │      16 │ f     │ t    │ 0b 41 42 43 44 00 00 00
          4 │ (5,98)   │      16 │ f     │ t    │ 0b 30 30 30 30 00 00 00
          5 │ (6,12)   │      16 │ f     │ t    │ 0b 30 30 30 30 00 00 00
          6 │ (7,152)  │      16 │ f     │ t    │ 0b 30 30 30 30 00 00 00
          7 │ (8,66)   │      16 │ f     │ t    │ 0b 30 30 30 31 00 00 00
          8 │ (9,206)  │      16 │ f     │ t    │ 0b 30 30 30 31 00 00 00
          9 │ (10,120) │      16 │ f     │ t    │ 0b 30 30 30 31 00 00 00
(9 rows)

(This is the root page.)

It appears that the main support function 1 routine disagrees with the
CREATE INDEX sort order, which is wrong. I'll try to isolate the
problem a bit further.
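
For readers who want to repeat the checks above: they come from the amcheck and pageinspect contrib extensions. The following is a minimal sketch, assuming both extensions are available on the server and that the session has sufficient privileges (normally superuser). The index name i and block number 3 are taken from the output above and may differ on another system.

```sql
-- Sketch: verify B-Tree invariants on index i and look at its root page.
CREATE EXTENSION IF NOT EXISTS amcheck;
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Lighter check: item order within and across pages (AccessShareLock only).
SELECT bt_index_check('i');

-- Stricter check: also verifies parent/child relationships (takes a ShareLock).
SELECT bt_index_parent_check('i');

-- Locate the root block from the metapage, then dump the items on that page.
SELECT root FROM bt_metap('i');
SELECT * FROM bt_page_items('i', 3);  -- 3 was the root block in the output above
```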

--
Peter Geoghegan


Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Peter Geoghegan-4
On Thu, Jul 19, 2018 at 9:44 AM, Peter Geoghegan <[hidden email]> wrote:
> It appears that the main support function 1 routine disagrees with the
> CREATE INDEX sort order, which is wrong. I'll try to isolate the
> problem a bit further.

As far as I can tell, this is an ICU bug. ucol_strcollUTF8() is buggy
with this digitslast collation, which ucol_nextSortKeyPart() fails to
be bug-compatible with. Other similar customized collations (e.g.
'en-u-kf-upper') work fine. (Ugh, that's familiar in an unpleasant
way.)

I'm using libicu60. What version are you using, Roman?

I tried to find something that matches this on the ICU bug tracker.
This might be a match: https://ssl.icu-project.org/trac/ticket/12518

--
Peter Geoghegan


Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Thomas Munro-3
On Fri, Jul 20, 2018 at 11:26 AM, Peter Geoghegan <[hidden email]> wrote:

> On Thu, Jul 19, 2018 at 9:44 AM, Peter Geoghegan <[hidden email]> wrote:
>> It appears that the main support function 1 routine disagrees with the
>> CREATE INDEX sort order, which is wrong. I'll try to isolate the
>> problem a bit further.
>
> As far as I can tell, this is an ICU bug. ucol_strcollUTF8() is buggy
> with this digitslast collation, which ucol_nextSortKeyPart() fails to
> be bug-compatible with. Other similar customized collations (e.g.
> 'en-u-kf-upper') work fine. (Ugh, that's familiar in an unpleasant
> way.)
>
> I'm using libicu60. What version are you using, Roman?
>
> I tried to find something that matches this on the ICU bug tracker.
> This might be a match: https://ssl.icu-project.org/trac/ticket/12518

FWIW I see the same result with icu 61.1 and 62.1_1 from FreeBSD ports.

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Jehan-Guillaume de Rorthais
Hello,

I didn't find any other discussion related to this bug, either on pgsql-bugs
or pgsql-hackers. Hopefully, this is the best thread in which to give an update.

On Sat, 21 Jul 2018 13:39:12 +1200
Thomas Munro <[hidden email]> wrote:

> On Fri, Jul 20, 2018 at 11:26 AM, Peter Geoghegan <[hidden email]> wrote:
> > On Thu, Jul 19, 2018 at 9:44 AM, Peter Geoghegan <[hidden email]> wrote:  
> >> It appears that the main support function 1 routine disagrees with the
> >> CREATE INDEX sort order, which is wrong. I'll try to isolate the
> >> problem a bit further.  
> >
> > As far as I can tell, this is an ICU bug. ucol_strcollUTF8() is buggy
> > with this digitslast collation, which ucol_nextSortKeyPart() fails to
> > be bug-compatible with. Other similar customized collations (e.g.
> > 'en-u-kf-upper') work fine. (Ugh, that's familiar in an unpleasant
> > way.)
> >
> > I'm using libicu60. What version are you using, Roman?
> >
> > I tried to find something that matches this on the ICU bug tracker.
> > This might be a match: https://ssl.icu-project.org/trac/ticket/12518 
>
> FWIW I see the same result with icu 61.1 and 62.1_1 from FreeBSD ports.

Some colleagues hit this bug as well last week and reported it to me. I can
reproduce it with the current ICU master branch (post-67.1).

I wrote a regression test for icu4c and posted it on ICU-12518. See:
https://unicode-org.atlassian.net/browse/ICU-12518

As Peter wrote, the ucol_strcollUTF8 (and ucol_strcoll) functions are affected. A
quick and dirty patch replacing ucol_strcoll* with ucol_getSortKey/strcmp
everywhere fixed the bug in my tests.

After playing with the ICU regression tests, I found that the functions
ucol_strcollIter and ucol_nextSortKeyPart are safe. I'll do some performance
tests and report here.

In the meantime, I've been working on various workarounds. The only one I found
is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
Unfortunately, the two collations are not equivalent, but I believe it might be
useful in many cases.
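
As an illustration of the workaround just described, here is a minimal sketch. The collation name digitslast_kn is hypothetical, and the table t with its CHAR(4) column b is assumed from the original report. The "kn" key additionally enables numeric ordering of digit sequences, which is why the two collations are not equivalent.

```sql
-- Sketch of the "-kn" workaround; digitslast_kn is a made-up name.
CREATE COLLATION digitslast_kn
    (provider = icu, locale = 'fr-u-kr-latn-digit-kn');

-- Switch the column to the new collation. PostgreSQL rebuilds the indexes
-- on the column as part of this statement, so index i is rebuilt under the
-- new collation's sort order.
ALTER TABLE t
    ALTER COLUMN b TYPE char(4) COLLATE digitslast_kn;
```

Whether numeric ordering is acceptable depends on the data: with "kn", a string such as 'a10' sorts after 'a9' rather than before it.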

I've been working on a second workaround: creating a type (a char variant for
our use case), its operators and opfamily. All operators and support function 1 rely
on ucol_getSortKey. Most of the workaround works well but, surprisingly, the
sort order is only enforced if the field is in the first position:

  * this works: "ORDER BY f1 COLLATE digitslast"
  * this fails: "ORDER BY f2, f1 COLLATE digitslast"

I haven't had time to investigate this last point further.

Regards,



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Jehan-Guillaume de Rorthais
On Wed, 10 Jun 2020 00:29:33 +0200
Jehan-Guillaume de Rorthais <[hidden email]> wrote:
[...]
> After playing with ICU regression tests, I found functions ucol_strcollIter
> and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
> here.

I did some benchmarks. See attachment for the script and its header to
reproduce.

It sorts 935895 French phrases of 0 to 122 characters, 49 on average.
Performance tests were done on current master HEAD (buggy) and using the patch
in attachment, relying on ucol_strcollIter.

My preliminary test with ucol_getSortKey was catastrophic, as we might
expect. 15-17x slower than the current HEAD. So I removed it from actual tests.
I didn't try with ucol_nextSortKeyPart though.

Using ucol_strcollIter performs ~20% slower than HEAD on UTF8 databases, but
this might be acceptable. Here are the numbers:

   DB Encoding   HEAD  strcollIter   ratio
   UTF8          2.74         3.27   1.19x
   LATIN1        5.34         5.40   1.01x

I plan to add a regression test soon.

> In the meantime, I've been working on various workarounds. The only one I
> found is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
> Unfortunately, the two collations are not equivalent, but I believe it might
> be useful in many case.
>
> I've been working on a second workaround: creating a type (a char variant for
> our usecase), its operators and opfamily. All operators and function 1 relies
> on ucol_getSortKey. Most of the workaround works good but surprisingly, the
> sort order is only enforced if the field is in the first position:
>
>   * this works: "ORDER BY f1 COLLATE digitslast"
>   * this fails: "ORDER BY f2, f1 COLLATE digitslast"

I fixed this: I had not declared my opclass as the default for the type I created.
I'm not sure whether people would like to see or discuss this user workaround here?

Regards,

Attachment: test-icu.bash (1K)
Attachment: v1-0001-Replace-buggy-ucol_strcoll-funcs-with-ucol_strcollIt.patch (4K)

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Jehan-Guillaume de Rorthais
On Fri, 12 Jun 2020 18:40:55 +0200
Jehan-Guillaume de Rorthais <[hidden email]> wrote:

> On Wed, 10 Jun 2020 00:29:33 +0200
> Jehan-Guillaume de Rorthais <[hidden email]> wrote:
> [...]
> > After playing with ICU regression tests, I found functions ucol_strcollIter
> > and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
> > here.
>
> I did some benchmarks. See attachment for the script and its header to
> reproduce.
>
> It sorts 935895 french phrases from 0 to 122 chars with an average of 49.
> Performance tests were done on current master HEAD (buggy) and using the patch
> in attachment, relying on ucol_strcollIter.
>
> My preliminary test with ucol_getSortKey was catastrophic, as we might
> expect. 15-17x slower than the current HEAD. So I removed it from actual
> tests. I didn't try with ucol_nextSortKeyPart though.
>
> Using ucol_strcollIter performs ~20% slower than HEAD on UTF8 databases, but
> this might be acceptable. Here are the numbers:
>
>    DB Encoding   HEAD  strcollIter   ratio
>    UTF8          2.74         3.27   1.19x
>    LATIN1        5.34         5.40   1.01x
>
> I plan to add a regression test soon.

Please find attached the second version of the patch, with a
regression test.

Regards,

Attachment: v2-0001-Replace-buggy-ucol_strcoll-funcs-with-ucol_strcollIt.patch (6K)

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Peter Geoghegan-4
On Fri, Jun 12, 2020 at 3:43 PM Jehan-Guillaume de Rorthais
<[hidden email]> wrote:
> Please, find in attachment the second version of the patch, with a
> regression test.

Is it possible to fix this by making the existing
HAVE_UCOL_STRCOLLUTF8 test more conservative about ICU version? IOW,
by making it only use ucol_strcollUTF8() on versions that are known to
not be affected by this bug?

This related ICU bug describes an issue affecting only versions 53/54:

https://unicode-org.atlassian.net/browse/ICU-11388

Why not just broaden the existing HAVE_UCOL_STRCOLLUTF8 workaround to
recognize that the related functionality is broken on more versions of
ICU than initially suspected?

--
Peter Geoghegan



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Jehan-Guillaume de Rorthais
Hi Peter,

On Tue, 11 Aug 2020 21:14:03 -0700
Peter Geoghegan <[hidden email]> wrote:

> On Fri, Jun 12, 2020 at 3:43 PM Jehan-Guillaume de Rorthais
> <[hidden email]> wrote:
> > Please, find in attachment the second version of the patch, with a
> > regression test.
>
> Is it possible to fix this by making the existing
> HAVE_UCOL_STRCOLLUTF8 test more conservative about ICU version? IOW,
> by making it only use ucol_strcollUTF8() on versions that are known to
> not be affected by this bug?

I might be missing something, but according to my tests back in June, the bug
exists in both ucol_strcollUTF8()/ucol_strcoll() and is still affecting the very
last version of ICU (67.1).

That's why my patch replaces both functions altogether using ucol_strcollIter
as replacement.

> This related ICU bug describes an issue affecting only versions 53/54:
>
> https://unicode-org.atlassian.net/browse/ICU-11388

This bug is related to ICU4J, not ICU4C. As far as I understand, it was
caused by a bad variable type introduced when porting the code to Java. Am I
missing something?

Regards,



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Daniel Verite
        Jehan-Guillaume de Rorthais wrote:

> I might be missing something, but according to my tests back in
> June, the bug exists in both ucol_strcollUTF8()/ucol_strcoll() and
> is still affecting the very last version of ICU (67.1).

Yes. This is what I've seen as well when investigating bug #16570
a couple of weeks ago, before you pointed out it was the same bug:

https://www.postgresql.org/message-id/16570-58cc04e1a6ef3c3f%40postgresql.org

In that thread I've tried with ICU-60.2-3ubuntu but the results are identical
with 67.1.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Peter Geoghegan-4
In reply to this post by Jehan-Guillaume de Rorthais
On Tue, Aug 18, 2020 at 9:02 AM Jehan-Guillaume de Rorthais
<[hidden email]> wrote:
> I might be missing something, but according to my tests back in June, the bug
> exists in both ucol_strcollUTF8()/ucol_strcoll() and is still affecting the very
> last version of ICU (67.1).
>
> That's why my patch replaces both functions altogether using ucol_strcollIter
> as replacement.

I see. I misunderstood.

> > This related ICU bug describes an issue affecting only versions 53/54:
> >
> > https://unicode-org.atlassian.net/browse/ICU-11388
>
> This bug is related to ICU4J, not ICU4C. AS far as I understand, this was
> related to a bad variable type when porting the code to java. Do I miss
> something?

That was based on a comment from TracBot on the bug tracker page you
linked to. Clearly it's totally unrelated, though. I jumped the gun.

--
Peter Geoghegan



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Peter Eisentraut-6
In reply to this post by Jehan-Guillaume de Rorthais
On 2020-06-10 00:29, Jehan-Guillaume de Rorthais wrote:
> In the meantime, I've been working on various workarounds. The only one I found
> is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
> Unfortunately, the two collations are not equivalent, but I believe it might be
> useful in many case.

What precisely is broken in the ICU library?  All the examples so far
refer to kr-latn-digit.  Are all reorderings broken, or something
specifically related to latn and/or digit?  Are any collation
customizations other than reorderings affected?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Jehan-Guillaume de Rorthais
On Wed, 2 Sep 2020 14:06:18 +0200
Peter Eisentraut <[hidden email]> wrote:

> On 2020-06-10 00:29, Jehan-Guillaume de Rorthais wrote:
> > In the meantime, I've been working on various workarounds. The only one I
> > found is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
> > Unfortunately, the two collations are not equivalent, but I believe it
> > might be useful in many case.  
>
> What precisely is broken in the ICU library?

Using ucol_strcoll/ucol_strcollUTF8 with a custom collation sorting digits after
latn.

> All the examples so far refer to kr-latn-digit.  Are all reorderings broken,
> or something specifically related to latn and/or digit?

I don't know. So far, I only found a couple of reports (mine included) using
kr-latn-digit in different languages. And as I wrote, kr-latn-digit-kn doesn't
seem affected. So all reorderings might not be broken.

But I have no strong facts about this, just tests.

> Are any collation customizations other than reorderings affected?

I didn't poke around to try other random customizations. The answer lies
somewhere in the ICU codebase. I suppose we'll be able to answer this question
as soon as the bug is explained.

However, the bugs reported here are all about sorting: wrong result order and/or
wrong results because of a badly sorted index.

Maybe Daniel has more experience with other customizations, as he
seems to work extensively with ICU and PostgreSQL?

Regards,



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Daniel Verite
        Jehan-Guillaume de Rorthais wrote:

> Maybe Daniel has some more experience feedback with other customizations

No, I've just tried various other reorderings, and didn't find any other that
seems to have the same bug as latn-digit.
My tests consisted of indexing a large corpus of text and running the
index through amcheck.
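
A test of this kind might look roughly like the sketch below. All names here (corpus, phrase, test_reorder, corpus_phrase_idx) are hypothetical, as is the choice of reordering. It assumes the amcheck extension is installed and that the table has been loaded with a large body of text beforehand.

```sql
-- Sketch: index a text corpus under a candidate collation and verify the index.
-- Table, column, collation and index names are made up for illustration.
CREATE TABLE corpus (phrase text);
-- ... load a large body of text into corpus here ...

CREATE COLLATION test_reorder
    (provider = icu, locale = 'en-u-kr-grek-latn');

CREATE INDEX corpus_phrase_idx
    ON corpus (phrase COLLATE test_reorder);

-- Raises an error if the index order violates B-Tree invariants,
-- as happened with the latn-digit reordering earlier in the thread.
SELECT bt_index_parent_check('corpus_phrase_idx');
```

A clean run only shows that no inconsistency was hit for that particular data set; it does not prove the collation is free of the bug.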


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Peter Eisentraut-6
On 2020-09-03 09:41, Daniel Verite wrote:
> Jehan-Guillaume de Rorthais wrote:
>
>> Maybe Daniel has some more experience feedback with other customizations
>
> No, I've just tried various other reorderings, and didn't find any other that
> seems to have the same bug as latn-digit.
> My tests consisted of indexing a large corpus of text and running the
> index through amcheck.

In this case I'm tempted to just leave it alone and write it off as a
bug in ICU.  We could potentially inspect the collator object at CREATE
COLLATION time and issue warnings if we find something we know to be buggy.

I don't think we want to make our code uglier and slower for normal uses
to work around a bug in some niche feature in ICU.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Jehan-Guillaume de Rorthais
On Thu, 3 Sep 2020 10:26:03 +0200
Peter Eisentraut <[hidden email]> wrote:

> On 2020-09-03 09:41, Daniel Verite wrote:
> > Jehan-Guillaume de Rorthais wrote:
> >  
> >> Maybe Daniel has some more experience feedback with other customizations  
> >
> > No, I've just tried various other reorderings, and didn't find any other
> > that seems to have the same bug as latn-digit.
> > My tests consisted of indexing a large corpus of text and running the
> > index through amcheck.  
>
> In this case I'm tempted to just leave it alone and write it off as a
> bug in ICU.  We could potentially inspect the collator object at CREATE
> COLLATION time and issues warnings if we find something we know to be buggy.
>
> I don't think we want to make our code uglier and slower

It's not that much uglier, only slower. And maybe we could wrap the logic inside
some dedicated function/macro checking for versions, etc.

> for normal uses to work around a bug in some niche feature in ICU.

Well, indeed, I was wondering in another thread if we should fix it or
document it.

However, raising a WARNING doesn't seem enough, as we would effectively let
the user create a buggy collation and maybe a corrupted index on top of it. *If*
we choose this way, I would vote for an ERROR.

However, as I wrote earlier, we have no hard evidence that latn-digit is the only
buggy customization in ICU. Even if the probability is very low, we
might have to pile up more tests about versions, customizations, etc. For
instance, we would have to exclude latn-digit, but not latn-digit-kn, for
some ICU versions, etc., until proven otherwise. Code maintenance for
each new version of ICU might become tedious.

But maybe I am being silly, planning for unknown things, and ICU is only
affected for latn-digit?

I really have no strong feeling right now about the best solution to adopt.
However, I feel the least we could do would be to document it somewhere with
strong emphasis.

Regards,



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Daniel Verite
        Jehan-Guillaume de Rorthais wrote:

> I really have no strong feeling right now about the best solution to adopt.
> However, I feel the least to do would be document it somewhere with a lot of
> strong emphasis.

Right now https://www.postgresql.org/docs/devel/collation.html
includes this example:

<quote>
CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');
CREATE COLLATION digitslast (provider = icu, locale =
'en@colReorder=latn-digit');

    Sort digits after Latin letters. (The default is digits before letters.)
</quote>

Now that we know that this collation is problematic, we could remove
this example, even if we don't want to go as far as documenting
ICU bugs. In fact bug reports used the same name "digitslast", so
I wonder if people tried this straight from our doc.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Jehan-Guillaume de Rorthais
On Thu, 03 Sep 2020 13:49:24 -0400
Tom Lane <[hidden email]> wrote:

Something broke in the mail headers of this answer, so I am trying to hook it
back to the appropriate thread.

> "Daniel Verite" <[hidden email]> writes:
> > Now that we know that this collation is problematic, we could remove
> > this example, even if we don't want to go as far as documenting
> > ICU bugs. In fact bug reports used the same name "digitslast", so
> > I wonder if people tried this straight from our doc.  
>
> If we aren't going to try to work around the bug, I agree that
> removing that example (or replacing it with a less buggy one?)
> is a good idea.

OK.
Please find a patch attached. It removes the buggy collation from the doc and
adapts the existing ones to keep an example of a combination of rules.

> I tend to agree with Peter that trying to work around a bug that
> isn't ours and that we don't fully understand is not going to
> be very productive.  What is the argument, other than observation
> of a small number of test cases, that these other subroutines
> don't have bugs of their own?

What about adding it as a "known bug"/"will not fix" in
https://wiki.postgresql.org/wiki/Todo and linking to it from the doc in a note
block? I strongly feel most users do not know where to find such a list of bugs
in the PostgreSQL ecosystem.

Regards,

Attachment: v1-0001-doc-remove-buggy-ICU-collation-from-documentation.patch (3K)

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Peter Eisentraut-6
On 2020-09-07 15:27, Jehan-Guillaume de Rorthais wrote:
> Please, find a patch in attachment. It removes the buggy collation from doc and
> adapt existing ones to keep an example of combination of rules.

I agree with this patch in principle.  But perhaps we could keep another
reordering example, maybe latin/greek?
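
For illustration, a Latin/Greek reordering example could look something like the sketch below. This is an assumption about what such an example might contain, not the text of any patch in this thread. The collation name latinlast and the sample values are made up, and a UTF8 database is assumed.

```sql
-- Sketch of a reordering example that avoids latn-digit.
-- 'kr-grek-latn' asks ICU to sort Greek script before Latin script.
CREATE COLLATION latinlast
    (provider = icu, locale = 'en-u-kr-grek-latn');

-- Try it on a few mixed-script values.
SELECT x
FROM (VALUES ('alpha'), ('αλφα'), ('beta'), ('βητα')) AS v(x)
ORDER BY x COLLATE latinlast;
```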

> What about adding it as a "known bug"/"will not fix" in
> https://wiki.postgresql.org/wiki/Todo and link it from the doc in a note bloc? I
> strongly feel most user do not know where to find such list of bugs in
> PostgreSQL ecosystem.

If you feel more users will make use of the Todo list in the wiki, feel
free to add something there.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

Jehan-Guillaume de Rorthais
On Tue, 8 Sep 2020 16:36:23 +0200
Peter Eisentraut <[hidden email]> wrote:

> On 2020-09-07 15:27, Jehan-Guillaume de Rorthais wrote:
> > Please, find a patch in attachment. It removes the buggy collation from doc
> > and adapt existing ones to keep an example of combination of rules.  
>
> I agree with this patch in principle.  But perhaps we could keep another
> reordering example, maybe latin/greek?

Please find attached a patch implementing your suggestion.

Regards,

Attachment: v2-0001-doc-remove-buggy-ICU-collation-from-documentation.patch (2K)