Create collation reporting the ICU locale display name

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Create collation reporting the ICU locale display name

Daniel Verite
 Hi,

The 'locale' or 'lc_collate/lc_ctype' argument of an ICU collation may
have a complicated syntax, especially with non-deterministic
collations, and input mistakes in these names will not necessarily be
detected as such by ICU.

The "display name" of a locale is a simple way to get human-readable
feedback about the characteristics of that locale.
pg_import_system_collations() already push these as comments when
creating locales en masse.

I think it would be nice to have CREATE COLLATION report this
information as feedback in the form of a NOTICE message.
PFA a simple patch implementing that.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

collation-icu-notice.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Michael Paquier-2
On Wed, Sep 11, 2019 at 04:53:16PM +0200, Daniel Verite wrote:
> I think it would be nice to have CREATE COLLATION report this
> information as feedback in the form of a NOTICE message.
> PFA a simple patch implementing that.

Why is that better than the descriptions provided with \dO[S]+ in
psql?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Daniel Verite
        Michael Paquier wrote:

> On Wed, Sep 11, 2019 at 04:53:16PM +0200, Daniel Verite wrote:
> > I think it would be nice to have CREATE COLLATION report this
> > information as feedback in the form of a NOTICE message.
> > PFA a simple patch implementing that.
>
> Why is that better than the descriptions provided with \dO[S]+ in
> psql?

There is no description for collations created outside of
pg_import_system_collations().

Example:

db=# create collation mycoll(provider=icu, locale='fr-FR-u-ks-level1');
NOTICE:  ICU locale: "French (France, colstrength=primary)"

db=# \x auto

db=# \dO+
List of collations
-[ RECORD 1 ]--+------------------
Schema       | public
Name       | mycoll
Collate        | fr-FR-u-ks-level1
Ctype       | fr-FR-u-ks-level1
Provider       | icu
Deterministic? | yes
Description    |

The NOTICE above is with the patch. Otherwise, the "display name"
is never shown nor stored anywhere AFAICS.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Peter Geoghegan-4
In reply to this post by Daniel Verite
On Wed, Sep 11, 2019 at 7:53 AM Daniel Verite <[hidden email]> wrote:
> The 'locale' or 'lc_collate/lc_ctype' argument of an ICU collation may
> have a complicated syntax, especially with non-deterministic
> collations, and input mistakes in these names will not necessarily be
> detected as such by ICU.

That's a real problem.

> The "display name" of a locale is a simple way to get human-readable
> feedback about the characteristics of that locale.
> pg_import_system_collations() already push these as comments when
> creating locales en masse.
>
> I think it would be nice to have CREATE COLLATION report this
> information as feedback in the form of a NOTICE message.
> PFA a simple patch implementing that.

I like this idea.

I wonder if it's possible to display a localized version of the
display string in the NOTICE message? Does that work, or could it? For
example, do you see the message in French?

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Peter Geoghegan-4
On Thu, Sep 12, 2019 at 11:30 AM Peter Geoghegan <[hidden email]> wrote:
> I wonder if it's possible to display a localized version of the
> display string in the NOTICE message? Does that work, or could it? For
> example, do you see the message in French?

BTW, I already know for sure that ICU supports localized display
names. The question is whether or not this patch can take advantage of
that.

The way that we use display name in pg_import_system_collations() is
an ugly hack. It insists on only storing ASCII-safe strings in
pg_collation.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Alvaro Herrera-9
In reply to this post by Daniel Verite
On 2019-Sep-12, Daniel Verite wrote:

> Michael Paquier wrote:
>
> > On Wed, Sep 11, 2019 at 04:53:16PM +0200, Daniel Verite wrote:
> > > I think it would be nice to have CREATE COLLATION report this
> > > information as feedback in the form of a NOTICE message.
> > > PFA a simple patch implementing that.
> >
> > Why is that better than the descriptions provided with \dO[S]+ in
> > psql?
>
> There is no description for collations created outside of
> pg_import_system_collations().

Hmm, sounds like the collation should automatically acquire the display
name as a comment even when created via CREATE COLLATION.

I wonder if INFO is better than NOTICE (I think it is).

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> I wonder if INFO is better than NOTICE (I think it is).

You're just waving a red flag in front of a bull, you know.

I don't especially like the idea of having this emit a NOTICE;
it's ugly and in-your-face.  INFO is right out.

The idea of having CREATE COLLATION automatically create a comment
is sort of interesting, although it seems pretty orthogonal to
normal command behavior.  I wonder whether the seeming need for
this indicates that we should add a descriptive field to pg_collation
proper, and not usurp the user-oriented comment feature for that.

The difficulty with localization is that whatever we put into
template1 has got to be ASCII-only, so that the template DB
can be copied to other encodings.  I suppose we could consider
having CREATE COLLATION act differently during initdb than
later, but that seems ugly too.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Michael Paquier-2
On Thu, Sep 12, 2019 at 03:03:43PM -0400, Tom Lane wrote:

> The idea of having CREATE COLLATION automatically create a comment
> is sort of interesting, although it seems pretty orthogonal to
> normal command behavior.  I wonder whether the seeming need for
> this indicates that we should add a descriptive field to pg_collation
> proper, and not usurp the user-oriented comment feature for that.
>
> The difficulty with localization is that whatever we put into
> template1 has got to be ASCII-only, so that the template DB
> can be copied to other encodings.  I suppose we could consider
> having CREATE COLLATION act differently during initdb than
> later, but that seems ugly too.
Or could it make sense to provide a system function which returns a
collation description for at least an ICU-provided one?  We could make
use of that in psql for example.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Tom Lane-2
Michael Paquier <[hidden email]> writes:

> On Thu, Sep 12, 2019 at 03:03:43PM -0400, Tom Lane wrote:
>> The idea of having CREATE COLLATION automatically create a comment
>> is sort of interesting, although it seems pretty orthogonal to
>> normal command behavior.  I wonder whether the seeming need for
>> this indicates that we should add a descriptive field to pg_collation
>> proper, and not usurp the user-oriented comment feature for that.
>>
>> The difficulty with localization is that whatever we put into
>> template1 has got to be ASCII-only, so that the template DB
>> can be copied to other encodings.  I suppose we could consider
>> having CREATE COLLATION act differently during initdb than
>> later, but that seems ugly too.

> Or could it make sense to provide a system function which returns a
> collation description for at least an ICU-provided one?  We could make
> use of that in psql for example.

Oh, that seems like a good way to tackle it.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Daniel Verite
In reply to this post by Michael Paquier-2
        Michael Paquier wrote:

> Or could it make sense to provide a system function which returns a
> collation description for at least an ICU-provided one?  We could make
> use of that in psql for example.

If we prefer having a function over the instant feedback effect of
the NOTICE, the function might look like icu_collation_attributes() [1]
from the icu_ext extension. It returns a set of (attribute,value)
tuples, among which the displayname is one of the values.

An advantage of this approach is that you may execute the
function before creating the collation, instead of creating the
collation, realizing there was something wrong in your
locale/lc_collate argument, dropping the collation and trying again.

Another advantage would be the possibility of localizing the
display name, leaving the localization as a choice to the user.
Currently get_icu_locale_comment() forces "en" as the language because
it want results in US-ASCII, but a user-callable function could have the
language code as an optional argument. When not being forced, the
language has a default value obtained by ICU from the environment
(so that would be from where the postmaster is started in our case),
and is also settable with uloc_setDefault().

Example with icu_ext functions:

test=> select icu_set_default_locale('es');
 icu_set_default_locale
------------------------
 es

test=> select value from icu_collation_attributes('en-US-u-ka-shifted')
       where attribute='displayname';
                   value    
--------------------------------------------
 inglés (Estados Unidos, alternate=shifted)

This output tend to reveal mistakes with tags, which is why I thought
to expose it as a NOTICE. It addresses the case of a user
who wouldn't suspect an error, so the "in-your-face" effect is
intentional. With the function approach, the user must be
proactive.

An example of mistake I found myself doing is forgetting the '-u-' before
the collation tags, which doesn't error out but is detected relatively
easily with the display name.

-- wrong
test=> select value from icu_collation_attributes('de-DE-ks-level1')  
       where attribute='displayname';
             value      
-----------------------------
 German (Germany, KS_LEVEL1)

-- right
test=> select value from icu_collation_attributes('de-DE-u-ks-level1')
       where attribute='displayname';
                 value      
---------------------------------------
 German (Germany, colstrength=primary)


[1] https://github.com/dverite/icu_ext#icu_collation_attributes


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Robert Haas
On Fri, Sep 13, 2019 at 9:57 AM Daniel Verite <[hidden email]> wrote:
> An advantage of this approach is that you may execute the
> function before creating the collation, instead of creating the
> collation, realizing there was something wrong in your
> locale/lc_collate argument, dropping the collation and trying again.

That would be really nice.

I also think that the documentation is entirely inadequate in this
area.  https://www.postgresql.org/docs/11/collation.html#COLLATION-CREATE
gives some examples, but those don't help you understand the general
principles very much, and it has some links to the ICU documentation,
which helps less than one might think. For example it links to
http://userguide.icu-project.org/locale which describes locales like
en_IE@currency=IEP and es__TRADITIONAL, but if you can figure out what
all the valid possibilities are by reading that page, you are much
smarter than me.  Then, too, according to the PostgreSQL documentation
you ought to prefer forms using the newer syntax, which looks like a
bunch of dash-separated things, e.g. de-u-co-phonebk. But neither the
PostgreSQL documentation itself nor either of the links to ICU
included there actually describe the rules for that syntax. They just
say things like 'use BCP-47', which doesn't help at all.  So I am just
reduced to trying a bunch of things and seeing which collations appear
to behave differently when I use them.

This proposal wouldn't fix the problem that I have to guess what
strings to use, but at least it might be clearer when I have or have
not guessed correctly.

I seriously hate this stuff with a fiery passion that cannot be
quenched. How does anybody manage to use software that seems to have
no usable documentation and doesn't even tell whether or not you
supplied it with input that it thinks is valid?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Fri, Sep 13, 2019 at 9:57 AM Daniel Verite <[hidden email]> wrote:
>> An advantage of this approach is that you may execute the
>> function before creating the collation, instead of creating the
>> collation, realizing there was something wrong in your
>> locale/lc_collate argument, dropping the collation and trying again.

> That would be really nice.

I think that's a useful function, but it's a different function from
the one first proposed, which was to tell you the properties of a
collation you already installed (which might not be ICU, even).
Perhaps we should have both.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Tom Lane-2
In reply to this post by Daniel Verite
"Daniel Verite" <[hidden email]> writes:
> This output tend to reveal mistakes with tags, which is why I thought
> to expose it as a NOTICE. It addresses the case of a user
> who wouldn't suspect an error, so the "in-your-face" effect is
> intentional. With the function approach, the user must be
> proactive.

That argument presupposes (a) manual execution of the creation query,
and (b) that the user pays close attention to the NOTICE output.
Unfortunately, I think our past over-use of notices has trained
people to ignore them.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Robert Haas
On Fri, Sep 13, 2019 at 11:12 AM Tom Lane <[hidden email]> wrote:
> That argument presupposes (a) manual execution of the creation query,
> and (b) that the user pays close attention to the NOTICE output.
> Unfortunately, I think our past over-use of notices has trained
> people to ignore them.

Our past overuse aside, it's just easy to ignore chatter. It often
happens to me that I realize 10 minutes after I did something that I
didn't look carefully enough at the output ... which is usually
followed by an attempt to scroll back through my terminal buffer to
find it. But after a few thousand lines of subsequent output that's
hard. So I like the idea of making the information available
on-demand, rather than only at creation time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Michael Paquier-2
In reply to this post by Tom Lane-2
On Fri, Sep 13, 2019 at 11:09:52AM -0400, Tom Lane wrote:
> I think that's a useful function, but it's a different function from
> the one first proposed, which was to tell you the properties of a
> collation you already installed (which might not be ICU, even).
> Perhaps we should have both.

Perhaps.  Having a default description for the collations imported by
initdb is nice to have, but because of the gap with collations defined
after initialization it seems to me that there is an argument to
switch to that function for psql instead of grepping the default
description added to pg_description.  Enforcing a comment for a
collation manually created based on what libicu tells us does not
feel right either, as we don't enforce a comment for the creation of
other objects.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Daniel Verite
In reply to this post by Tom Lane-2
        Tom Lane wrote:

> I think that's a useful function, but it's a different function from
> the one first proposed, which was to tell you the properties of a
> collation you already installed (which might not be ICU, even).
> Perhaps we should have both.

The pre-create use case would look  like:
 SELECT * FROM describe_collation(locale_string text, collprovider "char")

Post-creation, one could do:
 SELECT * FROM describe_collation(collcollate, collprovider)
   FROM pg_catalog.pg_collation WHERE oid = :OID;

Possibly it could exists as SELECT * FROM describe_collation(oid)
but that's essentially the same function.
Or I'm missing something about why we'd need two functions.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Daniel Verite
In reply to this post by Tom Lane-2
        Tom Lane wrote:

> > This output tend to reveal mistakes with tags, which is why I thought
> > to expose it as a NOTICE. It addresses the case of a user
> > who wouldn't suspect an error, so the "in-your-face" effect is
> > intentional. With the function approach, the user must be
> > proactive.
>
> That argument presupposes (a) manual execution of the creation query,
> and (b) that the user pays close attention to the NOTICE output.
> Unfortunately, I think our past over-use of notices has trained
> people to ignore them.

What about DEBUG1 as the level?
Surely we can draw a line somewhere beyond which the benefit of
getting that information surpasses the annoyance factor that
you're foreseeing?


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Tom Lane-2
In reply to this post by Daniel Verite
"Daniel Verite" <[hidden email]> writes:
> Tom Lane wrote:
>> I think that's a useful function, but it's a different function from
>> the one first proposed, which was to tell you the properties of a
>> collation you already installed (which might not be ICU, even).
>> Perhaps we should have both.

> The pre-create use case would look  like:
>  SELECT * FROM describe_collation(locale_string text, collprovider "char")

> Post-creation, one could do:
>  SELECT * FROM describe_collation(collcollate, collprovider)
>    FROM pg_catalog.pg_collation WHERE oid = :OID;

> Possibly it could exists as SELECT * FROM describe_collation(oid)
> but that's essentially the same function.

The advantage of describe_collation(oid) is that we would not be
building knowledge into the callers about which columns of pg_collation
matter for this purpose.  I'm not even convinced that the two you posit
here are sufficient --- the encoding seems relevant, for instance.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Peter Geoghegan-4
On Sat, Sep 14, 2019 at 8:13 AM Tom Lane <[hidden email]> wrote:
> The advantage of describe_collation(oid) is that we would not be
> building knowledge into the callers about which columns of pg_collation
> matter for this purpose.  I'm not even convinced that the two you posit
> here are sufficient --- the encoding seems relevant, for instance.

+1. It seems like a good idea to consider the ICU display name to be
just that -- a display name. It should be considered a dynamic thing.
For one thing, it is subject to localization, so it isn't fixed even
when nothing changes internally. But there is also the question of
external changes. Internationalization is inherently a squishy
business.

I believe that the main goal of BCP 47 (i.e. ICU's CREATE COLLATION
locale strings) is to fail gracefully when cultural or political
developments occur that change the expectations of users. BCP 47 is
actually an IETF standard -- it's not from the Unicode consortium, or
from ICU. It is supposed to be highly forgiving -- this is a feature,
not a bug. Of course, many facets of a locale control things that we
don't care about, or at least don't involve ICU with. For example,
locale controls the default currency symbol.

There are pg_upgrade scenarios in which the display string for a
collation will legitimately change due to external changes. For
example, somebody that lived in Serbia and Montenegro (a country which
ceased to exist in 2006) could have used a locale string with "cs" (an
ISO 3166-1 code), which has been deprecated [1]. If memory serves,
there is a 5 year grace period codified by some ISO standard or other,
so recent ICU versions know nothing about Serbia and Montenegro
specifically. But they'll still recognize the Serbian language code,
as well as language codes for minority languages spoken in Serbia and
Montenegro. So, for the most part, the impact of sticking with this
old/somewhat inaccurate locale definition string is minimal.
(Actually, maybe downgrade scenarios are more interesting in
practice.)

[1] https://en.wikipedia.org/wiki/ISO_3166-2:CS#Codes_deleted_in_Newsletter_I-8
--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Create collation reporting the ICU locale display name

Michael Paquier-2
In reply to this post by Daniel Verite
On Sat, Sep 14, 2019 at 03:51:03PM +0200, Daniel Verite wrote:
> What about DEBUG1 as the level?
> Surely we can draw a line somewhere beyond which the benefit of
> getting that information surpasses the annoyance factor that
> you're foreseeing?

DEBUG1 is even more chatty.  I agree with the others that making only
this information available at creation time is a no-go.
--
Michael

signature.asc (849 bytes) Download Attachment