Collation versions on Windows (help wanted, apply within)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Collation versions on Windows (help wanted, apply within)

Thomas Munro-5
Hello hackers,

Here's a draft patch that teaches PostgreSQL how to ask for collation
versions on Windows.  It receives a pair of DWORDs, which, it displays
as hex.  The values probably have an internal structure that is
displayed in a user-friendly way by software like Active Directory and
SQL Server (I'm pretty sure they both track collation versions and
reindex), but I don't know.  It is based on the documentation at:

https://docs.microsoft.com/en-us/windows/win32/win7appqual/nls-sorting-changes

My understanding of our OS and tool chain version strategy on that
platform is limited, but it looks like we only allow ourselves to use
Vista (and later) APIs if the compiler is Visual Studio 2015 (aka
14.0) or later.  So I tested that this builds cleanly on AppVeyor
using that compiler (see attached CI patch).  The regression tests
failed with Windows error code 87 before I added in the check to skip
"C" and "POSIX", so I know the new code is reached.  I don't have an
environment to test it beyond that.

The reason for returning an empty string for "C" and "POSIX" is the
following comment for get_collation_actual_version():

 * A particular provider must always either return a non-NULL string or return
 * NULL (if it doesn't support versions).  It must not return NULL for some
 * collcollate and not NULL for others.

I'm not sure why, or if that really makes sense.

Do any Windows hackers want to help get it into shape?  Some things to
do: test it, verify that the _WIN32_WINNT >= 0x0600 stuff makes sense
(why do we target such ancient Windows releases anyway?), see if there
is way we could use GetNLSVersion() (no "Ex") to make this work on
older Windows system, check if it makes sense to assume that
collcollate is encoded with CP_ACP ("the system default Windows ANSI
code page", used elsewhere in the PG source tree for a similar
purpose, but this seems likely to go wrong for locale names that have
non-ASCII characters, and indeed we see complaints on the lists
involving the word "Bokmål"), and recommend a better way to display
the collation version as text.  I'll add this to the next commitfest
to attract some eyeballs (but note that when cfbot compiles it, it
will be using an older tool chain and Win32 target, so the new code
will be ifdef'd out and regression test success means nothing).

To test that it works, you'd need to look at the contents of
pg_collation to confirm that you see the new version strings, create
an index on a column that explicitly uses a collation that has a
version, update the pg_collation table by hand to have a to a
different value, and then open a new session and to access the index
to check that you get a warning about the version changing.  The
warning can be cleared by using ALTER COLLATION ... REFRESH VERSION.

0001-Add-collation-versions-for-Windows.patch (2K) Download Attachment
0002-CI.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Collation versions on Windows (help wanted, apply within)

Juan José Santamaría Flecha

On Fri, Nov 8, 2019 at 12:44 AM Thomas Munro <[hidden email]> wrote:

Do any Windows hackers want to help get it into shape?  Some things to
do: test it, verify that the _WIN32_WINNT >= 0x0600 stuff makes sense
(why do we target such ancient Windows releases anyway?)

You have to keep in mind that  _WIN32_WINNT also applies to MinGW, so any build with those tools will use a value of 0x0501 and this code will be ifdef'd out.

As from where this value comes, my take is that it has not been revised in a long time [1]. Windows 7 , Server 2008 and 2008 R2 support will end next year [2] [3], maybe you can make a case for updating this value.
 
see if there is way we could use GetNLSVersion() (no "Ex") to make this work on
older Windows system

Older systems is just Windows Server 2003, not sure if it is worth any effort.
 
check if it makes sense to assume that
collcollate is encoded with CP_ACP ("the system default Windows ANSI
code page", used elsewhere in the PG source tree for a similar
purpose, but this seems likely to go wrong for locale names that have
non-ASCII characters, and indeed we see complaints on the lists
involving the word "Bokmål"), and recommend a better way to display
the collation version as text.

The GetNLSVersionEx() function uses a "Language tag" value, check Language Code Identifier (LCID) [4], and these tags are plain ascii.
 
To test that it works, you'd need to look at the contents of
pg_collation to confirm that you see the new version strings, create
an index on a column that explicitly uses a collation that has a
version, update the pg_collation table by hand to have a to a
different value, and then open a new session and to access the index
to check that you get a warning about the version changing.  The
warning can be cleared by using ALTER COLLATION ... REFRESH VERSION.

The code works as expected with this collation:

postgres=# CREATE COLLATION en_US (LC_COLLATE = 'en-US', LC_CTYPE = 'en-US');
CREATE COLLATION
postgres=# select * from pg_collation;
  oid  | collname  | collnamespace | collowner | collprovider | collisdeterministic | collencoding | collcollate | collctype | collversion
-------+-----------+---------------+-----------+--------------+---------------------+--------------+-------------+-----------+-------------
   100 | default   |            11 |        10 | d            | t                   |           -1 |             |           |
   950 | C         |            11 |        10 | c            | t                   |           -1 | C           | C         |
   951 | POSIX     |            11 |        10 | c            | t                   |           -1 | POSIX       | POSIX     |
 12326 | ucs_basic |            11 |        10 | c            | t                   |            6 | C           | C         |
 16387 | en_us     |          2200 |        10 | c            | t                   |           24 | en-US       | en-US     | 6020f,6020f
(5 rows)

The error code 87 is an ERROR_INVALID_PARAMETER that is raised when the collate input does not match a valid tag, I would suggest not returning it directly.

Regards,

Juan José Santamaría Flecha 


Regards,

Juan José Santamaría Flecha 
Reply | Threaded
Open this post in threaded view
|

Re: Collation versions on Windows (help wanted, apply within)

Juan José Santamaría Flecha
In reply to this post by Thomas Munro-5

On Fri, Nov 8, 2019 at 12:44 AM Thomas Munro <[hidden email]> wrote:

recommend a better way to display the collation version as text.


There is a major and a minor version. The attached patch applies on top the previous.

Regards,

Juan José Santamaría Flecha 

0001-Add-collation-versions-for-Windows-with-major-minor.patch (1018 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Collation versions on Windows (help wanted, apply within)

Thomas Munro-5
On Sat, Nov 9, 2019 at 10:20 AM Juan José Santamaría Flecha
<[hidden email]> wrote:
>> Do any Windows hackers want to help get it into shape?  Some things to
>> do: test it, verify that the _WIN32_WINNT >= 0x0600 stuff makes sense
>> (why do we target such ancient Windows releases anyway?)
>
> You have to keep in mind that  _WIN32_WINNT also applies to MinGW, so any build with those tools will use a value of 0x0501 and this code will be ifdef'd out.
>
> As from where this value comes, my take is that it has not been revised in a long time [1]. Windows 7 , Server 2008 and 2008 R2 support will end next year [2] [3], maybe you can make a case for updating this value.

Ah, I see, thanks.  I think what I have is OK for now then.  If
someone else who is closer to the matter wants to argue that we should
always target Vista+ (for example on MinGW) in order to access this
functionality, I'll let them do that separately.

>> see if there is way we could use GetNLSVersion() (no "Ex") to make this work on
>> older Windows system
>
> Older systems is just Windows Server 2003, not sure if it is worth any effort.

Cool.  Nothing to do here then.

>  16387 | en_us     |          2200 |        10 | c            | t                   |           24 | en-US       | en-US     | 6020f,6020f

Thanks for testing!

On Sat, Nov 9, 2019 at 11:04 PM Juan José Santamaría Flecha
<[hidden email]> wrote:
> There is a major and a minor version. The attached patch applies on top the previous.

Perfect.  I've merged this into the patch.

It's interesting that minor version changes mean no order changed but
new code points were added; that must be useful if your system
prevents you from using code points before you add them, I guess (?).
I don't understand the difference between the NLS and "defined"
versions, but at this stage I don't think we can try to be too fancy
here, think we're just going to have to assume we need both of them
and treat this the same way across all providers: if it moves, reindex
it.

0001-Add-collation-versions-for-Windows-v2.patch (2K) Download Attachment