UTF-8 collation on Windows?

Re: UTF-8 collation on Windows?

Dev Kumkar
On Thu, Feb 20, 2014 at 4:34 PM, Daniel Verite <[hidden email]> wrote:
> Although windows-1252 is a single-byte encoding that shares most of
> LATIN1's codes and character set, that does not mean that
> English_United States.1252 is limited to this character set.
> You may use UTF-8 databases with that locale.
>
> Consider the 2nd paragraph of "Character Set Support"
> in the doc:
> http://www.postgresql.org/docs/current/static/multibyte.html
>
>     "For C or POSIX locale, any character set is allowed, but for other
>      locales there is only one character set that will work
>      correctly. (On Windows, however, UTF-8 encoding can be used with
>      any locale.)"
>
> This is a key difference from Unix when choosing a locale.
>
> As for getting the exact same sort order as Linux, it's not possible, but
> that's not a Windows-vs-Unix issue. If you used FreeBSD or Mac OS X, some
> en_US.UTF-8 collation rules would differ from Linux's libc too, resulting in
> a different sort order for certain strings.

Using a windows-1252 locale with a UTF-8 database is not the issue. The point of discussion here is the sort order, and whether there is a Windows code page for UTF-8.
The link http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx that I provided earlier lists such code pages, but creating a database with them fails.

Overall, from the discussion so far and whatever I could find on the net and in the community, it looks like there is no code page on Windows that corresponds to UTF-8 on Linux. If there is one, please let me know.

Conclusion: I have decided to use UTF8 as the database encoding on both Windows and Linux, and to set the collation to 'C'.
That way my customers on Linux and Windows at least see the same behavior when sorting. Any gotchas here?
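
On the "any gotchas" question, here is a minimal sketch in Python (not SQL) of the ordering a 'C' collation gives with UTF-8 data. It assumes the 'C' collation compares strings byte by byte, which for UTF-8 is the same as code point order. The helper name c_collation_sort and the sample words are made up for illustration.

```python
def c_collation_sort(words):
    """Sort strings by their raw UTF-8 bytes, imitating a byte-wise 'C' collation."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


# "\u00e9clair" is "eclair" spelled with a precomposed e-acute as its first letter.
words = ["apple", "Zebra", "\u00e9clair", "banana", "Apple"]

result = c_collation_sort(words)
print(result)
# -> ['Apple', 'Zebra', 'apple', 'banana', 'éclair']
```

The result does not depend on the operating system's locale, which is why it is the same on Windows and Linux. The price is that every uppercase ASCII letter sorts before every lowercase one, and accented letters sort after "z", so users expecting dictionary order may be surprised.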

Regards...

Re: UTF-8 collation on Windows?

Dev Kumkar
In reply to this post by Gavin Flower-2
On Thu, Feb 20, 2014 at 3:04 AM, Gavin Flower <[hidden email]> wrote:
> Upgrade servers to Linux? :-P

Actually, that is not a solution but running away from the problem.
There is a large customer base and a huge market on Windows too, so it is not that easy to migrate or to convince the market.

Regards...

Re: UTF-8 collation on Windows?

Gavin Flower-2
On 21/02/14 02:04, Dev Kumkar wrote:
> On Thu, Feb 20, 2014 at 3:04 AM, Gavin Flower <[hidden email]> wrote:
>> Upgrade servers to Linux? :-P
>
> Actually, that is not a solution but running away from the problem.
> There is a large customer base and a huge market on Windows too, so it is not that easy to migrate or to convince the market.
>
> Regards...

I am aware of the heavy presence of Microsoft in the marketplace and the huge inertia of Microsoft-dominated companies (even where management would like to change), so I was not trying to push upgrading to Linux too strongly for this particular situation. It was more light-hearted exasperation!

Nonetheless, more and more companies are making that move, as there is a whole raft of very good reasons to do so.

If the sole reason for going to Linux were the collation problem, most people would probably consider it a silly reason.


Cheers,
Gavin


P.S. A Senior Systems Analyst once left the company I was working for to become the DP manager of an IT department. I spoke to him shortly afterwards and he said it was an 'IBM shop'. About ten years later he was at the same place, but he now said it was a 'Microsoft shop'. The dominant technology that appears set in stone does eventually change, despite the market inertia: my first commercial languages were FORTRAN & COBOL on minicomputers & IBM-style mainframes, and now I use Java on Linux.


Re: UTF-8 collation on Windows?

Dev Kumkar
Hmm, I don't want to digress here and lose the topic context.
I would really appreciate any suggestions for UTF-8 collation on Windows.

Regards...

Re: UTF-8 collation on Windows?

Adrian Klaver-4
On 02/20/2014 11:40 AM, Dev Kumkar wrote:

>
>
> Hmm, I don't want to digress here and lose the topic context.
> I would really appreciate any suggestions for UTF-8 collation
> on Windows.

Well, I dug out a Windows machine and tried to get what you wanted, to no
avail. As far as I know there is no UTF8 collation; UTF8 is an encoding.
What you want, if I am following, is the en_US locale (or the equivalent for
another language) on Windows. Anything I tried resolved back to a
Windows code page. So the answer from my tests is no, you cannot match
en_US on Windows.

>
> Regards...



--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: UTF-8 collation on Windows?

Dev Kumkar
On Fri, Feb 21, 2014 at 1:26 AM, Adrian Klaver <[hidden email]> wrote:
> Well, I dug out a Windows machine and tried to get what you wanted, to no avail. As far as I know there is no UTF8 collation; UTF8 is an encoding. What you want, if I am following, is the en_US locale (or the equivalent for another language) on Windows. Anything I tried resolved back to a Windows code page. So the answer from my tests is no, you cannot match en_US on Windows.

Thanks for taking the time to look into it!
Yes, none of the scenarios we tested worked for any of the UTF-8 code pages listed on MSDN, or maybe I don't know the correct form of "language_territory.code".

Regards...

Re: UTF-8 collation on Windows?

Adrian Klaver-4
In reply to this post by Dev Kumkar
On 02/20/2014 11:40 AM, Dev Kumkar wrote:

>
>
> Hmm, I don't want to digress here and lose the topic context.
> I would really appreciate any suggestions for UTF-8 collation
> on Windows.

I just had an idea, though I am not sure how feasible it is in your
situation: run Postgres in a Linux VM on the Windows machines.

>
> Regards...




Re: UTF-8 collation on Windows?

Adrian Klaver-4
In reply to this post by Dev Kumkar
On 02/20/2014 12:27 PM, Dev Kumkar wrote:

> On Fri, Feb 21, 2014 at 1:26 AM, Adrian Klaver
> <[hidden email]> wrote:
>
>     Well, I dug out a Windows machine and tried to get what you wanted,
>     to no avail. As far as I know there is no UTF8 collation; UTF8 is
>     an encoding. What you want, if I am following, is the en_US locale
>     (or the equivalent for another language) on Windows. Anything I
>     tried resolved back to a Windows code page. So the answer from my
>     tests is no, you cannot match en_US on Windows.
>
>
> Thanks for taking the time to look into it!
> Yes, none of the scenarios we tested worked for any of the UTF-8 code
> pages listed on MSDN, or maybe I don't know the correct form of
> "language_territory.code".

It seems to be more basic than that. Microsoft has its own locale
mechanism, and you will always be redirected back to it.

>
> Regards...


