The following bug has been logged online: Bug reference: 1721 Logged by: Chii-Tung Liu Email address: [hidden email] PostgreSQL version: 8.0.3 Operating system: Windows XP SP2 Description: mutiple bytes character string comaprison error Details: When compare two UTF-8 encoded string that contains Chinese words, the result is always TRUE 1. create a database test with encoding set to unicode CREATE DATABASE test WITH OWNER = postgres ENCODING = 'UNICODE' TABLESPACE = pg_default; 2. insert data with Chinese words INSERT into node set title='1 ??????' 3. SELECT title from node where title > '1.1 ' would return '1 ??????' 4. Both SELECT '1 ??????' > '1.1' and SELECT '1.1' > '1 ??????' return FALSE ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
"Chii-Tung Liu" <[hidden email]> writes:
> PostgreSQL version: 8.0.3 > Operating system: Windows XP SP2 > When compare two UTF-8 encoded string that contains Chinese words, the > result is always TRUE Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). Use some other database encoding. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
On Sun, 19 Jun 2005, Tom Lane wrote: > "Chii-Tung Liu" <[hidden email]> writes: > > PostgreSQL version: 8.0.3 > > Operating system: Windows XP SP2 > > > When compare two UTF-8 encoded string that contains Chinese words, the > > result is always TRUE > > Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). > Use some other database encoding. > Shouldn't we forbid its creation then? At least a strongly worded warning? We see these complaints too often. Kris Jurka ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [hidden email]) |
Kris Jurka <[hidden email]> writes:
> On Sun, 19 Jun 2005, Tom Lane wrote: >> Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). >> Use some other database encoding. > Shouldn't we forbid its creation then? There was serious discussion of that before the 8.0 release, but we decided not to forbid it. Check the archives; I don't recall the reasoning at the moment. > We see these complaints too often. There are lots of complaints we see way too often ;-) ... but distressingly, there are still only 24 hours in a day. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [hidden email] |
In reply to this post by Chii-Tung Liu
> The following bug has been logged online:
> > Bug reference: 1721 > Logged by: Chii-Tung Liu > Email address: [hidden email] > PostgreSQL version: 8.0.3 > Operating system: Windows XP SP2 > Description: mutiple bytes character string comaprison error > Details: > > When compare two UTF-8 encoded string that contains Chinese words, the > result is always TRUE > 1. create a database test with encoding set to unicode > CREATE DATABASE test > WITH OWNER = postgres > ENCODING = 'UNICODE' > TABLESPACE = pg_default; > 2. insert data with Chinese words > INSERT into node set title='1 中文' > > 3. SELECT title from node where title > '1.1 ' > would return '1 中文' > > 4. Both SELECT '1 中文' > '1.1' and SELECT '1.1' > '1 中文' return > FALSE I think you need to use C locale. -- Tatsuo Ishii ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [hidden email] so that your message can get through to the mailing list cleanly |
In reply to this post by Tom Lane-2
Tom Lane wrote:
> Kris Jurka <[hidden email]> writes: > > On Sun, 19 Jun 2005, Tom Lane wrote: > >> Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). > >> Use some other database encoding. > > > Shouldn't we forbid its creation then? > > There was serious discussion of that before the 8.0 release, but > we decided not to forbid it. Check the archives; I don't recall > the reasoning at the moment. UTF8 encoding works with the C locale assuming you don't care about ordering of the character set, e.g. Japanese. -- Bruce Momjian | http://candle.pha.pa.us [hidden email] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
In reply to this post by Tom Lane-2
>
> UTF8 encoding works with the C locale assuming you don't care > about ordering of the character set, e.g. Japanese. > Has anyone with the ability to compile postgresql on windows tested the ICU patch? ... John ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
In reply to this post by Tom Lane-2
> > UTF8 encoding works with the C locale assuming you don't care about
> > ordering of the character set, e.g. Japanese. > > > > Has anyone with the ability to compile postgresql on windows > tested the ICU patch? Yes. See http://archives.postgresql.org/pgsql-hackers/2005-05/msg00662.php //Magnus ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
In reply to this post by Bruce Momjian-2
> Tom Lane wrote:
> > Kris Jurka <[hidden email]> writes: > > > On Sun, 19 Jun 2005, Tom Lane wrote: > > >> Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). > > >> Use some other database encoding. > > > > > Shouldn't we forbid its creation then? > > > > There was serious discussion of that before the 8.0 release, but > > we decided not to forbid it. Check the archives; I don't recall > > the reasoning at the moment. > > UTF8 encoding works with the C locale assuming you don't care about > ordering of the character set, e.g. Japanese. No, sometimes Japanese needs char ordering too and I think this is not a Windows only problem. The real problem is Unicode defines char orderes in totally random manner because Chinese/Japanese/Korean Kanji characters are "Unified" in Unicode. To solve the problem, we can use convert UTF8 to EUC_JP using CONVERT. See archives for more details. Or you can use Unicode locale only if your platform's locale database is not broken and you only use single locale. -- Tatsuo Ishii ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
Free forum by Nabble | Edit this page |