Facing issue in using special characters


Facing issue in using special characters

M Tarkeshwar Rao

Hi all,

We are facing an issue with special characters. We are trying to insert records into a remote Postgres server, and our application is not able to do so because of errors.

It seems that the issue is caused by special characters used in one of the fields of a row.

Regards

Tarkeshwar


Re: Facing issue in using special characters

David G Johnston
On Thursday, March 14, 2019, M Tarkeshwar Rao <[hidden email]> wrote:

> We are facing an issue with special characters. We are trying to insert records into a remote Postgres server, and our application is not able to do so because of errors.
>
> It seems that the issue is caused by special characters used in one of the fields of a row.


Emailing -general ONLY is both sufficient and polite.  Providing more detail, and ideally an example, is necessary.

David J.

Re: Facing issue in using special characters

Chapman Flack
In reply to this post by M Tarkeshwar Rao
On 3/15/19 11:59 AM, Gunther wrote:
> This is not an issue for "hackers" nor "performance" in fact even for
> "general" it isn't really an issue.

As long as it's already been posted, may as well make it something
helpful to find in the archive.

> Understand charsets -- character set, code point, and encoding. Then
> understand how encoding and string literals and "escape sequences" in
> string literals might work.

Good advice for sure.

> Know that UNICODE today is the one standard, and there is no more need

I wasn't sure from the question whether the original poster was in
a position to choose the encoding of the database. Lots of things are
easier if it can be set to UTF-8 these days, but perhaps it's a legacy
situation.

Maybe a good start would be to go do

  SHOW server_encoding;
  SHOW client_encoding;

and then hit the internet and look up what that encoding (or those
encodings, if different) can and can't represent, and go from there.
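Once the encoding is known, one way to see what it can't represent is to try encoding the problem text on the client side. The following is only a rough sketch in Python, not anything PostgreSQL itself provides: the helper name `unencodable_chars` and the sample string are made up for illustration, and it assumes the Python codec name matches the server encoding (for example, PostgreSQL's LATIN1 corresponding to Python's "latin-1", and LATIN9 to "iso8859_15").

```python
def unencodable_chars(text, encoding):
    """Return the distinct characters of `text` that `encoding` cannot represent.

    `encoding` is a Python codec name; mapping it to the PostgreSQL
    server_encoding is an assumption the caller has to get right.
    """
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


# Hypothetical sample value standing in for the problem field.
sample = "price: 10 € for the naïve café"

print(unencodable_chars(sample, "latin-1"))     # the euro sign is not in ISO 8859-1
print(unencodable_chars(sample, "iso8859_15"))  # ISO 8859-15 does include it
print(unencodable_chars(sample, "utf-8"))
```

Running something like this against a failing row narrows the question down to specific characters before any server settings are changed.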

It's worth knowing that, when the server encoding isn't UTF-8,
PostgreSQL will have the obvious limitations entailed by that,
but also some non-obvious ones that may be surprising, e.g. [1].

-Chap


[1]
https://www.postgresql.org/message-id/CA%2BTgmobUp8Q-wcjaKvV%3DsbDcziJoUUvBCB8m%2B_xhgOV4DjiA1A%40mail.gmail.com


Re: Facing issue in using special characters

Warner, Gary, Jr
In reply to this post by M Tarkeshwar Rao
Many of us have faced character encoding issues because we are not in control of our input sources and made the common assumption that UTF-8 covers everything.

In my lab, as an example, some of our social media posts have included ZawGyi Burmese character sets rather than Unicode Burmese.  (Because Myanmar developed technology in an environment closed to the world, they made up their own non-standard character set, which is still very common in mobile phones.)  We had fully tested the app with Unicode Burmese, but honestly didn’t know ZawGyi was even a thing that we would see in our dataset.  We’ve also had problems with non-Unicode word separators in Arabic.

What we’ve found to be helpful is to view the troubling code in a hex editor and determine what non-standard characters may be causing the problem.

It may be that some data conversion is necessary before insertion. But the first step is knowing WHICH characters are causing the issue.
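If a hex editor isn't handy, a few lines of Python can do a similar job. This is just a sketch of the idea, assuming the troubling value is already available as a Python string; the function name `describe_non_ascii` and the sample input are invented for illustration.

```python
import unicodedata


def describe_non_ascii(text):
    """List (index, char, code point, Unicode name, UTF-8 bytes as hex)
    for every non-ASCII character in `text`."""
    rows = []
    for i, ch in enumerate(text):
        if ord(ch) > 127:
            # Unassigned or private-use code points have no name.
            name = unicodedata.name(ch, "<unnamed>")
            rows.append((i, ch, f"U+{ord(ch):04X}", name, ch.encode("utf-8").hex()))
    return rows


# Hypothetical input: an invisible zero-width space and a Burmese letter.
for row in describe_non_ascii("abc\u200bdef\u1000"):
    print(row)
```

Invisible characters such as zero-width spaces show up clearly in this kind of listing, which is often where the surprise is.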


Re: Facing issue in using special characters

Peter J. Holzer
On 2019-03-17 15:01:40 +0000, Warner, Gary, Jr wrote:
> Many of us have faced character encoding issues because we are not in control
> of our input sources and made the common assumption that UTF-8 covers
> everything.

UTF-8 covers "everything" in the sense that there is a round-trip from
each character in every commonly-used charset/encoding to Unicode and
back.

The actual code may of course be different. For example, the € sign is
0xA4 in iso-8859-15, but U+20AC in Unicode. So you need an
encoding/decoding step.
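To make the € example concrete, here is a small Python sketch of that decoding/encoding step. The variable names are arbitrary; the codec names are the ones from Python's standard library.

```python
raw = b"\xa4"                        # the single byte as stored in an iso-8859-15 source

as_latin9 = raw.decode("iso8859_15")  # interpreted correctly
as_latin1 = raw.decode("latin-1")     # same byte, wrong assumption about the charset

utf8 = as_latin9.encode("utf-8")      # what a UTF-8 database would store

print(f"U+{ord(as_latin9):04X}")      # U+20AC, the euro sign
print(f"U+{ord(as_latin1):04X}")      # U+00A4, the generic currency sign
print(utf8.hex())                     # e282ac

# The round trip back to the original encoding gives the original byte.
assert as_latin9.encode("iso8859_15") == raw
```

The same byte decodes to two different characters depending on which charset is assumed, which is why the source encoding has to be known before converting.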

And "commonly-used" means just that. Unicode covers a lot of character
sets, but it can't cover every character set ever invented (I invented
my own character sets when I was sixteen. Nobody except me ever used
them and they have long succumbed to bit rot).

> In my lab, as an example, some of our social media posts have included ZawGyi
> Burmese character sets rather than Unicode Burmese.  (Because Myanmar developed
> technology in an environment closed to the world, they made up their own
> non-standard character set, which is still very common in mobile phones.)

I'd be surprised if there was a character set which is "very common in
Mobile phones", even in a relatively poor country like Myanmar. Does
ZawGyi actually include characters which aren't in Unicode, or are they
just encoded differently?

        hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | [hidden email]         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
