BUG #15373: null / utf-8

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #15373: null / utf-8

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15373
Logged by:          Eric Hansen
Email address:      [hidden email]
PostgreSQL version: 10.5
Operating system:   ubu 16
Description:        

doesn’t support converting NULL bytes to UTF-8 and therefore returns an
error

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15373: null / utf-8

Bruce Momjian
On Sat, Sep  8, 2018 at 05:24:17PM +0000, PG Bug reporting form wrote:

> The following bug has been logged on the website:
>
> Bug reference:      15373
> Logged by:          Eric Hansen
> Email address:      [hidden email]
> PostgreSQL version: 10.5
> Operating system:   ubu 16
> Description:        
>
> doesn’t support converting NULL bytes to UTF-8 and therefore returns an
> error

You need to encode binary values containing nulls as bytea strings.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Reply | Threaded
Open this post in threaded view
|

AW: BUG #15373: null / utf-8

André Hänsel
Bruce Momjian wrote:
 
> On Sat, Sep  8, 2018 at 05:24:17PM +0000, PG Bug reporting form wrote:
>> doesn’t support converting NULL bytes to UTF-8 and therefore returns an
>> error

> You need to encode binary values containing nulls as bytea strings.

Actually this reminds me to log a documentation bug, or rather improvement.

It is already slightly confusing that Postgres doesn't support null bytes
in text columns at all and in bytea columns cannot take them in as a normal
string literal. I'm assuming this is traditional and difficult to change.

Now, there are two formats in which bytea column content can be specified,
the "hex" and the deprecated "escape" format.

This "escape" format is called that way because it allows escape sequences
which are un-escaped before the data is written to the column. This step is
comparable to the parsing of JSON for jsonb columns or the parsing of array
syntax for any array type.

However, during the parsing of an SQL there is a second, completely
different, un-escaping step going on, the un-escaping of C-style escapes in
string literals starting with an "E", *if* you start the string literal
with an "E".

For some reason the example given in the documentation for the "hex"
format uses such an "escape string literal":

SELECT E'\\xDEADBEEF';

I found this very confusing. Can this example be changed to a normal
string literal, like this?

SELECT '\xDEADBEEF';

Regards, Andre


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15373: null / utf-8

Bruce Momjian
On Sat, Sep  8, 2018 at 09:43:18PM +0200, André Hänsel wrote:

> Bruce Momjian wrote:
>  
> > On Sat, Sep  8, 2018 at 05:24:17PM +0000, PG Bug reporting form wrote:
> >> doesn’t support converting NULL bytes to UTF-8 and therefore returns an
> >> error
>
> > You need to encode binary values containing nulls as bytea strings.
>
> Actually this reminds me to log a documentation bug, or rather improvement.
>
> It is already slightly confusing that Postgres doesn't support null bytes
> in text columns at all and in bytea columns cannot take them in as a normal
> string literal. I'm assuming this is traditional and difficult to change.
>
> Now, there are two formats in which bytea column content can be specified,
> the "hex" and the deprecated "escape" format.
>
> This "escape" format is called that way because it allows escape sequences
> which are un-escaped before the data is written to the column. This step is
> comparable to the parsing of JSON for jsonb columns or the parsing of array
> syntax for any array type.
>
> However, during the parsing of an SQL there is a second, completely
> different, un-escaping step going on, the un-escaping of C-style escapes in
> string literals starting with an "E", *if* you start the string literal
> with an "E".
>
> For some reason the example given in the documentation for the "hex"
> format uses such an "escape string literal":
>
> SELECT E'\\xDEADBEEF';
>
> I found this very confusing. Can this example be changed to a normal
> string literal, like this?
>
> SELECT '\xDEADBEEF';

You know, I 100% agree with you.  We used the E'' syntax so we would
produce the same results whether standard_conforming_strings was true or
false.  However, we changed the standard_conforming_strings default to
true in Postgres 9.1 on 2011-09-12, and that release has been
end-of-life for a year.

I think it is time to clarify our documentation examples by assuming
that standard_conforming_strings is true.  I will work on a patch.
Thanks.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15373: null / utf-8

Bruce Momjian
On Sat, Sep  8, 2018 at 04:36:19PM -0400, 'Bruce Momjian' wrote:

> On Sat, Sep  8, 2018 at 09:43:18PM +0200, André Hänsel wrote:
> > For some reason the example given in the documentation for the "hex"
> > format uses such an "escape string literal":
> >
> > SELECT E'\\xDEADBEEF';
> >
> > I found this very confusing. Can this example be changed to a normal
> > string literal, like this?
> >
> > SELECT '\xDEADBEEF';
>
> You know, I 100% agree with you.  We used the E'' syntax so we would
> produce the same results whether standard_conforming_strings was true or
> false.  However, we changed the standard_conforming_strings default to
> true in Postgres 9.1 on 2011-09-12, and that release has been
> end-of-life for a year.
>
> I think it is time to clarify our documentation examples by assuming
> that standard_conforming_strings is true.  I will work on a patch.
> Thanks.
I have developed the attached documentation patch to remove the E''
syntax.  standard_conforming_strings defaulted to 'on' in PG 9.1.

I also found that our bytea data type section wasn't properly adjusted
when we changed the the default bytea_output to 'hex' in PG 9.0.

I think the only question is how far back to apply this patch.  I am
thinking through 9.3.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

escape.diff (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

AW: BUG #15373: null / utf-8

André Hänsel
Bruce Momjian wrote:
> I have developed the attached documentation patch to remove the E''
> syntax.  standard_conforming_strings defaulted to 'on' in PG 9.1.

On https://www.postgresql.org/docs/current/static/arrays.html, there
is a note at the bottom with an example currently using the C-style
escape syntax. Your patch changes that, as agreed, but then the text
of the note, as it is at the moment, doesn't make sense anymore
because the note is about the doubling of backslashes that you have
with C-style escape string literals.


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15373: null / utf-8

Bruce Momjian
On Fri, Sep 14, 2018 at 10:15:48AM +0200, André Hänsel wrote:

> Bruce Momjian wrote:
> > I have developed the attached documentation patch to remove the E''
> > syntax.  standard_conforming_strings defaulted to 'on' in PG 9.1.
>
> On https://www.postgresql.org/docs/current/static/arrays.html, there
> is a note at the bottom with an example currently using the C-style
> escape syntax. Your patch changes that, as agreed, but then the text
> of the note, as it is at the moment, doesn't make sense anymore
> because the note is about the doubling of backslashes that you have
> with C-style escape string literals.
Wow, good you saw that.  I removed that entire paragraph since there is
no longer double-escape evaluation by default.  Updated patch attached.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

escape.diff (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15373: null / utf-8

Bruce Momjian
On Fri, Sep 14, 2018 at 03:49:17PM -0400, 'Bruce Momjian' wrote:

> On Fri, Sep 14, 2018 at 10:15:48AM +0200, André Hänsel wrote:
> > Bruce Momjian wrote:
> > > I have developed the attached documentation patch to remove the E''
> > > syntax.  standard_conforming_strings defaulted to 'on' in PG 9.1.
> >
> > On https://www.postgresql.org/docs/current/static/arrays.html, there
> > is a note at the bottom with an example currently using the C-style
> > escape syntax. Your patch changes that, as agreed, but then the text
> > of the note, as it is at the moment, doesn't make sense anymore
> > because the note is about the doubling of backslashes that you have
> > with C-style escape string literals.
>
> Wow, good you saw that.  I removed that entire paragraph since there is
> no longer double-escape evaluation by default.  Updated patch attached.

Patch applied through 9.3.  Thanks for the report.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +