BUG #16277: xmlelement allows invalid XML characters when XML version is set to 1.0

The following bug has been logged on the website:

Bug reference:      16277
Logged by:          Andreas Lennartsson
Email address:      [hidden email]
PostgreSQL version: 10.7
Operating system:   Ubuntu
Description:        

The following example:
SELECT
  xmlroot (
     xmlelement (name "test", CHR(26))
  , version '1.0'
  )

Produces xml with the invalid ASCII character 26.

The documentation states:
Element content, if specified, will be formatted according to its data type.
If the content is itself of type xml, complex XML documents can be
constructed.
Content of other types will be formatted into valid XML character data. This
means in particular that the characters <, >, and & will be converted to
entities. Binary data (data type bytea) will be represented in base64 or hex
encoding, depending on the setting of the configuration parameter xmlbinary.
The particular behavior for individual data types is expected to evolve in
order to align the SQL and PostgreSQL data types with the XML Schema
specification, at which point a more precise description will appear.


Re: BUG #16277: xmlelement allows invalid XML characters when XML version is set to 1.0

Tom Lane
PG Bug reporting form <[hidden email]> writes:
> The following example:
> SELECT
>   xmlroot (
>      xmlelement (name "test", CHR(26))
>   , version '1.0'
>   )

> Produces xml with the invalid ASCII character 26.

On what grounds do you call it invalid?  What other behavior
would you expect?

                        regards, tom lane



Re: BUG #16277: xmlelement allows invalid XML characters when XML version is set to 1.0

Andreas Lennartsson
> On what grounds do you call it invalid?
Based on the valid control characters for XML 1.0: https://en.wikipedia.org/wiki/Valid_characters_in_XML

> What other behavior would you expect?
I would expect valid XML 1.0 to be generated on success.
If that is not possible, I would expect an error.

Thanks,

Andreas 


Re: BUG #16277: xmlelement allows invalid XML characters when XML version is set to 1.0

Tom Lane
Andreas Lennartsson <[hidden email]> writes:
>> On what grounds do you call it invalid?

> Based on the valid control characters for XML 1.0
> https://en.wikipedia.org/wiki/Valid_characters_in_XML

Hm.  According to that, C0 control characters *are* legal in XML 1.1,
which would mean that to do this strictly correctly we'd have to
understand the differences between different XML versions, which we
don't --- and, as best I can tell in some quick testing, libxml2
doesn't either.  At least, it will happily take random values for the
document version.
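For reference, the character ranges at issue can be sketched as follows. This is an illustrative Python check based on the `Char` productions in the XML 1.0 and XML 1.1 specifications, not code from PostgreSQL or libxml2. The names `is_xml_char` and `find_invalid_chars` are hypothetical. Note that XML 1.1 admits most C0 controls only when they are written as character references, and never admits NUL. The sketch checks only the `Char` production, not that escaping rule.

```python
def is_xml_char(cp: int, version: str = "1.0") -> bool:
    """Check a code point against the Char production of the given XML version."""
    # Ranges shared by both versions above the C0/space boundary.
    common = (
        0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
    if version == "1.0":
        # XML 1.0 allows only tab, line feed and carriage return below 0x20.
        return cp in (0x9, 0xA, 0xD) or common
    if version == "1.1":
        # XML 1.1 allows every code point from 0x1 upward, excluding NUL.
        return 0x1 <= cp <= 0x1F or common
    raise ValueError(f"unsupported XML version: {version!r}")


def find_invalid_chars(text: str, version: str = "1.0") -> list[tuple[int, int]]:
    """Return (index, code point) pairs for characters outside Char."""
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if not is_xml_char(ord(ch), version)
    ]


print(find_invalid_chars("a" + chr(26) + "b", "1.0"))
print(find_invalid_chars("a" + chr(26) + "b", "1.1"))
```

Under these rules, character 26 is flagged for version 1.0 and accepted for version 1.1, which is the version dependence described above.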

xmlroot() just wraps the given XML text in a new outer <?xml ...?> declaration,
without any regard for whether the new version number allows or disallows
things that the possibly-implicit version would've allowed before.  That
seems of a piece with the generally cavalier treatment of the version
in the rest of xml.c, though.

TBH, it's unlikely that anyone is going to care about this enough
to fix it, even if you could get consensus that making the code
more strict was a good idea.  (Backwards compatibility would argue
against that, so I'm not sure such consensus would be easy to get.)
But if you're sufficiently excited about it, you could try submitting
a patch and see what happens.

                        regards, tom lane



Re: BUG #16277: xmlelement allows invalid XML characters when XML version is set to 1.0

Andreas Lennartsson
Thanks for the feedback. I get your point about backwards compatibility. Maybe update the documentation to make it clear what is going on?
