PostgreSQL vs SQL/XML Standards

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
112 messages Options
123456
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Thomas Kellerer
>> I have a access to too old 11.2 Oracle.  There I had to modify query
>> because there is not boolean type. I replaced bool by int, but I got a
>> error
>> ORA-19224:XPTY-004 .. expected node()*, got xs:string - it doesn't work
>> with/without string() wrappings.
>>
>The problem is in last line - the expression  "sale/@taxable = false()" is
>not valid on Oracle. Using string() wrapping is a issue, because it returns
">true", "false", but Oracle int doesn't accept it.

That line seems to be valid - but you need to pass an XMLTYPE value, not a
VARCHAR

https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=21cdf890a26e97fa8667b2d6a960bd33

As far as I can tell inside XQuery Oracle does support boolean, but not as a
return type





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Pavel Stehule
In reply to this post by Pavel Stehule


po 29. 10. 2018 v 10:11 odesílatel Pavel Stehule <[hidden email]> napsal:
Hi

čt 25. 10. 2018 v 21:47 odesílatel Alvaro Herrera <[hidden email]> napsal:
On 2018-Oct-25, Pavel Stehule wrote:

> I am thinking so I can fix some issues related to XMLTABLE. Please, send me
> more examples and test cases.

Please see Markus Winand's patch that I referenced upthread.

here is a fix of some XMLTABLE mentioned issues.

this update allows cast boolean to numeric types from XPath expressions

Regards

Pavel


Regards

Pavel

 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

xmltable-2.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Chapman Flack
In reply to this post by Thomas Kellerer
On 10/29/18 6:40 AM, Thomas Kellerer wrote:
> That line seems to be valid - but you need to pass an XMLTYPE value,
> not a VARCHAR
>
> https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=21cdf890a26e97fa8667b2d6a960bd33

Oh, of course! Thank you. I had forgotten pass the context item
as explicitly an XML value.

That illustrates that, in a proper XMLTABLE, you can pass things
that are not XML values. When a varchar is passed in, the context
item has type xs:string. The third PATH tried to follow a node path
against a non-node context item, and correctly reported the error.

And thanks for the dbfiddle pointer. I can now confirm (in both 11g.2
and 18c):

SELECT *
FROM XMLTABLE('.'
  PASSING xmltype('<sale hatsize="7" customer="alice" taxable="false"/>')
  COLUMNS
    a varchar(10) PATH '"cat" < "dog"',
    b varchar(10) PATH '"cat" > "dog"',
    c varchar(10) PATH 'sale/@taxable = false()'
);

A B C
true false true

Or as numbers (There's just no SQL boolean type in Oracle, even 18g!):

SELECT *
FROM XMLTABLE('.'
  PASSING xmltype('<sale hatsize="7" customer="alice" taxable="false"/>')
  COLUMNS
    a NUMBER PATH '"cat" < "dog"',
    b NUMBER PATH '"cat" > "dog"',
    c NUMBER PATH 'sale/@taxable = false()'
);

A B C
1 0 1


I removed the string() wrappings, which were only to allow the
same query to work in PG, but Pavel's proposed patches will make
them unnecessary.

Note, however, that the first proposed patch will work for the
first query (varchar results) and fail for the second (number
results). The second patch will work for the second query, but
produce the wrong strings ("1" or "0" instead of "true" or "false")
for the first. A proper XMLTABLE needs to apply the appropriate
conversion determined by the SQL type of the output column.

I believe a patch to do that correctly is possible; xml.c has
access to the type oids for the output columns, after all.

The fact that PG will return false|false|false or 0|0|0 instead
of true|false|true or 1|0|1 cannot be fixed by a patch. That is
the consequence of evaluating in XPath 1.0 (in XPath 2.0, which is
a subset of XQuery, the results would be correct).

On the same lines, we can take my original example where I forgot
to type the context item as XML, and make that work in Oracle too:

SELECT *
FROM XMLTABLE('.'
  PASSING '<sale hatsize="7" customer="alice" taxable="false"/>'
  COLUMNS
    a varchar(10) PATH 'substring-after(., "taxable=")'
);

A
"false"/>

A proper XMLTABLE is happy to be passed an atomic value, such as
a string, as the context item or any named parameter, and apply
type-appropriate operators and functions to it. XPath 1.0 blocks
that for PG.

-Chap

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Pavel Stehule
In reply to this post by Pavel Stehule


po 29. 10. 2018 v 11:45 odesílatel Pavel Stehule <[hidden email]> napsal:


po 29. 10. 2018 v 10:11 odesílatel Pavel Stehule <[hidden email]> napsal:
Hi

čt 25. 10. 2018 v 21:47 odesílatel Alvaro Herrera <[hidden email]> napsal:
On 2018-Oct-25, Pavel Stehule wrote:

> I am thinking so I can fix some issues related to XMLTABLE. Please, send me
> more examples and test cases.

Please see Markus Winand's patch that I referenced upthread.

here is a fix of some XMLTABLE mentioned issues.

this update allows cast boolean to numeric types from XPath expressions

Attached patch solves some cast issues mentioned by Chap. It solves issue reported by Markus. I didn't use Markus's code, but it was inspiration for me. I found native solution from libxml2.

Regards

Pavel
 

Regards

Pavel


Regards

Pavel

 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

xmltable-xpath-result-processing-bugfix.patch (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Markus Winand

> On 2018-11-6, at 15:23 , Pavel Stehule <[hidden email]> wrote:
>
>
>
> po 29. 10. 2018 v 11:45 odesílatel Pavel Stehule <[hidden email]> napsal:
>
>
> po 29. 10. 2018 v 10:11 odesílatel Pavel Stehule <[hidden email]> napsal:
> Hi
>
> čt 25. 10. 2018 v 21:47 odesílatel Alvaro Herrera <[hidden email]> napsal:
> On 2018-Oct-25, Pavel Stehule wrote:
>
> > I am thinking so I can fix some issues related to XMLTABLE. Please, send me
> > more examples and test cases.
>
> Please see Markus Winand's patch that I referenced upthread.
>
> here is a fix of some XMLTABLE mentioned issues.
>
> this update allows cast boolean to numeric types from XPath expressions
>
> Attached patch solves some cast issues mentioned by Chap. It solves issue reported by Markus. I didn't use Markus's code, but it was inspiration for me. I found native solution from libxml2.
>
> Regards
>
> Pavel

Better than my patch.

But I think the chunk in xml_xmlnodetoxmltype of my patch is still needed — in one way or the other (see below).

# select * from xmltable('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&amp;deep</n2>post</e>' COLUMNS x XML PATH 'node()');
                    x
-----------------------------------------
 prec1arg&amp;ent1<n2>&amp;deep</n2>post
(1 row)

Output is not the original XML.

I dug a little further and found another case that doesn’t looks right even with my change to xml_xmlnodetoxmltype applied:

# select * from xmltable('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&amp;deep</n2>post</e>' COLUMNS x XML PATH '/');
             x
---------------------------
 pre&amp;ent1&amp;deeppost
(1 row)

Oracle gives in both cases XML.

To fix that I included XML_DOCUMENT_NODE in the list of nodes that use xmlNodeDump. Now I wonder if that logic should be reversed to use the xmlXPathCastNodeToString branch in a few selected cases but default to the branch xmlNodeDump for all other cases?

I guess those few cases might be XML_ATTRIBUTE_NODE and XML_TEXT_NODE. Regression tests are happy with that approach but I don’t think that proves a lot.

-markus

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 37d85f7..7c1f884 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3682,7 +3682,7 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 {
        xmltype    *result;

-       if (cur->type == XML_ELEMENT_NODE)
+       if (cur->type != XML_ATTRIBUTE_NODE && cur->type != XML_TEXT_NODE)
        {
                xmlBufferPtr buf;
                xmlNodePtr      cur_copy;



Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Pavel Stehule


čt 8. 11. 2018 v 15:18 odesílatel Markus Winand <[hidden email]> napsal:

> On 2018-11-6, at 15:23 , Pavel Stehule <[hidden email]> wrote:
>
>
>
> po 29. 10. 2018 v 11:45 odesílatel Pavel Stehule <[hidden email]> napsal:
>
>
> po 29. 10. 2018 v 10:11 odesílatel Pavel Stehule <[hidden email]> napsal:
> Hi
>
> čt 25. 10. 2018 v 21:47 odesílatel Alvaro Herrera <[hidden email]> napsal:
> On 2018-Oct-25, Pavel Stehule wrote:
>
> > I am thinking so I can fix some issues related to XMLTABLE. Please, send me
> > more examples and test cases.
>
> Please see Markus Winand's patch that I referenced upthread.
>
> here is a fix of some XMLTABLE mentioned issues.
>
> this update allows cast boolean to numeric types from XPath expressions
>
> Attached patch solves some cast issues mentioned by Chap. It solves issue reported by Markus. I didn't use Markus's code, but it was inspiration for me. I found native solution from libxml2.
>
> Regards
>
> Pavel

Better than my patch.

But I think the chunk in xml_xmlnodetoxmltype of my patch is still needed — in one way or the other (see below).

# select * from xmltable('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&amp;deep</n2>post</e>' COLUMNS x XML PATH 'node()');
                    x
-----------------------------------------
 prec1arg&amp;ent1<n2>&amp;deep</n2>post
(1 row)

Output is not the original XML.

I dug a little further and found another case that doesn’t looks right even with my change to xml_xmlnodetoxmltype applied:

# select * from xmltable('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&amp;deep</n2>post</e>' COLUMNS x XML PATH '/');
             x
---------------------------
 pre&amp;ent1&amp;deeppost
(1 row)

Oracle gives in both cases XML.

To fix that I included XML_DOCUMENT_NODE in the list of nodes that use xmlNodeDump. Now I wonder if that logic should be reversed to use the xmlXPathCastNodeToString branch in a few selected cases but default to the branch xmlNodeDump for all other cases?

I guess those few cases might be XML_ATTRIBUTE_NODE and XML_TEXT_NODE. Regression tests are happy with that approach but I don’t think that proves a lot.

-markus

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 37d85f7..7c1f884 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3682,7 +3682,7 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 {
        xmltype    *result;

-       if (cur->type == XML_ELEMENT_NODE)
+       if (cur->type != XML_ATTRIBUTE_NODE && cur->type != XML_TEXT_NODE)
        {
                xmlBufferPtr buf;
                xmlNodePtr      cur_copy;


I used your patch and append regress tests. I checked the result against Oracle.

Regards

Pavel

xmltable-xpath-result-processing-bugfix-2.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Markus Winand

On 2018-11-9, at 05:07 , Pavel Stehule <[hidden email]> wrote:



čt 8. 11. 2018 v 15:18 odesílatel Markus Winand <[hidden email]> napsal:

> On 2018-11-6, at 15:23 , Pavel Stehule <[hidden email]> wrote:
>
>
>
> po 29. 10. 2018 v 11:45 odesílatel Pavel Stehule <[hidden email]> napsal:
>
>
> po 29. 10. 2018 v 10:11 odesílatel Pavel Stehule <[hidden email]> napsal:
> Hi
>
> čt 25. 10. 2018 v 21:47 odesílatel Alvaro Herrera <[hidden email]> napsal:
> On 2018-Oct-25, Pavel Stehule wrote:
>
> > I am thinking so I can fix some issues related to XMLTABLE. Please, send me
> > more examples and test cases.
>
> Please see Markus Winand's patch that I referenced upthread.
>
> here is a fix of some XMLTABLE mentioned issues.
>
> this update allows cast boolean to numeric types from XPath expressions
>
> Attached patch solves some cast issues mentioned by Chap. It solves issue reported by Markus. I didn't use Markus's code, but it was inspiration for me. I found native solution from libxml2.
>
> Regards
>
> Pavel

Better than my patch.

But I think the chunk in xml_xmlnodetoxmltype of my patch is still needed — in one way or the other (see below).

# select * from xmltable('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&amp;deep</n2>post</e>' COLUMNS x XML PATH 'node()');
                    x
-----------------------------------------
 prec1arg&amp;ent1<n2>&amp;deep</n2>post
(1 row)

Output is not the original XML.

I dug a little further and found another case that doesn’t looks right even with my change to xml_xmlnodetoxmltype applied:

# select * from xmltable('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&amp;deep</n2>post</e>' COLUMNS x XML PATH '/');
             x
---------------------------
 pre&amp;ent1&amp;deeppost
(1 row)

Oracle gives in both cases XML.

To fix that I included XML_DOCUMENT_NODE in the list of nodes that use xmlNodeDump. Now I wonder if that logic should be reversed to use the xmlXPathCastNodeToString branch in a few selected cases but default to the branch xmlNodeDump for all other cases?

I guess those few cases might be XML_ATTRIBUTE_NODE and XML_TEXT_NODE. Regression tests are happy with that approach but I don’t think that proves a lot.

-markus

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 37d85f7..7c1f884 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3682,7 +3682,7 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 {
        xmltype    *result;

-       if (cur->type == XML_ELEMENT_NODE)
+       if (cur->type != XML_ATTRIBUTE_NODE && cur->type != XML_TEXT_NODE)
        {
                xmlBufferPtr buf;
                xmlNodePtr      cur_copy;


I used your patch and append regress tests. I checked the result against Oracle.

Regards

Pavel

Fine from my side.

-markus

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Pavel Stehule


po 12. 11. 2018 v 6:58 odesílatel Markus Winand <[hidden email]> napsal:

On 2018-11-9, at 05:07 , Pavel Stehule <[hidden email]> wrote:



čt 8. 11. 2018 v 15:18 odesílatel Markus Winand <[hidden email]> napsal:

> On 2018-11-6, at 15:23 , Pavel Stehule <[hidden email]> wrote:
>
>
>
> po 29. 10. 2018 v 11:45 odesílatel Pavel Stehule <[hidden email]> napsal:
>
>
> po 29. 10. 2018 v 10:11 odesílatel Pavel Stehule <[hidden email]> napsal:
> Hi
>
> čt 25. 10. 2018 v 21:47 odesílatel Alvaro Herrera <[hidden email]> napsal:
> On 2018-Oct-25, Pavel Stehule wrote:
>
> > I am thinking so I can fix some issues related to XMLTABLE. Please, send me
> > more examples and test cases.
>
> Please see Markus Winand's patch that I referenced upthread.
>
> here is a fix of some XMLTABLE mentioned issues.
>
> this update allows cast boolean to numeric types from XPath expressions
>
> Attached patch solves some cast issues mentioned by Chap. It solves issue reported by Markus. I didn't use Markus's code, but it was inspiration for me. I found native solution from libxml2.
>
> Regards
>
> Pavel

Better than my patch.

But I think the chunk in xml_xmlnodetoxmltype of my patch is still needed — in one way or the other (see below).

# select * from xmltable('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&amp;deep</n2>post</e>' COLUMNS x XML PATH 'node()');
                    x
-----------------------------------------
 prec1arg&amp;ent1<n2>&amp;deep</n2>post
(1 row)

Output is not the original XML.

I dug a little further and found another case that doesn’t looks right even with my change to xml_xmlnodetoxmltype applied:

# select * from xmltable('*' PASSING '<e>pre<!--c1--><?pi arg?><![CDATA[&ent1]]><n2>&amp;deep</n2>post</e>' COLUMNS x XML PATH '/');
             x
---------------------------
 pre&amp;ent1&amp;deeppost
(1 row)

Oracle gives in both cases XML.

To fix that I included XML_DOCUMENT_NODE in the list of nodes that use xmlNodeDump. Now I wonder if that logic should be reversed to use the xmlXPathCastNodeToString branch in a few selected cases but default to the branch xmlNodeDump for all other cases?

I guess those few cases might be XML_ATTRIBUTE_NODE and XML_TEXT_NODE. Regression tests are happy with that approach but I don’t think that proves a lot.

-markus

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 37d85f7..7c1f884 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3682,7 +3682,7 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 {
        xmltype    *result;

-       if (cur->type == XML_ELEMENT_NODE)
+       if (cur->type != XML_ATTRIBUTE_NODE && cur->type != XML_TEXT_NODE)
        {
                xmlBufferPtr buf;
                xmlNodePtr      cur_copy;


I used your patch and append regress tests. I checked the result against Oracle.

Regards

Pavel

Fine from my side.

super

Thank you for investigation and tests, examples.

It is assigned to January commitfest.

Regards

Pavel


-markus

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Chapman Flack
On 11/12/18 04:58, Pavel Stehule wrote:
> It is assigned to January commitfest.

When I build this patch against master (4203842), it builds, but breaks
make check. The diffs seem only incidental (loss of some error context
output), but there are no make check errors building 4203842 unmodified,
so the patch has introduced them. Example:

*** ../postgresql/src/test/regress/expected/xml.out    Sat Dec 29 19:58:41 2018
--- src/test/regress/results/xml.out   Sat Dec 29 19:59:11 2018
***************
*** 9,16 ****
  LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                         ^
  DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1
- <wrong
-       ^

=== ABOUT THE DOCUMENTATION ===

While this patch fixes bugs in the embedding of XPath 1.0 into PostgreSQL,
and that is good, I still think there is an elephant in the room, namely
that XPath 1.0 isn't the right language to embed. Clearly, that can't be
fixed as long as we rely on libxml, but I think our documentation could
(should) do a better job of warning users of the surprises that result.

My ideas for improving the documentation are not yet so finished as to
offer a patch, but here are some thoughts:

Of the 13 mentions in func.sgml of XPath (not counting its appearances
in function and parameter names), only one specifically says XPath 1.0,
and it is buried down at functions-xml-processing, not somewhere common
to all of them, like the introduction to functions-xml.

Really, all of our functions that refer to "XPath" mean XPath 1.0,
and because that version was essentially thrown away to produce the 2.0
and later versions, and that happened a dozen years ago, the documentation
needs to make that clear. Users who have learned XQuery, or XPath in the
last dozen years, need to know this is something very different, to avoid
walking into the traps.

I think:

1) Every reference to "XPath" or an "XPath expression" (that is the first
   in the description of a function or parameter) should be spelled out as
   "XPath 1.0".

2) There should be a section in one central place (clearly applying to
   all of the XML functions), perhaps in the functions-xml introduction,
   or a <footnote> to a paragraph there, or in an appendix (new section
   under SQL Conformance?) that summarizes the gulf between XPath 1.0
   and the later, XQuery-compatible versions. There should be cross-
   reference links to this section from the "XPath 1.0" mentions in
   individual function descriptions. The section could be a condensation
   of [1] (and perhaps contain a link to [1], if links out to the wiki
   are allowed), and link to the W3C compatibility notes at [2] and [3].

3) Currently, the description of XMLEXISTS notes that "the SQL standard
   specifies ... the construct to take an XQuery expression ... but
   PostgreSQL currently only supports XPath, which is a subset of XQuery".
   The description of XMLTABLE does not have such a note, and needs one.
   In both cases, the note should probably be nearer the top of the
   description (for XMLEXISTS, it is currently at the bottom, after the
   examples).

4) That note currently says "only supports XPath, which is a subset
   of XQuery". That would be a fair claim if we meant XPath 2.0
   or later, but not XPath 1.0 (which was not a subset of any version
   of XQuery, and has fundamental incompatibilities with it). The note,
   for both XMLEXISTS and XMLTABLE, should simply say "only supports
   XPath 1.0", with a link to the section summarizing the incompatibilities.

Those are my thoughts on how a patch to the documentation could be
organized. Do they seem reasonable?

-Chap


[1]
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Related_to_version_of_XPath
[2] https://www.w3.org/TR/xpath20/#id-backwards-compatibility
[3]
https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Pavel Stehule


ne 30. 12. 2018 v 3:46 odesílatel Chapman Flack <[hidden email]> napsal:
On 11/12/18 04:58, Pavel Stehule wrote:
> It is assigned to January commitfest.

When I build this patch against master (4203842), it builds, but breaks
make check. The diffs seem only incidental (loss of some error context
output), but there are no make check errors building 4203842 unmodified,
so the patch has introduced them. Example:

*** ../postgresql/src/test/regress/expected/xml.out    Sat Dec 29 xml19:58:41 2018
--- src/test/regress/results/xml.out   Sat Dec 29 19:59:11 2018
***************
*** 9,16 ****
  LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                         ^
  DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1
- <wrong
-       ^

Unfortunately, there is a different releases of libxml2 with different error reporting and it is hard (impossible) to prepare for all variants. :-/

I prepare xml.out for my FC29 (fresh libxml2) and for no support xml. Other I prepare by patching - and this error (in context) is expected.


=== ABOUT THE DOCUMENTATION ===

While this patch fixes bugs in the embedding of XPath 1.0 into PostgreSQL,
and that is good, I still think there is an elephant in the room, namely
that XPath 1.0 isn't the right language to embed. Clearly, that can't be
fixed as long as we rely on libxml, but I think our documentation could
(should) do a better job of warning users of the surprises that result.

My ideas for improving the documentation are not yet so finished as to
offer a patch, but here are some thoughts:

Of the 13 mentions in func.sgml of XPath (not counting its appearances
in function and parameter names), only one specifically says XPath 1.0,
and it is buried down at functions-xml-processing, not somewhere common
to all of them, like the introduction to functions-xml.

Really, all of our functions that refer to "XPath" mean XPath 1.0,
and because that version was essentially thrown away to produce the 2.0
and later versions, and that happened a dozen years ago, the documentation
needs to make that clear. Users who have learned XQuery, or XPath in the
last dozen years, need to know this is something very different, to avoid
walking into the traps.

I think:

1) Every reference to "XPath" or an "XPath expression" (that is the first
   in the description of a function or parameter) should be spelled out as
   "XPath 1.0".

2) There should be a section in one central place (clearly applying to
   all of the XML functions), perhaps in the functions-xml introduction,
   or a <footnote> to a paragraph there, or in an appendix (new section
   under SQL Conformance?) that summarizes the gulf between XPath 1.0
   and the later, XQuery-compatible versions. There should be cross-
   reference links to this section from the "XPath 1.0" mentions in
   individual function descriptions. The section could be a condensation
   of [1] (and perhaps contain a link to [1], if links out to the wiki
   are allowed), and link to the W3C compatibility notes at [2] and [3].

3) Currently, the description of XMLEXISTS notes that "the SQL standard
   specifies ... the construct to take an XQuery expression ... but
   PostgreSQL currently only supports XPath, which is a subset of XQuery".
   The description of XMLTABLE does not have such a note, and needs one.
   In both cases, the note should probably be nearer the top of the
   description (for XMLEXISTS, it is currently at the bottom, after the
   examples).

4) That note currently says "only supports XPath, which is a subset
   of XQuery". That would be a fair claim if we meant XPath 2.0
   or later, but not XPath 1.0 (which was not a subset of any version
   of XQuery, and has fundamental incompatibilities with it). The note,
   for both XMLEXISTS and XMLTABLE, should simply say "only supports
   XPath 1.0", with a link to the section summarizing the incompatibilities.

Those are my thoughts on how a patch to the documentation could be
organized. Do they seem reasonable?

I agree with more stronger detail description about difference between XPath, XPath2 and XQuery.

Some like "The XPath 1.0 is ancestor of XPath 2.0, that is part of XQuery. ..."

I don't think so link to wiki with any proposals is good idea. Documentation should to describe current state, not what can be at some future.

Regards

Pavel


-Chap


[1]
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Related_to_version_of_XPath
[2] https://www.w3.org/TR/xpath20/#id-backwards-compatibility
[3]
https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Chapman Flack
On 12/30/18 03:23, Pavel Stehule wrote:
> I agree with more stronger detail description about difference between
> XPath, XPath2 and XQuery.
>
> Some like "The XPath 1.0 is ancestor of XPath 2.0, that is part of XQuery.
> ..."

I'll be happy to work on some wording to help detail the differences.
I don't think it can be condensed to a single sentence, or even if it
could, it would not serve users well; what's helpful is to know
the specifics of what changed between XPath 1.0 and the 2.0-3.0-3.1 lineage
that is now XPath, because that's what helps to understand what the
challenges will be when porting queries from other systems, how to recognize
when a query won't be possible to port at all, etc.

So I think it needs to be a short section. I am still undecided what's
the best place in the manual for the section to go:

- Added to the introductory material under functions-xml

- A long <footnote> attached to a short remark in that introductory matter
  (is there any precedent? I see there are <footnote>s already in the
  manual, but I haven't checked how long they get)

- A new appendix (or new section in the SQL Conformance appendix)
  linked from a short statement in the functions-xml introductory matter.

> I don't think so link to wiki with any proposals is good idea.

That makes sense. I was thinking only of linking directly to the specific
section on XPath 1.0 compatibility issues, not to the wiki page as a whole.
I see in grepping through the *.sgml that there are not many links to the
wiki.postgresql.org, and the ones that exist are only in release notes,
Further Information, The Source Code Repository, and sepgsql.

So precedent seems to stand against using a link to the wiki section.
So, the material /in/ that wiki section [1] (but omitting the two paras
about the DOCUMENT/CONTENT node-wrapping hack) is more or less what needs
to go into the new section in the manual.

I assume it's ok to directly include links [2] and [3] out to the w3.org
compatibility notes ... those are stable URLs to reference material.

=== BY REF and BY VALUE ===

How difficult would it be to make the grammar constructs that currently
accept "BY REF" (only as noise and ignore it) accept and ignore both BY REF
and BY VALUE? The documentation ought to state that, while both forms are
accepted and ignored, PostgreSQL's actual behavior corresponds to what the
standard calls BY VALUE (using a string serialization as the XML datatype's
internal form makes the BY REF semantics impossible). So it's when people
try to port queries that explicitly say BY REF that they need to know there
may be trouble ahead.

-Chap


>> [1] https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Related_to_version_of_XPath
>> [2] https://www.w3.org/TR/xpath20/#id-backwards-compatibility
>> [3] https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Pavel Stehule


ne 30. 12. 2018 v 20:06 odesílatel Chapman Flack <[hidden email]> napsal:
On 12/30/18 03:23, Pavel Stehule wrote:
> I agree with more stronger detail description about difference between
> XPath, XPath2 and XQuery.
>
> Some like "The XPath 1.0 is ancestor of XPath 2.0, that is part of XQuery.
> ..."

I'll be happy to work on some wording to help detail the differences.
I don't think it can be condensed to a single sentence, or even if it
could, it would not serve users well; what's helpful is to know
the specifics of what changed between XPath 1.0 and the 2.0-3.0-3.1 lineage
that is now XPath, because that's what helps to understand what the
challenges will be when porting queries from other systems, how to recognize
when a query won't be possible to port at all, etc.

So I think it needs to be a short section. I am still undecided what's
the best place in the manual for the section to go:


+1
 
- Added to the introductory material under functions-xml

- A long <footnote> attached to a short remark in that introductory matter
  (is there any precedent? I see there are <footnote>s already in the
  manual, but I haven't checked how long they get)

- A new appendix (or new section in the SQL Conformance appendix)
  linked from a short statement in the functions-xml introductory matter.

> I don't think so link to wiki with any proposals is good idea.

That makes sense. I was thinking only of linking directly to the specific
section on XPath 1.0 compatibility issues, not to the wiki page as a whole.
I see in grepping through the *.sgml that there are not many links to the
wiki.postgresql.org, and the ones that exist are only in release notes,
Further Information, The Source Code Repository, and sepgsql.

So precedent seems to stand against using a link to the wiki section.
So, the material /in/ that wiki section [1] (but omitting the two paras
about the DOCUMENT/CONTENT node-wrapping hack) is more or less what needs
to go into the new section in the manual.

I assume it's ok to directly include links [2] and [3] out to the w3.org
compatibility notes ... those are stable URLs to reference material.

=== BY REF and BY VALUE ===

How difficult would it be to make the grammar constructs that currently
accept "BY REF" (only as noise and ignore it) accept and ignore both BY REF
and BY VALUE? The documentation ought to state that, while both forms are
accepted and ignored, PostgreSQL's actual behavior corresponds to what the
standard calls BY VALUE (using a string serialization as the XML datatype's
internal form makes the BY REF semantics impossible). So it's when people
try to port queries that explicitly say BY REF that they need to know there
may be trouble ahead.

This is difficult question - a implementation is probably very easy, but it is hard to accept to possible break compatibility due syntactic sugar.

This is not probably related to just XPath/XQuery question - but it is related to different design of XML datatype (based on PostgreSQL TOAST) against ANSI/SQL (Oracle - clob).

So this is complicated topic and my opinion is better to don't touch it because we can't to fix it, change it - and I am not sure so ANSI/SQL is significantly better than PostgreSQL implementation.
 

-Chap


>> [1] https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Related_to_version_of_XPath
>> [2] https://www.w3.org/TR/xpath20/#id-backwards-compatibility
>> [3] https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#xpath1-compatibility
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Chapman Flack
On 12/30/18 15:18, Pavel Stehule wrote:

> ne 30. 12. 2018 v 20:06 odesílatel Chapman Flack <[hidden email]>
> napsal:
>> How difficult would it be to make the grammar constructs that currently
>> accept "BY REF" (only as noise and ignore it) accept and ignore both BY REF
>> and BY VALUE?
>
> This is difficult question - a implementation is probably very easy, but it
> is hard to accept to possible break compatibility due syntactic sugar.
>
> This is not probably related to just XPath/XQuery question - but it is
> related to different design of XML datatype (based on PostgreSQL TOAST)
> against ANSI/SQL (Oracle - clob).
>
> So this is complicated topic and my opinion is better to don't touch it
> because we can't to fix it, change it - and I am not sure so ANSI/SQL is
> significantly better than PostgreSQL implementation.

I am not sure I understand your point. It appears that Oracle (18c),
just like PostgreSQL, really only supports BY VALUE semantics. Here is
an Oracle fiddle that shows it:

https://dbfiddle.uk/?rdbms=oracle_18&fiddle=0cb353da0d94c6d5c2659222a1e419fd

When the same element is passed via two parameters, an 'is' test (node
identity equality) of the two parameters returns false, indicating that
Oracle has used BY VALUE semantics, not BY REF.

Oracle uses BY VALUE when BY VALUE is explicitly requested, and also when
no passing method is specified (i.e., BY VALUE is the default). Oracle also
uses BY VALUE when BY REF is explicitly requested, which seems rather rude,
but that must be the behavior PostgreSQL is imitating with the choice to
accept and ignore BY REF.

But the PostgreSQL situation is a little more strange. PG uses BY VALUE
semantics as the default when no passing method is specified. PG also uses
BY VALUE semantics when BY REF is explicitly requested, which is rude,
just like Oracle. But why should an explicit specification of BY VALUE
(which is, after all, the semantics we're going to use anyway!) produce
this?

ERROR:  syntax error at or near "value"

To me, that doesn't seem like least astonishment.

I am not seeing what would be complicated about removing that astonishment
by simply allowing the grammar productions to also consume BY VALUE and
ignore it.

-Chap

Reply | Threaded
Open this post in threaded view
|

doc: where best to add ~ 400 words to the manual?

Chapman Flack
In reply to this post by Chapman Flack
I would like to add material to the manual, detailing the differences
between the XPath 1.0 language supported in PG, and the XQuery/XPath 2.0+
expected by the standard.

A good preview of what that would look like is the "related to version
of XPath" wiki section at [1].

That wiki section (minus the two inside-baseball paragraphs about
commit 3963574 and its reversion) comes to a bit under 400 words.

So that's roughly the amount of material I'm thinking to add.
Where should it go?

I have thought of:

1. A new sect2 near the top of the "functions-xml" sect1. This makes sense
because the material applies to all the functions below that involve XPath.
But that seems like the wrong place for 400 words of gory details.

2. A short remark at the top of functions-xml, linked to a footnote with
the gory details. But that would be a long footnote. The longest currently
is a footnote with 101 words in the start tutorial.

3. A sect2 at the very end of functions-xml, linked to from a short remark
at the top.

4. A new appendix, or a new conformance section in the features.sgml
appendix, linked to from a short remark at the top of functions-xml.

Or is there an even better idea I have not thought of?

-Chap


[1]
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Related_to_version_of_XPath

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Pavel Stehule
In reply to this post by Chapman Flack


po 31. 12. 2018 v 3:15 odesílatel Chapman Flack <[hidden email]> napsal:
On 12/30/18 15:18, Pavel Stehule wrote:
> ne 30. 12. 2018 v 20:06 odesílatel Chapman Flack <[hidden email]>
> napsal:
>> How difficult would it be to make the grammar constructs that currently
>> accept "BY REF" (only as noise and ignore it) accept and ignore both BY REF
>> and BY VALUE?
>
> This is difficult question - a implementation is probably very easy, but it
> is hard to accept to possible break compatibility due syntactic sugar.
>
> This is not probably related to just XPath/XQuery question - but it is
> related to different design of XML datatype (based on PostgreSQL TOAST)
> against ANSI/SQL (Oracle - clob).
>
> So this is complicated topic and my opinion is better to don't touch it
> because we can't to fix it, change it - and I am not sure so ANSI/SQL is
> significantly better than PostgreSQL implementation.

I am not sure I understand your point. It appears that Oracle (18c),
just like PostgreSQL, really only supports BY VALUE semantics. Here is
an Oracle fiddle that shows it:

https://dbfiddle.uk/?rdbms=oracle_18&fiddle=0cb353da0d94c6d5c2659222a1e419fd

When the same element is passed via two parameters, an 'is' test (node
identity equality) of the two parameters returns false, indicating that
Oracle has used BY VALUE semantics, not BY REF.

Oracle uses BY VALUE when BY VALUE is explicitly requested, and also when
no passing method is specified (i.e., BY VALUE is the default). Oracle also
uses BY VALUE when BY REF is explicitly requested, which seems rather rude,
but that must be the behavior PostgreSQL is imitating with the choice to
accept and ignore BY REF.

But the PostgreSQL situation is a little more strange. PG uses BY VALUE
semantics as the default when no passing method is specified. PG also uses
BY VALUE semantics when BY REF is explicitly requested, which is rude,
just like Oracle. But why should an explicit specification of BY VALUE
(which is, after all, the semantics we're going to use anyway!) produce
this?

ERROR:  syntax error at or near "value"

To me, that doesn't seem like least astonishment.

I am not seeing what would be complicated about removing that astonishment
by simply allowing the grammar productions to also consume BY VALUE and
ignore it.

ok - I am not against implementation of ignored BY VALUE. But I don't like a idea to disable BY REF.

Regards

Pavel

-Chap
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

a.zakirov
In reply to this post by Pavel Stehule
Hello Pavel,

On 09.11.2018 07:07, Pavel Stehule wrote:
> I used your patch and append regress tests. I checked the result against
> Oracle.

I checked the patch with Chap cases. The patch fixes handling of
boolean, number types which mentioned in the wiki.

I have a few comments related to the code and the documentation. I
attached the patch, which fixes it.

There is an example in the documentation:

SELECT xmltable.*
   FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text);
        element
----------------------
    Hello2a2   bbbCC

With the patch XMLTABLE returns different result now.

copy_and_safe_free_xmlchar() function should be hid by #ifdef
USE_LIBXML, otherwise I get an error if I build the Postgres without
--with-libxml.

There is a comment within XmlTableGetValue(). I changed it, mainly I
used Markus patch from the related thread mentioned by Alvaro.

Please see the changes in the patch.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

xmltable-xpath-result-review-fix.patch (1K) Download Attachment
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Pavel Stehule
Hi

čt 10. 1. 2019 v 14:00 odesílatel Arthur Zakirov <[hidden email]> napsal:
Hello Pavel,

On 09.11.2018 07:07, Pavel Stehule wrote:
> I used your patch and append regress tests. I checked the result against
> Oracle.

I checked the patch with Chap cases. The patch fixes handling of
boolean, number types which mentioned in the wiki.

I have a few comments related to the code and the documentation. I
attached the patch, which fixes it.

There is an example in the documentation:

SELECT xmltable.*
   FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text);
        element
----------------------
    Hello2a2   bbbCC

With the patch XMLTABLE returns different result now.

copy_and_safe_free_xmlchar() function should be hid by #ifdef
USE_LIBXML, otherwise I get an error if I build the Postgres without
--with-libxml.

There is a comment within XmlTableGetValue(). I changed it, mainly I
used Markus patch from the related thread mentioned by Alvaro.

Please see the changes in the patch.

I merged your changes, and fixed regress tests.

Thank you for patch

Regards

Pavel



--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

xmltable-xpath-result-processing-bugfix-3.patch (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: doc: where best to add ~ 400 words to the manual?

Chapman Flack
In reply to this post by Chapman Flack
On 12/30/18 22:23, Chapman Flack wrote:
> I would like to add material to the manual, detailing the differences
> between the XPath 1.0 language supported in PG, and the XQuery/XPath 2.0+
> expected by the standard.
> ...
> I have thought of:
> ...
> 3. A sect2 at the very end of functions-xml, linked to from a short remark
> at the top.

(3) seems to be supported by (the only) precedent, the "Limits and
Compatibility" sect3 within functions-posix-regexp.

So, absent objection, I'll work on a Limits and Compatibility sect2
within functions-xml.

-Chap

Reply | Threaded
Open this post in threaded view
|

Re: doc: where best to add ~ 400 words to the manual?

Chapman Flack
On 01/10/19 23:48, Chapman Flack wrote:
> On 12/30/18 22:23, Chapman Flack wrote:
> (3) seems to be supported by (the only) precedent, the "Limits and
> Compatibility" sect3 within functions-posix-regexp.
>
> So, absent objection, I'll work on a Limits and Compatibility sect2
> within functions-xml.

One fly in the ointment: tables of contents seem to list sect1 and sect2
elements, but not sect3. So the "Limits and Compatibility" sect3 under
functions-posix-regexp does not clutter the overall "Functions and
Operators" ToC, but a "Limits and Compatibility" sect2 under
functions-xml would be visible there. Tucking it as a sect3 within an
existing sect2 would prevent that, but it is common information that
pertains to more than one of them, so that doesn't seem quite right.

Or is it ok to have a "Limits and Compatibility" visible in the ToC
under XML Functions?

-Chap

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL/XML Standards

Chapman Flack
In reply to this post by Pavel Stehule
On 12/30/18 03:23, Pavel Stehule wrote:
> Unfortunately, there is a different releases of libxml2 with different
> error reporting and it is hard (impossible) to prepare for all variants. :-/
>
> I prepare xml.out for my FC29 (fresh libxml2) and for no support xml.
> Other I prepare by patching - and this error (in context) is expected.

It turns out that the variant was already accounted for in the xml_2.out
variant result file, it just needed to have the new results added.

Done in xmltable-xpath-result-processing-bugfix-4.patch attached.


On 12/31/18 01:03, Pavel Stehule wrote:

> po 31. 12. 2018 v 3:15 odesílatel Chapman Flack <[hidden email]>
> napsal:
>> But the PostgreSQL situation is a little more strange. PG uses BY VALUE
>> semantics as the default when no passing method is specified. PG also uses
>> BY VALUE semantics when BY REF is explicitly requested, which is rude,
>> just like Oracle. But why should an explicit specification of BY VALUE
>> (which is, after all, the semantics we're going to use anyway!) produce
>> this?
>>
>> ERROR:  syntax error at or near "value"
>>
>> To me, that doesn't seem like least astonishment.
>>
>> I am not seeing what would be complicated about removing that astonishment
>> by simply allowing the grammar productions to also consume BY VALUE and
>> ignore it.
>
> ok - I am not against implementation of ignored BY VALUE. But I don't like
> a idea to disable BY REF.
Done in attached xmltable-xmlexists-passing-mechanisms-1.patch along with
some corresponding documentation adjustments.

I am still working on more extensive documentation, but it seemed best
to include the changes related to BY REF / BY VALUE in the same patch
with the grammar change.

-Chap

xmltable-xpath-result-processing-bugfix-4.patch (13K) Download Attachment
xmltable-xmlexists-passing-mechanisms-1.patch (5K) Download Attachment
123456