Bug in error reporting for multi-line JSON

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug in error reporting for multi-line JSON

Simon Riggs-5
JSON parsing reports the line number and relevant context info
incorrectly when the JSON contains newlines. Current code mostly just
says "LINE 1" and is misleading for error correction. There were no
tests for this previously.

Proposed changes mean a JSON error such as this
{
 "one": 1,
 "two":,"two",          <-- extra comma
 "three": true
}

was previously reported as

CONTEXT:  JSON data, line 1: {
"one": 1,
"two":,...

should be reported as

CONTEXT:  JSON data, line 3: "two":,...

Attached patches:
HEAD: json_error_context.v3.patch - applies cleanly, passes make check
PG13: json_error_context.v3.patch - applies w minor fuzz, passes make check
PG12: json_error_context.v3.PG12.patch - applies cleanly, passes make check
PG11: json_error_context.v3.PG12.patch - applies cleanly, not tested
PG10: json_error_context.v3.PG12.patch - applies cleanly, not tested
PG9.6: json_error_context.v3.PG12.patch - applies cleanly, not tested
PG9.5: json_error_context.v3.PG12.patch - applies cleanly, not tested

--
Simon Riggs                http://www.EnterpriseDB.com/

json_error_context.v3.PG12.patch (6K) Download Attachment
json_error_context.v3.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug in error reporting for multi-line JSON

Tom Lane-2
Simon Riggs <[hidden email]> writes:
> JSON parsing reports the line number and relevant context info
> incorrectly when the JSON contains newlines. Current code mostly just
> says "LINE 1" and is misleading for error correction. There were no
> tests for this previously.

Couple thoughts:

* I think you are wrong to have removed the line number bump that
happened when report_json_context advances context_start over a
newline.  The case is likely harder to get to now, but it can still
happen can't it?  If it can't, we should remove that whole stanza.

* I'd suggest naming the new JsonLexContext field "pos_last_newline";
"linefeed" is not usually the word we use for this concept.  (Although
actually, it might work better if you make that point to the char
*after* the newline, in which case "last_linestart" might be the
right name.)

* I'm not enthused about back-patching.  This behavior seems like an
improvement, but that doesn't mean people will appreciate changing it
in stable branches.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Bug in error reporting for multi-line JSON

Simon Riggs-5
On Thu, Jan 21, 2021 at 6:08 PM Tom Lane <[hidden email]> wrote:

>
> Simon Riggs <[hidden email]> writes:
> > JSON parsing reports the line number and relevant context info
> > incorrectly when the JSON contains newlines. Current code mostly just
> > says "LINE 1" and is misleading for error correction. There were no
> > tests for this previously.
>
> Couple thoughts:
>
> * I think you are wrong to have removed the line number bump that
> happened when report_json_context advances context_start over a
> newline.  The case is likely harder to get to now, but it can still
> happen can't it?  If it can't, we should remove that whole stanza.

OK, I'm playing around with this to see what is needed.

> * I'd suggest naming the new JsonLexContext field "pos_last_newline";
> "linefeed" is not usually the word we use for this concept.  (Although
> actually, it might work better if you make that point to the char
> *after* the newline, in which case "last_linestart" might be the
> right name.)

Yes, OK

> * I'm not enthused about back-patching.  This behavior seems like an
> improvement, but that doesn't mean people will appreciate changing it
> in stable branches.

OK, as you wish.

Thanks for the review, will post again soon with an updated patch.

--
Simon Riggs                http://www.EnterpriseDB.com/


Reply | Threaded
Open this post in threaded view
|

Re: Bug in error reporting for multi-line JSON

Hamid Akhtar


On Mon, Jan 25, 2021 at 6:08 PM Simon Riggs <[hidden email]> wrote:
On Thu, Jan 21, 2021 at 6:08 PM Tom Lane <[hidden email]> wrote:
>
> Simon Riggs <[hidden email]> writes:
> > JSON parsing reports the line number and relevant context info
> > incorrectly when the JSON contains newlines. Current code mostly just
> > says "LINE 1" and is misleading for error correction. There were no
> > tests for this previously.
>
> Couple thoughts:
>
> * I think you are wrong to have removed the line number bump that
> happened when report_json_context advances context_start over a
> newline.  The case is likely harder to get to now, but it can still
> happen can't it?  If it can't, we should remove that whole stanza.

OK, I'm playing around with this to see what is needed.

> * I'd suggest naming the new JsonLexContext field "pos_last_newline";
> "linefeed" is not usually the word we use for this concept.  (Although
> actually, it might work better if you make that point to the char
> *after* the newline, in which case "last_linestart" might be the
> right name.)

Yes, OK

> * I'm not enthused about back-patching.  This behavior seems like an
> improvement, but that doesn't mean people will appreciate changing it
> in stable branches.

OK, as you wish.

Thanks for the review, will post again soon with an updated patch.

I agree with Tom's feedback.. Whether you change pos_last_linefeed to point to the character after the linefeed or not, we can still simplify the for loop within the "report_json_context" function to:

=================
context_start = lex->input + lex->pos_last_linefeed;
context_start += (*context_start == '\n'); /* Let's move beyond the linefeed */
context_end = lex->token_terminator;
line_start = context_start;
while (context_end - context_start >= 50 && context_start < context_end)
{
/* Advance to next multibyte character */
if (IS_HIGHBIT_SET(*context_start))
context_start += pg_mblen(context_start);
else
context_start++;
}
=================

IMHO, this should work as pos_last_linefeed points to the position of the last linefeed before the error occurred, hence we can safely skip it and move the start_context forward.

 

--
Simon Riggs                http://www.EnterpriseDB.com/




--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:[hidden email]
SKYPE: engineeredvirus
Reply | Threaded
Open this post in threaded view
|

Re: Bug in error reporting for multi-line JSON

Hamid Akhtar


On Tue, Jan 26, 2021 at 2:07 PM Hamid Akhtar <[hidden email]> wrote:


On Mon, Jan 25, 2021 at 6:08 PM Simon Riggs <[hidden email]> wrote:
On Thu, Jan 21, 2021 at 6:08 PM Tom Lane <[hidden email]> wrote:
>
> Simon Riggs <[hidden email]> writes:
> > JSON parsing reports the line number and relevant context info
> > incorrectly when the JSON contains newlines. Current code mostly just
> > says "LINE 1" and is misleading for error correction. There were no
> > tests for this previously.
>
> Couple thoughts:
>
> * I think you are wrong to have removed the line number bump that
> happened when report_json_context advances context_start over a
> newline.  The case is likely harder to get to now, but it can still
> happen can't it?  If it can't, we should remove that whole stanza.

OK, I'm playing around with this to see what is needed.

> * I'd suggest naming the new JsonLexContext field "pos_last_newline";
> "linefeed" is not usually the word we use for this concept.  (Although
> actually, it might work better if you make that point to the char
> *after* the newline, in which case "last_linestart" might be the
> right name.)

Yes, OK

> * I'm not enthused about back-patching.  This behavior seems like an
> improvement, but that doesn't mean people will appreciate changing it
> in stable branches.

OK, as you wish.

Thanks for the review, will post again soon with an updated patch.

I agree with Tom's feedback.. Whether you change pos_last_linefeed to point to the character after the linefeed or not, we can still simplify the for loop within the "report_json_context" function to:

=================
context_start = lex->input + lex->pos_last_linefeed;
context_start += (*context_start == '\n'); /* Let's move beyond the linefeed */
context_end = lex->token_terminator;
line_start = context_start;
while (context_end - context_start >= 50 && context_start < context_end)
{
/* Advance to next multibyte character */
if (IS_HIGHBIT_SET(*context_start))
context_start += pg_mblen(context_start);
else
context_start++;
}
=================

IMHO, this should work as pos_last_linefeed points to the position of the last linefeed before the error occurred, hence we can safely skip it and move the start_context forward.


This thread has been inactive for more than a month now. 

So, I have reworked Simon's patch and incorporated Tom's feedback. The changes include:
- Changing the variable name from "pos_last_linefeed" to "last_linestart" as it now points to the character after the newline character,
- The "for" loop in report_json_context function has been significantly simplified and uses a while loop.
 
The attached patch is created against the current master branch.

 

--
Simon Riggs                http://www.EnterpriseDB.com/




--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:[hidden email]
SKYPE: engineeredvirus

json_error_context.v4.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug in error reporting for multi-line JSON

Daniel Gustafsson
> On 27 Feb 2021, at 22:19, Hamid Akhtar <[hidden email]> wrote:

> This thread has been inactive for more than a month now.
>
> So, I have reworked Simon's patch and incorporated Tom's feedback. The changes include:
> - Changing the variable name from "pos_last_linefeed" to "last_linestart" as it now points to the character after the newline character,
> - The "for" loop in report_json_context function has been significantly simplified and uses a while loop.
>  
> The attached patch is created against the current master branch.

The updated version address the review comments, and pass all tests with no
documentation updates required.  Playing around with I was also unable to break
it.

I'm changing the status of this patch to Ready for Committer.

--
Daniel Gustafsson https://vmware.com/



Reply | Threaded
Open this post in threaded view
|

Re: Bug in error reporting for multi-line JSON

Tom Lane-2
Daniel Gustafsson <[hidden email]> writes:
> I'm changing the status of this patch to Ready for Committer.

I reviewed this and pushed it, with some changes.

I noted that there was a basically unused "line_start" field in
JsonLexContext, which seems clearly to have been meant to track
what the new field was going to track.  So we can fix this without
any new field by updating that at the right times.

I thought putting jsonb tests into json.sql was a bit poorly
thought out.  I ended up adding parallel tests to both json.sql
and jsonb.sql ... maybe that's overkill, but a lot of the rest
of those scripts is duplicative too.  The tests weren't exercising
the dots-at-start-of-line behavior, either.

                        regards, tom lane