pgbench's expression parsing & negative numbers

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

pgbench's expression parsing & negative numbers

Andres Freund
Hi,

working on overflow correctness in pg I noticed that pgbench isn't quite
there.  I assume it won't matter terribly often, but the way it parses
integers makes it incorrect for, at least, the negativemost number.

It directly parses the integer input using:
{digit}+ {
                                        yylval->ival = strtoint64(yytext);
                                        return INTEGER_CONST;
                                }

but that unfortunately means that the sign is no included in the call to
strtoint64. Which in turn means you can't correctly parse PG_INT64_MIN,
because that's not representable as a positive number on a two's
complement machine (i.e. everywhere).  Preliminary testing shows that
that can relatively easily fixed by just prefixing [-+]? to the relevant
exprscan.l rules.  But it might also not be a bad idea to simply defer
parsing of the number until exprparse.y has its hand on the number?

There's plenty other things wrong with overflow handling too, especially
evalFunc() basically just doesn't care.  I'll come up with a patch for
that sometime soon.

A third complaint I have is that the tap tests are pretty hard to parse
for humans.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Fabien COELHO-3

Hello Andres,

> working on overflow correctness in pg I noticed that pgbench isn't quite
> there.  I assume it won't matter terribly often, but the way it parses
> integers makes it incorrect for, at least, the negativemost number.
>
> It directly parses the integer input using:
> {digit}+ {
> yylval->ival = strtoint64(yytext);
> return INTEGER_CONST;
> }
>
> but that unfortunately means that the sign is no included in the call to
> strtoint64. Which in turn means you can't correctly parse PG_INT64_MIN,

Indeed. For a benchmarking script which generates a pseudo random load I
do not see as a big issue, but maybe I'm missing some use case.

> because that's not representable as a positive number on a two's
> complement machine (i.e. everywhere).  Preliminary testing shows that
> that can relatively easily fixed by just prefixing [-+]? to the relevant
> exprscan.l rules.

Beware that it must handle the unary/binary plus/minus case consistently:

   "-123" vs "a -123" vs "a + -123"

> But it might also not be a bad idea to simply defer parsing of the
> number until exprparse.y has its hand on the number?

It is usually simpler to let the lexer handle constants.

> There's plenty other things wrong with overflow handling too, especially
> evalFunc() basically just doesn't care.

Sure.

There are some overflow checking with div and double to int cast, which
were added because of previous complaints, but which are not very useful
to me.

ISTM that it is a voluntary feature, which handles int64 operations
with a modulo overflow like C. Generating exceptions on such overflows
does not look very attractive to me.

>  I'll come up with a patch for that sometime soon.

I wish you could provide some motivation: why does it matter much to a
benchmarking script to handle overflows with more case?

> A third complaint I have is that the tap tests are pretty hard to parse
> for humans.

Probably.

Perl is pretty hard to humans to start with:-)

There is a lot of code factorization to cram many tests together, so that
one test is more or less one line and there are hundreds of them. Not sure
it would help if it was expanded.

There are a lot of regular expressions to check outputs, which does not
help.

I wanted to have the pgbench scripts outside but this has been rejected,
so the tested scripts themselves are in the middle of the perl code, which
I think has degraded the readability significantly.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Andres Freund
On 2017-12-12 07:21:21 +0100, Fabien COELHO wrote:

>
> Hello Andres,
>
> > working on overflow correctness in pg I noticed that pgbench isn't quite
> > there.  I assume it won't matter terribly often, but the way it parses
> > integers makes it incorrect for, at least, the negativemost number.
> >
> > It directly parses the integer input using:
> > {digit}+ {
> > yylval->ival = strtoint64(yytext);
> > return INTEGER_CONST;
> > }
> >
> > but that unfortunately means that the sign is no included in the call to
> > strtoint64. Which in turn means you can't correctly parse PG_INT64_MIN,
>
> Indeed. For a benchmarking script which generates a pseudo random load I do
> not see as a big issue, but maybe I'm missing some use case.

IDK, behaving correctly seems like a good idea. Also far from impossible
that that code gets propagated further.  Doesn't seem like an entirely
crazy idea that somebody actually wants to benchmark boundary cases,
which might be slower and such.

> > because that's not representable as a positive number on a two's
> > complement machine (i.e. everywhere).  Preliminary testing shows that
> > that can relatively easily fixed by just prefixing [-+]? to the relevant
> > exprscan.l rules.
>
> Beware that it must handle the unary/binary plus/minus case consistently:
>
>   "-123" vs "a -123" vs "a + -123"

Which is a lot easier to solve on the parser side of things...


> > But it might also not be a bad idea to simply defer parsing of the
> > number until exprparse.y has its hand on the number?
>
> It is usually simpler to let the lexer handle constants.


> > There's plenty other things wrong with overflow handling too, especially
> > evalFunc() basically just doesn't care.
>
> Sure.
>
> There are some overflow checking with div and double to int cast, which were
> added because of previous complaints, but which are not very useful to me.

I think handling it inconsistently is the worst of all worlds.


> ISTM that it is a voluntary feature, which handles int64 operations with a
> modulo overflow like C. Generating exceptions on such overflows does not
> look very attractive to me.

No, it's not really voluntary. Signed overflow isn't actually defined
behaviour in C. We kinda make it so on some platforms, but that's not
really a good idea.


> >  I'll come up with a patch for that sometime soon.
>
> I wish you could provide some motivation: why does it matter much to a
> benchmarking script to handle overflows with more case?

a) I want to be able to compile without -fwrapv in the near
   future. Which means you can't have signed overflows, because they're
   undefined behaviour in C.
b) I want to be able to compile with -ftrapv /
   -fsanitize=signed-integer-overflow, to be sure code is actually
   correct. Currently pgbench crashes with that.
c) Randomly handling things in some but not all places is a bad idea.


> > A third complaint I have is that the tap tests are pretty hard to parse
> > for humans.
>
> Probably.
>
> Perl is pretty hard to humans to start with:-)

Meh.


> There is a lot of code factorization to cram many tests together, so that
> one test is more or less one line and there are hundreds of them. Not sure
> it would help if it was expanded.

I don't think expanding it is really a problem, I think they're just
largely not well documented, formatted. With some effort the tests could
be a lot easier to read.


Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Fabien COELHO-3

Hello Andres,

>> There are some overflow checking with div and double to int cast, which were
>> added because of previous complaints, but which are not very useful to me.
>
> I think handling it inconsistently is the worst of all worlds.

Hmmm... I cannot say that inconsistency is a good thing, that would not
be consistent:-)

My 0.02€:

  - I do not think that updating pgbench arithmetic for managing integer
    overflows is worth Andres Freund time. My guess is that most
    script would not trigger client-side overflows, so the change would
    be a no-op in practice.

  - I think that pgbench has more important defects/missing features, to
    be fixed more urgently. Given the time patches spend in the cf queue,
    obviously committers disagree on this one:-)

Now ISTM that it does not harm anything to add such a feature, so fine
with me. Maybe the global compiler option removal is worth the effort.

--
Fabien.
Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Andres Freund
Hi,

On 2017-12-14 10:41:04 +0100, Fabien COELHO wrote:
>  - I do not think that updating pgbench arithmetic for managing integer
>    overflows is worth Andres Freund time. My guess is that most
>    script would not trigger client-side overflows, so the change would
>    be a no-op in practice.

It might not be if you view it in isolation (although I'm not
convinced). The problem is that it has cost beyond pgbench. Due to
pgbench's overflow handling I can't run make check on a build that has
-ftrapv, which found several bugs already.

I'd be happy if somebody else would tackle the issue, but I don't quite
see it happening...

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Fabien COELHO-3

Hello,

>>  - I do not think that updating pgbench arithmetic for managing integer
>>    overflows is worth Andres Freund time. My guess is that most
>>    script would not trigger client-side overflows, so the change would
>>    be a no-op in practice.
>
> It might not be if you view it in isolation (although I'm not
> convinced). The problem is that it has cost beyond pgbench. Due to
> pgbench's overflow handling

Lack of?

> I can't run make check on a build that has -ftrapv, which found several
> bugs already.

Hmmm. You suggest that integer overflows do occur when running pgbench.

Indeed, this tap test case: "\set maxint debug(:minint - 1)"

Otherwise, some stat counters may overflow on very long runs? Although
overflowing a int64 takes some time...

> I'd be happy if somebody else would tackle the issue, but I don't quite
> see it happening...

I must admit that it is not very high on my may-do list. I'll review it if
it appears, though.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Fabien COELHO-3
In reply to this post by Andres Freund

Hello Andres,

> working on overflow correctness in pg I noticed that pgbench isn't quite
> there.

Indeed.

> I assume it won't matter terribly often, but the way it parses integers
> makes it incorrect for, at least, the negativemost number. [...] but
> that unfortunately means that the sign is no included in the call to
> strtoint64.

The strtoint64 function is indeed a mess. It does not really report errors
(more precisely, an error message is printed out, but the execution goes
on nevertheless...).

> Which in turn means you can't correctly parse PG_INT64_MIN,
> because that's not representable as a positive number on a two's
> complement machine (i.e. everywhere).  Preliminary testing shows that
> that can relatively easily fixed by just prefixing [-+]? to the relevant
> exprscan.l rules.

I'm not sure I get it, because then "1 -2" would be scanned as "int
signed_int", which requires to add some fine rules to the parser to handle
that as an addition, and I would be unclear about the priority handling,
eg "1 -2 * 3". But I agree that it can be made to work with transfering
the conversion to the parser and maybe some careful tweaking there.

> But it might also not be a bad idea to simply defer
> parsing of the number until exprparse.y has its hand on the number?
>
> There's plenty other things wrong with overflow handling too, especially
> evalFunc() basically just doesn't care.

Indeed.

Some overflow issues are easy to fix with the existing pg_*_s64_overflow
macros that you provided. It should also handle double2int64 overflows.

I have tried to trigger errors with a -fsanitize=signed-integer-overflow
compilation, without much success, but ISTM that you suggested that
pgbench does not work with that in another thread. Could you be more
precise?

> I'll come up with a patch for that sometime soon.

ISTM that you have not sent any patch on the subject, otherwise I would
have reviewed it. Maybe I could do one some time later, unless you think
that I should not.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Fabien COELHO-3

Hello Andres,

>> I'll come up with a patch for that sometime soon.
>
> ISTM that you have not sent any patch on the subject, otherwise I would
> have reviewed it. Maybe I could do one some time later, unless you think
> that I should not.

Here is a patch which detects pgbench overflows on int & double constants,
and on integer operators.

--
Fabien.

pgbench-overflow-2.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Fabien COELHO-3

>>> I'll come up with a patch for that sometime soon.
>>
>> ISTM that you have not sent any patch on the subject, otherwise I would
>> have reviewed it. Maybe I could do one some time later, unless you think
>> that I should not.
>
> Here is a patch which detects pgbench overflows on int & double constants,
> and on integer operators.

... it but forgot to handle parsing min int, which was the initial focus
of this thread.

This patch does that as well by handling it as the special case between
lexer & parser (the issue being that 9223372036854775808 cannot be lexed
as an standard integer, as it is too large, and -9223372036854775808 is
really two tokens, so must be managed from the parser).

--
Fabien.

pgbench-overflow-3.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Ibrar Ahmed-4
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            not tested

Patch does not apply cleanly on the master branch, anyways I managed that.  Patch work according to specs, and no issue found.
The only minor nit is that you can keep the full comments of function strtoint64

/*
 * If not errorOK, an error message is printed out.
 * If errorOK is true, just return "false" for bad input.
 */
Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Fabien COELHO-3

Hello,

> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:       tested, passed
> Spec compliant:           tested, passed
> Documentation:            not tested
>
> Patch does not apply cleanly on the master branch, anyways I managed that.  Patch work according to specs, and no issue found.
> The only minor nit is that you can keep the full comments of function strtoint64
>
> /*
> * If not errorOK, an error message is printed out.
> * If errorOK is true, just return "false" for bad input.
> */
Thanks for the review.

Attached is a v4, with improved comments on strtoint64 as you requested.
I also added 2 "unlikely" compiler directives.

--
Fabien.

pgbench-overflow-4.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Ibrar Ahmed-4
Patch is good to go from my side.

The new status of this patch is: Ready for Committer
Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Fabien COELHO-3

Hello Ibrar,

> The new status of this patch is: Ready for Committer

Ok, thanks. Let's see what committers think about it.

Andres, are you still interested in overflow detection and handling?

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: pgbench's expression parsing & negative numbers

Andres Freund
Hi,

On 2018-08-16 17:28:06 +0200, Fabien COELHO wrote:
> > The new status of this patch is: Ready for Committer
>
> Ok, thanks. Let's see what committers think about it.
>
> Andres, are you still interested in overflow detection and handling?

Yes.  I'll try to look at it at some point not too far away.

- Andres

Previous Thread Next Thread