mingw32 floating point diff

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

mingw32 floating point diff

Peter Eisentraut-6
Running the regression tests on mingw32, I get the following diff in
circle.out:

@@ -111,8 +111,8 @@
   WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0)
   ORDER BY distance, area(c1.f1), area(c2.f1);
  five |      one       |      two       |     distance
-------+----------------+----------------+------------------
-      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
+------+----------------+----------------+-------------------
+      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989
       | <(3,5),0>      | <(5,1),3>      | 1.47213595499958
       | <(100,200),10> | <(100,1),115>  |               74
       | <(100,200),10> | <(1,2),100>    | 111.370729772479

I only get this on master/PG12, but not on PG11, so I suspect that the
new floating-point output routines could be the root of the issue.

This happens only with the 32-bit build (mingw32), but not with a 64-bit
build (mingw64).

Any suggestions on how to analyze this further?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

mingw32 floating point diff

Peter Eisentraut-6
On 2019-08-20 14:59, Peter Eisentraut wrote:

> Running the regression tests on mingw32, I get the following diff in
> circle.out:
>
> @@ -111,8 +111,8 @@
>    WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0)
>    ORDER BY distance, area(c1.f1), area(c2.f1);
>   five |      one       |      two       |     distance
> -------+----------------+----------------+------------------
> -      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
> +------+----------------+----------------+-------------------
> +      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989
>        | <(3,5),0>      | <(5,1),3>      | 1.47213595499958
>        | <(100,200),10> | <(100,1),115>  |               74
>        | <(100,200),10> | <(1,2),100>    | 111.370729772479
> 
> I only get this on master/PG12, but not on PG11, so I suspect that the
> new floating-point output routines could be the root of the issue.
>
> This happens only with the 32-bit build (mingw32), but not with a 64-bit
> build (mingw64).

OK, the problem isn't the new output routines.  The result of the
computations is actually different.  The test itself is new in PG12.

The difference in output is due to the mingw32 target using -mfpmath=387
by default.  If you build with -mfpmath=sse -msee, the tests pass again.

Do we care to do anything about this?  Pick slightly different test data
perhaps?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Tom Lane-2
Peter Eisentraut <[hidden email]> writes:
> On 2019-08-20 14:59, Peter Eisentraut wrote:
>> Running the regression tests on mingw32, I get the following diff in
>> circle.out:
...
> OK, the problem isn't the new output routines.  The result of the
> computations is actually different.  The test itself is new in PG12.
> The difference in output is due to the mingw32 target using -mfpmath=387
> by default.  If you build with -mfpmath=sse -msee, the tests pass again.

Hm, so presumably we could replicate this on other Intel-oid platforms
by changing compiler switches?  (I haven't tried.)

> Do we care to do anything about this?  Pick slightly different test data
> perhaps?

Picking different test data might be a good "fix".  Alternatively, we
could try to figure out where the discrepancy is arising and adjust
the code --- but that might be a lot more work than it's worth.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Tom Lane-2
I wrote:
> Peter Eisentraut <[hidden email]> writes:
>> Do we care to do anything about this?  Pick slightly different test data
>> perhaps?

> Picking different test data might be a good "fix".  Alternatively, we
> could try to figure out where the discrepancy is arising and adjust
> the code --- but that might be a lot more work than it's worth.

I poked at this a bit more.  I can reproduce the problem by using
-mfpmath=387 on dromedary's host (fairly old 32-bit macOS); although
I also get half a dozen *other* failures in the core regression tests,
mostly around detection of float overflow.  So I'm not quite sure that
this is comparable.  But at any rate, I tracked the core of the problem
to pg_hypot:

        /* Determine the hypotenuse */
        yx = y / x;
        result = x * sqrt(1.0 + (yx * yx));

With -mfpmath=387, these calculations are done to more-than-64-bit
precision, yielding a different end result --- note in particular
that sqrt() is a hardware instruction on this platform, so it's
not rounding either.

I experimented with preventing that by using volatile intermediate
variables (cf comments in float.c); but it seemed like a mess,
and it would likely pessimize the code more than we want for other
platforms, and it's kind of hard to argue that deliberately sabotaging
the more-accurate computation is an improvement.

What I suggest doing is reducing extra_float_digits to -1 for this
specific test.  Changing the contents of circle_tbl seems like it'd have
more consequences than we want, in particular there's no guarantee that
we'd not hit similar issues in other tests if they're given different
inputs.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Peter Eisentraut-6
On 2019-08-22 18:19, Tom Lane wrote:
> What I suggest doing is reducing extra_float_digits to -1 for this
> specific test.  Changing the contents of circle_tbl seems like it'd have
> more consequences than we want, in particular there's no guarantee that
> we'd not hit similar issues in other tests if they're given different
> inputs.

I agree that reducing the output precision is better than adjusting the
code.

The circle.sql file already has SET extra_float_digits TO 0, and a few
other files have other settings with different values.  Are we content
to use various numbers until it works in each case, or should we try to
use some consistency?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Andrew Dunstan-8
In reply to this post by Peter Eisentraut-6

On 8/20/19 8:59 AM, Peter Eisentraut wrote:

> Running the regression tests on mingw32, I get the following diff in
> circle.out:
>
> @@ -111,8 +111,8 @@
>    WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0)
>    ORDER BY distance, area(c1.f1), area(c2.f1);
>   five |      one       |      two       |     distance
> -------+----------------+----------------+------------------
> -      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
> +------+----------------+----------------+-------------------
> +      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989
>        | <(3,5),0>      | <(5,1),3>      | 1.47213595499958
>        | <(100,200),10> | <(100,1),115>  |               74
>        | <(100,200),10> | <(1,2),100>    | 111.370729772479
> 
> I only get this on master/PG12, but not on PG11, so I suspect that the
> new floating-point output routines could be the root of the issue.
>
> This happens only with the 32-bit build (mingw32), but not with a 64-bit
> build (mingw64).
>
> Any suggestions on how to analyze this further?
>


I complained about this a year ago:
<https://postgr.es/m/9f4f22be-f9f1-b350-bc06-521226b87f7a@...>


+1 for fixing it by any reasonable means.


cheers


andrew


--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Tom Lane-2
In reply to this post by Peter Eisentraut-6
Peter Eisentraut <[hidden email]> writes:
> The circle.sql file already has SET extra_float_digits TO 0, and a few
> other files have other settings with different values.  Are we content
> to use various numbers until it works in each case, or should we try to
> use some consistency?

The one in rules.sql doesn't count here.  Everyplace else seems to be
using 0, except for the one in geometry.sql, which seems to be perhaps
unreasonably conservative.  Might be worth researching why it's -3.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

hasegeli
In reply to this post by Tom Lane-2
> I poked at this a bit more.  I can reproduce the problem by using
> -mfpmath=387 on dromedary's host (fairly old 32-bit macOS); although
> I also get half a dozen *other* failures in the core regression tests,
> mostly around detection of float overflow.  So I'm not quite sure that
> this is comparable.  But at any rate, I tracked the core of the problem
> to pg_hypot:

I couldn't test if it helps, but another solution may be is to rip out
pg_hypot() in favour of the libc implementation.  This was discussed
in detail as part of "Improve geometric types" thread.

Last comment about it:

https://www.postgresql.org/message-id/9223.1507039405%40sss.pgh.pa.us


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Tom Lane-2
Emre Hasegeli <[hidden email]> writes:
> I couldn't test if it helps, but another solution may be is to rip out
> pg_hypot() in favour of the libc implementation.  This was discussed
> in detail as part of "Improve geometric types" thread.

Hm ... the problem we're trying to fix here is platform-varying results.
Surely switching to the libc implementation would make that worse not
better?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Peter Eisentraut-6
In reply to this post by Tom Lane-2
On 2019-08-23 15:50, Tom Lane wrote:
> Peter Eisentraut <[hidden email]> writes:
>> The circle.sql file already has SET extra_float_digits TO 0, and a few
>> other files have other settings with different values.  Are we content
>> to use various numbers until it works in each case, or should we try to
>> use some consistency?
>
> The one in rules.sql doesn't count here.  Everyplace else seems to be
> using 0, except for the one in geometry.sql, which seems to be perhaps
> unreasonably conservative.  Might be worth researching why it's -3.

I can confirm that SET extra_float_digits TO -1 in circle.sql fixes the
original complaint.

I don't understand this stuff enough to be able to provide a good source
code comment or commit message, so I'd appreciate some help or someone
else to proceed with this change.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Tom Lane-2
Peter Eisentraut <[hidden email]> writes:
> I can confirm that SET extra_float_digits TO -1 in circle.sql fixes the
> original complaint.

Cool.  It did on dromedary, but that doesn't necessarily prove much
about other compilers :-(

> I don't understand this stuff enough to be able to provide a good source
> code comment or commit message, so I'd appreciate some help or someone
> else to proceed with this change.

Sure, I'll take it.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Tom Lane-2
In reply to this post by Andrew Dunstan-8
Andrew Dunstan <[hidden email]> writes:
> On 8/20/19 8:59 AM, Peter Eisentraut wrote:
>> Running the regression tests on mingw32, I get the following diff in
>> circle.out:
>> -      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
>> +      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989

> I complained about this a year ago:
> <https://postgr.es/m/9f4f22be-f9f1-b350-bc06-521226b87f7a@...>
> +1 for fixing it by any reasonable means.

Now that that fix is in, could we get a buildfarm member running on
such a platform?  It seems to behave differently from anything else.

I tracked down the residual regression failures on dromedary with
-mfpmath=387, and found that they occur because check_float8_val
gets inlined and then its tests for "isinf(val)" and "val == 0.0"
are applied to the 80-bit intermediate results not the 64-bit form.
I find it odd that we apparently don't have the same issues on
mingw32.

I'm very hesitant to apply a volatile-qualification approach to
eliminate those issues, for fear of pessimizing performance-critical
code on more modern platforms.  I wonder whether there is a reasonable
way to tell at compile time if we have a platform with 80-bit math.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Tom Lane-2
I wrote:
> I'm very hesitant to apply a volatile-qualification approach to
> eliminate those issues, for fear of pessimizing performance-critical
> code on more modern platforms.  I wonder whether there is a reasonable
> way to tell at compile time if we have a platform with 80-bit math.

Hmmm ... I find that dromedary's compiler predefines __FLT_EVAL_METHOD__
as 2 not 0 when -mfpmath=387 is given.  This seems to be something
that was standardized in C99 (without the double underscores), so
maybe we could do something like

#if __FLT_EVAL_METHOD__ > 0 || FLT_EVAL_METHOD > 0

to conditionalize whether we try to force the evaluation width in
check_float8_val and check_float4_val.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Andrew Dunstan-8
In reply to this post by Tom Lane-2

On 8/25/19 4:23 PM, Tom Lane wrote:

> Andrew Dunstan <[hidden email]> writes:
>> On 8/20/19 8:59 AM, Peter Eisentraut wrote:
>>> Running the regression tests on mingw32, I get the following diff in
>>> circle.out:
>>> -      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
>>> +      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989
>> I complained about this a year ago:
>> <https://postgr.es/m/9f4f22be-f9f1-b350-bc06-521226b87f7a@...>
>> +1 for fixing it by any reasonable means.
> Now that that fix is in, could we get a buildfarm member running on
> such a platform?  It seems to behave differently from anything else.
>


I'm pretty much tapped out for Windows resources, I have one physical
and one virtual machine which do nothing but run my 6 Windows based animals.


I don't know if the community has spare resources available from those
that Amazon donate to us. There is already one animal I manage running
there, so maybe another would be feasible.


cheers


andrew


--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Reply | Threaded
Open this post in threaded view
|

Re: mingw32 floating point diff

Tom Lane-2
In reply to this post by Tom Lane-2
I wrote:
>> I'm very hesitant to apply a volatile-qualification approach to
>> eliminate those issues, for fear of pessimizing performance-critical
>> code on more modern platforms.  I wonder whether there is a reasonable
>> way to tell at compile time if we have a platform with 80-bit math.

> Hmmm ... I find that dromedary's compiler predefines __FLT_EVAL_METHOD__
> as 2 not 0 when -mfpmath=387 is given.  This seems to be something
> that was standardized in C99 (without the double underscores), so
> maybe we could do something like
> #if __FLT_EVAL_METHOD__ > 0 || FLT_EVAL_METHOD > 0

After further poking around, it seems that testing FLT_EVAL_METHOD
should be sufficient --- <float.h> appears to define that correctly
even in very old C99 installations.

However, I'm losing interest in the problem after finding that I can't
reproduce it anywhere except on dromedary (with "-mfpmath=387" added).

For instance, I have a 32-bit FreeBSD system with what claims to be
the same compiler (gcc 4.2.1), but it passes regression just fine,
with or without -mfpmath=387.  Earlier and later gcc versions also
don't show a problem.  I suspect that Apple bollixed something with
local mods to their version of 4.2.1, or possibly they are allowing
inlining of isinf() in a way that nobody else does.

Also, using that compiler with "-mfpmath=387", I see that every
supported PG version back to 9.4 fails regression due to not
detecting float8 multiply overflow.  So this isn't a problem that
we introduced with v12's changes, as I'd first suspected; and it's
not a problem that anyone is hitting in the field, or we'd have
heard complaints.

So, barring somebody showing that we have an issue on some platform
that people actually care about currently, I'm inclined not to do
anything more here.

                        regards, tom lane