Ryu floating point output patch

classic Classic list List threaded Threaded
50 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Ryu floating point output patch

Andrew Gierth
This is a mostly cleaned-up version of the test patch I posted
previously for floating-point output using the Ryu algorithm.

Upstream source: github.com/ulfjack/ryu/ryu at commit 795c8b57a

From the upstream, I've taken only specific files which are
Boost-licensed, removed code not of interest to us, removed C99-isms,
applied project style for things like type names and use of INT64CONST,
removed some ad-hoc configuration #ifs in favour of using values
established by pg_config.h, and ran the whole thing through pgindent and
fixed up the resulting wreckage.

On top of that I made these functional changes:

1. Added an alternative fixed-point output format so that values with
exponents not less than -4 and less than the default precision for the
type are formatted in fixed-point, as sprintf does.

2. Exponents now always have a sign and at least two digits, also as per
sprintf formatting rules.

The fixed-point output hasn't been micro-optimized to the same extent as
the rest of the code, and there is a measurable performance effect - but
it's not that significant compared to the speedup over the existing code.

As for the extent of that speedup: in my tests, float8 output is now as
fast or marginally faster than bigint output, and roughly one order of
magnitude faster than the existing float8 output.

This version of the patch continues to use extra_float_digits to select
whether Ryu output is used - but I've also changed the default, so that
Ryu is used unless you explicitly set extra_float_digits to 0 or less.

This does mean that at the moment, about 13 regression tests fail on
this patch due to the higher precision of float output. I have
intentionally not yet adjusted those results, pending discussion.

--
Andrew (irc:RhodiumToad)


ryu.patch (101K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andres Freund
Hi,

On 2018-12-13 19:41:55 +0000, Andrew Gierth wrote:
> This is a mostly cleaned-up version of the test patch I posted
> previously for floating-point output using the Ryu algorithm.

Nice!


> From the upstream, I've taken only specific files which are
> Boost-licensed, removed code not of interest to us, removed C99-isms,
> applied project style for things like type names and use of INT64CONST,
> removed some ad-hoc configuration #ifs in favour of using values
> established by pg_config.h, and ran the whole thing through pgindent and
> fixed up the resulting wreckage.

Removed C99isms? Why, we allow that these days? Did you mean C11?



> +static inline uint64
> +mulShiftAll(const uint64 m, const uint64 *const mul, const int32 j,
> + uint64 *const vp, uint64 *const vm, const uint32 mmShift)
> +{
> +/*   m <<= 2; */
> +/*   uint128 b0 = ((uint128) m) * mul[0]; // 0 */
> +/*   uint128 b2 = ((uint128) m) * mul[1]; // 64 */
> +/*  */
> +/*   uint128 hi = (b0 >> 64) + b2; */
> +/*   uint128 lo = b0 & 0xffffffffffffffffull; */
> +/*   uint128 factor = (((uint128) mul[1]) << 64) + mul[0]; */
> +/*   uint128 vpLo = lo + (factor << 1); */
> +/*   *vp = (uint64) ((hi + (vpLo >> 64)) >> (j - 64)); */
> +/*   uint128 vmLo = lo - (factor << mmShift); */
> +/*   *vm = (uint64) ((hi + (vmLo >> 64) - (((uint128) 1ull) << 64)) >> (j - 64)); */
> +/*   return (uint64) (hi >> (j - 64)); */
> + *vp = mulShift(4 * m + 2, mul, j);
> + *vm = mulShift(4 * m - 1 - mmShift, mul, j);
> + return mulShift(4 * m, mul, j);
> +}

What's with all the commented out code?  I'm not sure that keeping close
alignment with the original codebase with things like that has much
value given the extent of the reformatting and functional changes?


> +/*  These tables are generated by PrintDoubleLookupTable. */

This kind of reference is essentially dangling now, so perhaps we should
rewrite that?


> +++ b/src/common/d2s_intrinsics.h

> +static inline uint64
> +umul128(const uint64 a, const uint64 b, uint64 *const productHi)
> +{
> + return _umul128(a, b, productHi);
> +}

These are fairly generic names, perhaps we ought to prefix them?



Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andrew Gierth
>>>>> "Andres" == Andres Freund <[hidden email]> writes:

 >> From the upstream, I've taken only specific files which are
 >> Boost-licensed, removed code not of interest to us, removed
 >> C99-isms, applied project style for things like type names and use
 >> of INT64CONST, removed some ad-hoc configuration #ifs in favour of
 >> using values established by pg_config.h, and ran the whole thing
 >> through pgindent and fixed up the resulting wreckage.

 Andres> Removed C99isms? Why, we allow that these days? Did you mean
 Andres> C11?

My understanding is that we do not allow // comments or mixed
declarations and code (other than loop control variables).

This code in particular was very heavily into using mixed declarations
and code throughout. Removing those was moderately painful.

 Andres> What's with all the commented out code?

Just stuff from upstream I didn't take out.

 Andres> I'm not sure that keeping close alignment with the original
 Andres> codebase with things like that has much value given the extent
 Andres> of the reformatting and functional changes?

Probably not, but otherwise I did not remove much in the way of upstream
comments or edit them except for formatting.

 >> +/*  These tables are generated by PrintDoubleLookupTable. */

 Andres> This kind of reference is essentially dangling now, so perhaps
 Andres> we should rewrite that?

Well, it's probably still useful to point out that they're generated
(though not by us); I guess it should make clear where the generator is.

 >> +++ b/src/common/d2s_intrinsics.h

 >> +static inline uint64
 >> +umul128(const uint64 a, const uint64 b, uint64 *const productHi)
 >> +{
 >> + return _umul128(a, b, productHi);
 >> +}

 Andres> These are fairly generic names, perhaps we ought to prefix
 Andres> them?

Maybe, but presumably nothing else is going to include this file.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andres Freund
Hi,

On 2018-12-13 20:23:51 +0000, Andrew Gierth wrote:

> >>>>> "Andres" == Andres Freund <[hidden email]> writes:
>
>  >> From the upstream, I've taken only specific files which are
>  >> Boost-licensed, removed code not of interest to us, removed
>  >> C99-isms, applied project style for things like type names and use
>  >> of INT64CONST, removed some ad-hoc configuration #ifs in favour of
>  >> using values established by pg_config.h, and ran the whole thing
>  >> through pgindent and fixed up the resulting wreckage.
>
>  Andres> Removed C99isms? Why, we allow that these days? Did you mean
>  Andres> C11?
>
> My understanding is that we do not allow // comments or mixed
> declarations and code (other than loop control variables).

Ah, that makes sense.


> This code in particular was very heavily into using mixed declarations
> and code throughout. Removing those was moderately painful.

I wonder if we should instead relax those restriction for the largely
foreign pieces of code?



>  >> +/*  These tables are generated by PrintDoubleLookupTable. */
>
>  Andres> This kind of reference is essentially dangling now, so perhaps
>  Andres> we should rewrite that?
>
> Well, it's probably still useful to point out that they're generated
> (though not by us); I guess it should make clear where the generator is.

Right it should definitely be explained. But I do think pointing at the
source, would be good. I kind of wonder if we should, if we were to
accept something like this patch, clone the original ryu repository to
either git.pg.org, or at least into pg's github repository?  It'd be
annoying if the original authors moved on and deleted the repository.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andrew Gierth
>>>>> "Andres" == Andres Freund <[hidden email]> writes:

 >> This code in particular was very heavily into using mixed
 >> declarations and code throughout. Removing those was moderately
 >> painful.

 Andres> I wonder if we should instead relax those restriction for the
 Andres> largely foreign pieces of code?

Do we have any reason for the restriction beyond stylistic preference?

 Andres> Right it should definitely be explained. But I do think
 Andres> pointing at the source, would be good. I kind of wonder if we
 Andres> should, if we were to accept something like this patch, clone
 Andres> the original ryu repository to either git.pg.org, or at least
 Andres> into pg's github repository? It'd be annoying if the original
 Andres> authors moved on and deleted the repository.

Keeping a copy on git.pg.org seems like a reasonable thing to do.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andres Freund
Hi,

On 2018-12-13 20:53:23 +0000, Andrew Gierth wrote:

> >>>>> "Andres" == Andres Freund <[hidden email]> writes:
>
>  >> This code in particular was very heavily into using mixed
>  >> declarations and code throughout. Removing those was moderately
>  >> painful.
>
>  Andres> I wonder if we should instead relax those restriction for the
>  Andres> largely foreign pieces of code?
>
> Do we have any reason for the restriction beyond stylistic preference?

No. Robert and Tom were against allowing mixing declarations and code, a
few more against // comments. I don't quite agree with either, but
getting to the rest of C99 seemed more important than fighting those out
at that moment.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andrew Gierth
In reply to this post by Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <[hidden email]> writes:

 Andrew> This code in particular was very heavily into using mixed
 Andrew> declarations and code throughout. Removing those was moderately
 Andrew> painful.

And it turns out I missed one, sigh.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <[hidden email]> writes:

 Andrew> This code in particular was very heavily into using mixed
 Andrew> declarations and code throughout. Removing those was moderately
 Andrew> painful.

 Andrew> And it turns out I missed one, sigh.

Updated patch.

--
Andrew (irc:RhodiumToad)


ryu2.patch (101K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Thomas Munro-3
In reply to this post by Andres Freund
On Fri, Dec 14, 2018 at 8:00 AM Andres Freund <[hidden email]> wrote:

> Hi,
>
> On 2018-12-13 20:53:23 +0000, Andrew Gierth wrote:
> > >>>>> "Andres" == Andres Freund <[hidden email]> writes:
> >
> >  >> This code in particular was very heavily into using mixed
> >  >> declarations and code throughout. Removing those was moderately
> >  >> painful.
> >
> >  Andres> I wonder if we should instead relax those restriction for the
> >  Andres> largely foreign pieces of code?
> >
> > Do we have any reason for the restriction beyond stylistic preference?
>
> No. Robert and Tom were against allowing mixing declarations and code, a
> few more against // comments. I don't quite agree with either, but
> getting to the rest of C99 seemed more important than fighting those out
> at that moment.

-1 for making superficial changes to code that we are "vendoring",
unless it is known to be dead/abandoned and we're definitively forking
it.  It just makes it harder to track upstream's bug fixes and
improvements, surely?

--
Thomas Munro
http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andrew Gierth
In reply to this post by Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <[hidden email]> writes:

 Andrew> This code in particular was very heavily into using mixed
 Andrew> declarations and code throughout. Removing those was moderately
 Andrew> painful.

 Andrew> And it turns out I missed one, sigh.

 Andrew> Updated patch.

And again to fix the windows build - why does Mkvcbuild.pm have its own
private copy of the file list for src/common?

--
Andrew (irc:RhodiumToad)


ryu3.patch (101K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Alvaro Herrera-9
On 2018-Dec-13, Andrew Gierth wrote:

> And again to fix the windows build - why does Mkvcbuild.pm have its own
> private copy of the file list for src/common?

I think at some point the Makefile parsing code was too stupid to deal
with the src/port Makefile, so it was hardcoded; later probably I
cargo-culted that into the src/common makefile.  Maybe the parser has
already improved (or the makefile simplified) that the msvc tooling can
extract the files from the makefile?  Dunno.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andrew Gierth
In reply to this post by Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <[hidden email]> writes:

 Andrew> And again to fix the windows build

And again to see if it actually compiles now...

--
Andrew (irc:RhodiumToad)


ryu4.patch (101K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andrew Gierth
In reply to this post by Thomas Munro-3
>>>>> "Thomas" == Thomas Munro <[hidden email]> writes:

 >>> Do we have any reason for the restriction beyond stylistic preference?

 >> No. Robert and Tom were against allowing mixing declarations and
 >> code, a few more against // comments. I don't quite agree with
 >> either, but getting to the rest of C99 seemed more important than
 >> fighting those out at that moment.

 Thomas> -1 for making superficial changes to code that we are
 Thomas> "vendoring", unless it is known to be dead/abandoned and we're
 Thomas> definitively forking it. It just makes it harder to track
 Thomas> upstream's bug fixes and improvements, surely?

Well, the question there is how far to take that principle; do we avoid
pgindent'ing the code, do we avoid changing uint64_t etc. to uint64
etc., and so on.

(I notice that a stray uint8_t that I left behind broke the Windows
build but not the linux/bsd one, so something would have to be fixed in
the windows build in order to avoid having to change that.)

The Ryu code is perhaps an extreme example because it follows almost the
exact reverse of our coding standard.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Tom Lane-2
Andrew Gierth <[hidden email]> writes:
> "Thomas" == Thomas Munro <[hidden email]> writes:
>  Thomas> -1 for making superficial changes to code that we are
>  Thomas> "vendoring", unless it is known to be dead/abandoned and we're
>  Thomas> definitively forking it. It just makes it harder to track
>  Thomas> upstream's bug fixes and improvements, surely?

> Well, the question there is how far to take that principle; do we avoid
> pgindent'ing the code, do we avoid changing uint64_t etc. to uint64
> etc., and so on.
> The Ryu code is perhaps an extreme example because it follows almost the
> exact reverse of our coding standard.

My heart kind of sinks when looking at this patch, because it's a
large addition of not-terribly-well-documented code that nobody here
really understands, never mind the problem that it's mostly not close
to our own coding standards.

Our track record in borrowing code from "upstream" projects is pretty
miserable: almost without exception, we've found ourselves stuck with
maintaining such code ourselves after a few years.  I don't see any
reason to think that wouldn't be true here; in fact there's much more
reason to worry here than we had for, eg, borrowing the regex code.
The maintenance track record of this github repo appears to span six
months, and it's now been about four months since the last commit.
It might be abandonware already.

Is this a path we really want to go down?  I'm not convinced the
cost/benefit ratio is attractive.

If we do go down this path, though, I'm not too worried about making
it simple to absorb upstream fixes; I bet there will be few to none.
(Still, you might want to try to automate and document the coding
format conversion steps, along the line of what I've done recently
for our copy of tzcode.)

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andres Freund
Hi,

On 2018-12-14 13:25:29 -0500, Tom Lane wrote:
> Our track record in borrowing code from "upstream" projects is pretty
> miserable: almost without exception, we've found ourselves stuck with
> maintaining such code ourselves after a few years.  I don't see any
> reason to think that wouldn't be true here; in fact there's much more
> reason to worry here than we had for, eg, borrowing the regex code.
> The maintenance track record of this github repo appears to span six
> months, and it's now been about four months since the last commit.
> It might be abandonware already.

It's been absorbed into MSVC's standard library and a bunch of other
projects, so there's certainly some other prominent adoption.

The last commit was a month ago, no? November 6th afaict.


> Is this a path we really want to go down?  I'm not convinced the
> cost/benefit ratio is attractive.

float->text conversion is one of the major bottlenecks when backing up
postgres, it's definitely a pain-point in practice. I've not really seen
a nicer implementation anywhere, not even close.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2018-12-14 13:25:29 -0500, Tom Lane wrote:
>> The maintenance track record of this github repo appears to span six
>> months, and it's now been about four months since the last commit.
>> It might be abandonware already.

> The last commit was a month ago, no? November 6th afaict.

Hmm, the commit history I was looking at ended on Aug 19, but I must've
done something wrong, because going in from a different angle shows
me commits up to November.

> It's been absorbed into MSVC's standard library and a bunch of other
> projects, so there's certainly some other prominent adoption.

If we think that's going on, maybe we should just sit tight till glibc
absorbs it.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andres Freund
Hi,

On 2018-12-14 13:47:53 -0500, Tom Lane wrote:
> Andres Freund <[hidden email]> writes:
> > It's been absorbed into MSVC's standard library and a bunch of other
> > projects, so there's certainly some other prominent adoption.
>
> If we think that's going on, maybe we should just sit tight till glibc
> absorbs it.

All the stuff it'll have to put above it will make it slower
anyway. It's not like this'll be a feature that's hard to rip out if all
libc's have fast roundtrip safe conversion routines, or if something
better comes along.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andrew Gierth
In reply to this post by Tom Lane-2
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 Tom> The maintenance track record of this github repo appears to span
 Tom> six months,

The algorithm was only published six months ago.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Tom Lane-2
Andrew Gierth <[hidden email]> writes:
> "Tom" == Tom Lane <[hidden email]> writes:
>  Tom> The maintenance track record of this github repo appears to span
>  Tom> six months,

> The algorithm was only published six months ago.

Doesn't really affect my point: there's little reason to believe that
there will be an active upstream several years down the pike.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Ryu floating point output patch

Andrew Gierth
In reply to this post by Tom Lane-2
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 Tom> (Still, you might want to try to automate and document the coding
 Tom> format conversion steps, along the line of what I've done recently
 Tom> for our copy of tzcode.)

Working around our declaration-after-statement prohibition required
manually moving some lines of code, manually separating some
declarations from their assignments (removing const qualifiers), and as
a last resort introducing new code blocks. I doubt it could be automated
in any sane way. (This might be an argument to relax that rule.)

Mechanical search-and-replace accounted for the _t suffix on types, the
addition of the UINT64CONSTs, and changing assert to Assert; that much
could be automated.

pgindent did a pretty horrible job on the comments, a script could
probably do better.

I guess removal of the unwanted #ifs could be automated without too much
difficulty.

(But I'm not likely going to do any of this in the forseeable future,
because I don't expect it to be useful.)

--
Andrew (irc:RhodiumToad)

123