Use of "long" in incremental sort code

Use of "long" in incremental sort code

David Rowley
Hi,

I noticed the incremental sort code makes use of the long datatype a
few times, e.g. in TuplesortInstrumentation and
IncrementalSortGroupInfo.  (64-bit Windows machines have sizeof(long)
== 4).  I understand that the values are in kilobytes and it would
take 2TB to cause them to wrap. Nevertheless, I think it would be
better to choose a more suitable type. work_mem is still limited to
2GB on 64-bit Windows machines, so perhaps there's some argument that
it does not matter for fields that relate to in-memory stuff, but
the on-disk fields are wrong.  The in-memory fields likely raise the
bar further for fixing the 2GB work_mem limit on Windows.

Maybe Size would be better for the in-memory fields and uint64 for the
on-disk fields?
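To put the 2TB figure in concrete terms, here is a small C sketch (not PostgreSQL code) that uses int32_t as a stand-in for long on 64-bit Windows, where sizeof(long) == 4. The helper names are hypothetical; the arithmetic only shows where a signed 32-bit kilobyte counter tops out and how an addition to it can be checked for wraparound.

```c
#include <stdint.h>

/*
 * int32_t stands in for "long" on 64-bit Windows (LLP64), where
 * sizeof(long) == 4.  Hypothetical helpers, for illustration only.
 */

/* Largest byte count a signed 32-bit kilobyte counter can represent. */
static int64_t max_bytes_in_int32_kb(void)
{
    /* Widen before multiplying so the product itself cannot overflow. */
    return (int64_t) INT32_MAX * 1024;   /* just under 2 TiB */
}

/*
 * Returns 1 if adding delta_kb to total_kb would overflow a 32-bit
 * counter.  Only positive deltas are checked, since a space-used
 * counter only grows.
 */
static int would_wrap_int32(int32_t total_kb, int32_t delta_kb)
{
    return delta_kb > 0 && total_kb > INT32_MAX - delta_kb;
}
```

With a 64-bit counter the same limit is far beyond any realistic amount of memory or disk, which is the motivation for switching away from long.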

David


Re: Use of "long" in incremental sort code

Tom Lane-2
David Rowley <[hidden email]> writes:

> I noticed the incremental sort code makes use of the long datatype a
> few times, e.g. in TuplesortInstrumentation and
> IncrementalSortGroupInfo.  (64-bit Windows machines have sizeof(long)
> == 4).  I understand that the values are in kilobytes and it would
> take 2TB to cause them to wrap. Nevertheless, I think it would be
> better to choose a more suitable type. work_mem is still limited to
> 2GB on 64-bit Windows machines, so perhaps there's some argument that
> it does not matter for fields that relate to in-memory stuff, but
> the on-disk fields are wrong.  The in-memory fields likely raise the
> bar further for fixing the 2GB work_mem limit on Windows.

> Maybe Size would be better for the in-memory fields and uint64 for the
> on-disk fields?

There is a fairly widespread issue that memory-size-related GUCs and
suchlike variables are limited to representing sizes that fit in a "long".
Although Win64 is the *only* platform where that's an issue, maybe
it's worth doing something about.  But we shouldn't just fix the sort
code, if we do do something.

(IOW, I don't agree with doing a fix that doesn't also fix work_mem.)

                        regards, tom lane


Re: Use of "long" in incremental sort code

David Rowley
On Tue, 30 Jun 2020 at 16:20, Tom Lane <[hidden email]> wrote:
> There is a fairly widespread issue that memory-size-related GUCs and
> suchlike variables are limited to representing sizes that fit in a "long".
> Although Win64 is the *only* platform where that's an issue, maybe
> it's worth doing something about.  But we shouldn't just fix the sort
> code, if we do do something.
>
> (IOW, I don't agree with doing a fix that doesn't also fix work_mem.)

I raised it mostly because this new-to-PG13 code is making the problem worse.

If we're not going to change the in-memory fields, then shouldn't we
at least change the ones for disk space tracking?

David


Re: Use of "long" in incremental sort code

Peter Eisentraut-6
On 2020-06-30 06:24, David Rowley wrote:

> On Tue, 30 Jun 2020 at 16:20, Tom Lane <[hidden email]> wrote:
>> There is a fairly widespread issue that memory-size-related GUCs and
>> suchlike variables are limited to representing sizes that fit in a "long".
>> Although Win64 is the *only* platform where that's an issue, maybe
>> it's worth doing something about.  But we shouldn't just fix the sort
>> code, if we do do something.
>>
>> (IOW, I don't agree with doing a fix that doesn't also fix work_mem.)
>
> I raised it mostly because this new-to-PG13 code is making the problem worse.

Yeah, we recently got rid of a bunch of inappropriate use of long, so it
seems reasonable to make this new code follow that.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Use of "long" in incremental sort code

James Coleman
On Tue, Jun 30, 2020 at 7:21 AM Peter Eisentraut
<[hidden email]> wrote:

>
> On 2020-06-30 06:24, David Rowley wrote:
> > On Tue, 30 Jun 2020 at 16:20, Tom Lane <[hidden email]> wrote:
> >> There is a fairly widespread issue that memory-size-related GUCs and
> >> suchlike variables are limited to representing sizes that fit in a "long".
> >> Although Win64 is the *only* platform where that's an issue, maybe
> >> it's worth doing something about.  But we shouldn't just fix the sort
> >> code, if we do do something.
> >>
> >> (IOW, I don't agree with doing a fix that doesn't also fix work_mem.)
> >
> > I raised it mostly because this new-to-PG13 code is making the problem worse.
>
> Yeah, we recently got rid of a bunch of inappropriate use of long, so it
> seems reasonable to make this new code follow that.

I've attached a patch to make this change but with one tweak: I
decided to use uint64 for both memory and disk (rather than Size in
some cases) since we aggregate across multiple runs and have shared
code that deals with both values.
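A minimal sketch of the kind of aggregation described here, assuming simplified, hypothetical struct and field names (these are not the actual PostgreSQL definitions): because the memory and disk totals share one fixed-width type, a single helper can accumulate either, and a total across runs can exceed what any one 32-bit field would hold.

```c
#include <stdint.h>

/* Hypothetical per-run record; not PostgreSQL's real struct. */
typedef struct RunSpace
{
    uint64_t space_used_kb;   /* space used by one sort run */
    int      on_disk;         /* nonzero if the run spilled to disk */
} RunSpace;

/* Hypothetical per-group totals, one type for memory and disk alike. */
typedef struct GroupTotals
{
    uint64_t total_memory_kb; /* summed across in-memory runs */
    uint64_t max_memory_kb;   /* largest single in-memory run */
    uint64_t total_disk_kb;   /* summed across on-disk runs */
    uint64_t max_disk_kb;     /* largest single on-disk run */
} GroupTotals;

/* Shared helper: works for memory or disk because both use uint64_t. */
static void accumulate(uint64_t *total, uint64_t *max, uint64_t value)
{
    *total += value;
    if (value > *max)
        *max = value;
}

/* Route one run's usage into the matching pair of counters. */
static void record_run(GroupTotals *totals, const RunSpace *run)
{
    if (run->on_disk)
        accumulate(&totals->total_disk_kb, &totals->max_disk_kb,
                   run->space_used_kb);
    else
        accumulate(&totals->total_memory_kb, &totals->max_memory_kb,
                   run->space_used_kb);
}
```

If memory used Size and disk used uint64, the shared accumulate() helper would need two variants, which is the duplication the single-type choice avoids.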

James

Attachment: v1-0001-Use-unint64-instead-of-long-for-space-used-variab.patch (6K)

Re: Use of "long" in incremental sort code

Peter Geoghegan-4
In reply to this post by David Rowley
On Mon, Jun 29, 2020 at 9:13 PM David Rowley <[hidden email]> wrote:
> I noticed the incremental sort code makes use of the long datatype a
> few times, e.g. in TuplesortInstrumentation and
> IncrementalSortGroupInfo.

I agree that long is terrible, and should generally be avoided.

> Maybe Size would be better for the in-memory fields and uint64 for the
> on-disk fields?

FWIW we have to use int64 for the in-memory tuplesort.c fields. This
is because it must be possible for the fields to have negative values
in the context of tuplesort. If there is going to be a general rule
for in-memory fields, then ISTM that it'll have to be "use int64".

logtape.c uses long for on-disk fields. It also relies on negative
values, albeit to a fairly limited degree (it uses -1 as a magic
value).
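A simplified sketch of the signed accounting referred to here, modeled loosely on tuplesort.c's practice of letting the remaining-memory budget go negative to signal that it is over budget. The struct, the functions, and the INVALID_BLOCK marker are illustrative names, not PostgreSQL's.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative budget tracker; the remaining amount may go below zero. */
typedef struct MemBudget
{
    int64_t avail_bytes;   /* negative means "over budget" */
} MemBudget;

/* Charge an allocation against the budget. */
static void use_mem(MemBudget *b, int64_t nbytes)
{
    b->avail_bytes -= nbytes;
}

/* Credit a freed allocation back to the budget. */
static void free_mem(MemBudget *b, int64_t nbytes)
{
    b->avail_bytes += nbytes;
}

/* Over budget is detected by the sign, which requires a signed type. */
static bool lack_mem(const MemBudget *b)
{
    return b->avail_bytes < 0;
}

/* A logtape-style "no block" marker: only representable when signed. */
#define INVALID_BLOCK ((int64_t) -1)
```

Both patterns depend on the type being signed, which is why an unsigned replacement such as Size would not work for these fields.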

--
Peter Geoghegan


Re: Use of "long" in incremental sort code

James Coleman
On Thu, Jul 2, 2020 at 1:36 PM Peter Geoghegan <[hidden email]> wrote:

>
> On Mon, Jun 29, 2020 at 9:13 PM David Rowley <[hidden email]> wrote:
> > I noticed the incremental sort code makes use of the long datatype a
> > few times, e.g. in TuplesortInstrumentation and
> > IncrementalSortGroupInfo.
>
> I agree that long is terrible, and should generally be avoided.
>
> > Maybe Size would be better for the in-memory fields and uint64 for the
> > on-disk fields?
>
> FWIW we have to use int64 for the in-memory tuplesort.c fields. This
> is because it must be possible for the fields to have negative values
> in the context of tuplesort. If there is going to be a general rule
> for in-memory fields, then ISTM that it'll have to be "use int64".
>
> logtape.c uses long for on-disk fields. It also relies on negative
> values, albeit to a fairly limited degree (it uses -1 as a magic
> value).

Do you think it's reasonable to use int64 across the board for memory
and disk space numbers then? If so, I can update the patch.

James


Re: Use of "long" in incremental sort code

Peter Geoghegan-4
On Thu, Jul 2, 2020 at 10:53 AM James Coleman <[hidden email]> wrote:
> Do you think it's reasonable to use int64 across the board for memory
> and disk space numbers then? If so, I can update the patch.

Using int64 as a replacement for long is the safest general strategy,
and so ISTM that it might be worth doing that even in cases where it
isn't clearly necessary. After all, any code that uses long must have
been written with the assumption that that was the same thing as
int64, at least on most platforms.

There is nothing wrong with using Size/size_t, and doing so is often
slightly clearer. But it's no drop-in replacement for long.
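A short sketch of why an unsigned type is not a drop-in replacement: code written against a signed long may expect a subtraction to go negative, whereas size_t wraps around to a very large value instead. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed: the result goes negative, as long-based code expects. */
static int64_t remaining_signed(int64_t avail, int64_t request)
{
    return avail - request;
}

/*
 * Unsigned: if request > avail the result wraps modulo SIZE_MAX + 1,
 * so a "went below zero" check can never fire.
 */
static size_t remaining_unsigned(size_t avail, size_t request)
{
    return avail - request;
}
```

For example, remaining_signed(100, 150) yields -50, while remaining_unsigned(100, 150) yields SIZE_MAX - 49, a value that looks like a huge amount of free space.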

--
Peter Geoghegan


Re: Use of "long" in incremental sort code

Tom Lane-2
Peter Geoghegan <[hidden email]> writes:
> On Thu, Jul 2, 2020 at 10:53 AM James Coleman <[hidden email]> wrote:
>> Do you think it's reasonable to use int64 across the board for memory
>> and disk space numbers then? If so, I can update the patch.

> Using int64 as a replacement for long is the safest general strategy,

mumble ssize_t mumble

                        regards, tom lane


Re: Use of "long" in incremental sort code

Peter Geoghegan-4
On Thu, Jul 2, 2020 at 12:39 PM Tom Lane <[hidden email]> wrote:
> mumble ssize_t mumble

That's from POSIX, though. I imagine MSVC won't be happy (surprise!).

--
Peter Geoghegan


Re: Use of "long" in incremental sort code

Tom Lane-2
Peter Geoghegan <[hidden email]> writes:
> On Thu, Jul 2, 2020 at 12:39 PM Tom Lane <[hidden email]> wrote:
>> mumble ssize_t mumble

> That's from POSIX, though. I imagine MSVC won't be happy (surprise!).

We've got quite a few uses of it already, so apparently it's fine.

                        regards, tom lane


Re: Use of "long" in incremental sort code

James Coleman
In reply to this post by Tom Lane-2
On Thu, Jul 2, 2020 at 3:39 PM Tom Lane <[hidden email]> wrote:
>
> Peter Geoghegan <[hidden email]> writes:
> > On Thu, Jul 2, 2020 at 10:53 AM James Coleman <[hidden email]> wrote:
> >> Do you think it's reasonable to use int64 across the board for memory
> >> and disk space numbers then? If so, I can update the patch.
>
> > Using int64 as a replacement for long is the safest general strategy,
>
> mumble ssize_t mumble

But wouldn't that mean we'd get int on 32-bit systems, and since we're
accumulating data we could go over that value in both memory and disk?

My assumption is that it's preferable for the "this run" value and
the "total used across multiple runs" value, for both disk and
memory, to all use the same type. In that case it seems we want to
guarantee 64 bits.
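The overflow concern can be illustrated with a sketch in which int32_t stands in for ssize_t on a 32-bit platform (an assumption for illustration; the function names are hypothetical). Each individual run fits in 32 bits, but their sum does not.

```c
#include <stdint.h>

/* Sum per-run kilobyte counts into a 64-bit total. */
static int64_t total_kb_int64(const int32_t *runs_kb, int n)
{
    int64_t total = 0;

    for (int i = 0; i < n; i++)
        total += runs_kb[i];
    return total;
}

/*
 * Returns 1 if the same running total would leave the range of a
 * 32-bit signed integer (our stand-in for a 32-bit ssize_t).
 */
static int overflows_int32_total(const int32_t *runs_kb, int n)
{
    int64_t total = 0;   /* computed wide, then range-checked */

    for (int i = 0; i < n; i++)
    {
        total += runs_kb[i];
        if (total > INT32_MAX || total < INT32_MIN)
            return 1;
    }
    return 0;
}
```

Two runs of 1,500,000,000 kB each fit individually, but their 3,000,000,000 kB total does not fit in 32 bits; int64 has the same width on every platform, so the total is always representable.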

Patch using int64 attached.

James

Attachment: v2-0001-Use-int64-instead-of-long-for-space-used-variable.patch (6K)

Re: Use of "long" in incremental sort code

Peter Geoghegan-4
In reply to this post by Tom Lane-2
On Thu, Jul 2, 2020 at 12:44 PM Tom Lane <[hidden email]> wrote:
> > That's from POSIX, though. I imagine MSVC won't be happy (surprise!).
>
> We've got quite a few uses of it already, so apparently it's fine.

Oh, looks like we have a compatibility hack for MSVC within
win32_port.h, where ssize_t is typedef'd to __int64. I didn't realize
that it was okay to use ssize_t.
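Roughly, such a shim looks like the following sketch. It is an approximation written for illustration, not a copy of win32_port.h; in particular the 32-bit Windows branch is an assumption, and the two check functions are hypothetical.

```c
/*
 * POSIX systems get ssize_t from <sys/types.h>; MSVC does not provide
 * it, so a typedef of matching width is supplied instead.
 */
#ifdef _MSC_VER
#ifdef _WIN64
typedef __int64 ssize_t;   /* 64-bit Windows: long is only 32 bits */
#else
typedef long ssize_t;      /* assumption: 32-bit Windows branch */
#endif
#else
#include <sys/types.h>
#endif
#include <stddef.h>

/* ssize_t is expected to be as wide as size_t ... */
static int ssize_matches_size_t(void)
{
    return sizeof(ssize_t) == sizeof(size_t);
}

/* ... but signed, so it can hold negative values. */
static int ssize_is_signed(void)
{
    return (ssize_t) -1 < 0;
}
```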

--
Peter Geoghegan


Re: Use of "long" in incremental sort code

Peter Geoghegan-4
In reply to this post by James Coleman
On Thu, Jul 2, 2020 at 12:47 PM James Coleman <[hidden email]> wrote:
> But wouldn't that mean we'd get int on 32-bit systems, and since we're
> accumulating data we could go over that value in both memory and disk?
>
> My assumption is that it's preferable to have the "this run value" and
> the "total used across multiple runs" and both of those for disk and
> memory to be the same. In that case it seems we want to guarantee
> 64-bits.

I agree. There seems to be little reason to accommodate platform-level
conventions, beyond making sure that everything works on less popular
or obsolete platforms.

I suppose that it's a little idiosyncratic to use int64 like this. But
it makes sense, and isn't nearly as ugly as the long thing, so I don't
think that it should really matter.

--
Peter Geoghegan


Re: Use of "long" in incremental sort code

Tom Lane-2
In reply to this post by James Coleman
James Coleman <[hidden email]> writes:
> On Thu, Jul 2, 2020 at 3:39 PM Tom Lane <[hidden email]> wrote:
>> mumble ssize_t mumble

> But wouldn't that mean we'd get int on 32-bit systems, and since we're
> accumulating data we could go over that value in both memory and disk?

Certainly, a number that's meant to represent the amount of data *on disk*
shouldn't use ssize_t.  But I think it's appropriate if you want to
represent in-memory quantities while also allowing negative values.

I guess if you're expecting in-memory sizes exceeding 2GB, you might worry
that ssize_t could overflow.  I'm dubious that a 32-bit machine could get
to that, though, seeing that it's going to have other demands on its
address space.

> My assumption is that it's preferable to have the "this run value" and
> the "total used across multiple runs" and both of those for disk and
> memory to be the same. In that case it seems we want to guarantee
> 64-bits.

If you're not going to distinguish in-memory from not-in-memory, agreed.

                        regards, tom lane


Re: Use of "long" in incremental sort code

David Rowley
In reply to this post by James Coleman
On Fri, 3 Jul 2020 at 07:47, James Coleman <[hidden email]> wrote:
> Patch using int64 attached.

I added this to the open items list for PG13.

David


Re: Use of "long" in incremental sort code

James Coleman
On Thu, Jul 30, 2020 at 10:12 PM David Rowley <[hidden email]> wrote:
>
> On Fri, 3 Jul 2020 at 07:47, James Coleman <[hidden email]> wrote:
> > Patch using int64 attached.
>
> I added this to the open items list for PG13.
>
> David

I'd previously attached a patch [1], and there seemed to be agreement
it was reasonable (lightly so, but I also didn't see any
disagreement); would someone be able to either commit the change or
provide some additional feedback?

Thanks,
James

[1]: https://www.postgresql.org/message-id/CAAaqYe_Y5zwCTFCJeso7p34yJgf4khR8EaKeJtGd%3DQPudOad6A%40mail.gmail.com


Re: Use of "long" in incremental sort code

David Rowley
On Sat, 1 Aug 2020 at 02:02, James Coleman <[hidden email]> wrote:
> I'd previously attached a patch [1], and there seemed to be agreement
> it was reasonable (lightly so, but I also didn't see any
> disagreement); would someone be able to either commit the change or
> provide some additional feedback?

It looks fine to me. Pushed.

David

> [1]: https://www.postgresql.org/message-id/CAAaqYe_Y5zwCTFCJeso7p34yJgf4khR8EaKeJtGd%3DQPudOad6A%40mail.gmail.com