Comment simplehash/dynahash trade-offs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Comment simplehash/dynahash trade-offs

James Coleman
In another thread [1] I'd mused that "there might be some value in a
README or comments
addition that would be a guide to what the various hash
implementations are useful for...so that we have something to make the
code base a bit more discoverable."

I'd solicited feedback from Andres (as the author of the simplehash
implementation) and gotten further explanation from Tomas (both cc'd
here) and have tried to condense that into the comment changes in this
patch series.

v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
Contains the summaries mentioned above.

v1-0002-Improve-simplehash-usage-notes.patch
I'd noticed while adding a simplehash implementation in that other
thread that the facts that:
- The element type requires a status member.
- Why storing a hash in the element type is useful (i.e., when to
define SH_STORE_HASH/SH_GET_HASH).
- The availability of  private_data member for metadata access from callbacks.
are either undocumented or hard to discover, and so I've added the
information directly to the usage notes section.

v1-0003-Show-sample-simplehash-method-signatures.patch
I find it hard to read the macro code "templating" particularly for
seeing what the available API is and so added sample method signatures
in comments to the macro generated method signature defines.

James

[1]: https://www.postgresql.org/message-id/CAAaqYe956a-zbm1qR8pqz%3DiLbF8LW5vBrQGrzXVHXdLk3at5_Q%40mail.gmail.com

v1-0002-Improve-simplehash-usage-notes.patch (2K) Download Attachment
v1-0003-Show-sample-simplehash-method-signatures.patch (3K) Download Attachment
v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

Thomas Munro-5
On Fri, May 1, 2020 at 1:53 PM James Coleman <[hidden email]> wrote:
> In another thread [1] I'd mused that "there might be some value in a
> README or comments
> addition that would be a guide to what the various hash
> implementations are useful for...so that we have something to make the
> code base a bit more discoverable."

+1

> I'd solicited feedback from Andres (as the author of the simplehash
> implementation) and gotten further explanation from Tomas (both cc'd
> here) and have tried to condense that into the comment changes in this
> patch series.
>
> v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
> Contains the summaries mentioned above.

+ * - It supports partitioning, which is useful for shared memory access using

I wonder if we should say a bit more about the shared memory mode.
Shared memory dynahash tables are allocated in a fixed size area at
startup, and are discoverable by name in other other processes that
need to get access to them, while simplehash assumes that it can get
memory from a MemoryContext or an allocator with a malloc/free-style
interface, which isn't very well suited for use in shared memory.
(I'm sure you can convince it to work in shared memory with some
work.)

> v1-0002-Improve-simplehash-usage-notes.patch

+ *    For convenience the hash table create functions accept a void pointer
+ *    will be stored in the hash table type's member private_data.

*that* will be stored?

> v1-0003-Show-sample-simplehash-method-signatures.patch
> I find it hard to read the macro code "templating" particularly for
> seeing what the available API is and so added sample method signatures
> in comments to the macro generated method signature defines.

I didn't double-check all the expansions of the macros but +1 for this
idea, it's very useful.


Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

James Coleman
On Mon, Jul 20, 2020 at 1:29 AM Thomas Munro <[hidden email]> wrote:

>
> On Fri, May 1, 2020 at 1:53 PM James Coleman <[hidden email]> wrote:
> > In another thread [1] I'd mused that "there might be some value in a
> > README or comments
> > addition that would be a guide to what the various hash
> > implementations are useful for...so that we have something to make the
> > code base a bit more discoverable."
>
> +1
>
> > I'd solicited feedback from Andres (as the author of the simplehash
> > implementation) and gotten further explanation from Tomas (both cc'd
> > here) and have tried to condense that into the comment changes in this
> > patch series.
> >
> > v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
> > Contains the summaries mentioned above.
>
> + * - It supports partitioning, which is useful for shared memory access using
>
> I wonder if we should say a bit more about the shared memory mode.
> Shared memory dynahash tables are allocated in a fixed size area at
> startup, and are discoverable by name in other other processes that
> need to get access to them, while simplehash assumes that it can get
> memory from a MemoryContext or an allocator with a malloc/free-style
> interface, which isn't very well suited for use in shared memory.
> (I'm sure you can convince it to work in shared memory with some
> work.)
Added.

> > v1-0002-Improve-simplehash-usage-notes.patch
>
> + *    For convenience the hash table create functions accept a void pointer
> + *    will be stored in the hash table type's member private_data.
>
> *that* will be stored?

Fixed.

> > v1-0003-Show-sample-simplehash-method-signatures.patch
> > I find it hard to read the macro code "templating" particularly for
> > seeing what the available API is and so added sample method signatures
> > in comments to the macro generated method signature defines.
>
> I didn't double-check all the expansions of the macros but +1 for this
> idea, it's very useful.

James

v2-0003-Show-sample-simplehash-method-signatures.patch (3K) Download Attachment
v2-0002-Improve-simplehash-usage-notes.patch (2K) Download Attachment
v2-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

Thomas Munro-5
On Sat, Aug 1, 2020 at 7:22 AM James Coleman <[hidden email]> wrote:
> [v2 patch set]

I ran it through pgindent which insisted on adding some newlines, I
manually replaced some spaces with tabs to match nearby lines, I added
some asterisks in your example function prototypes where <element> is
returned because they seemed to be missing, and I pushed this.
Thanks!


Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

James Coleman
On Fri, Jul 31, 2020 at 8:17 PM Thomas Munro <[hidden email]> wrote:
>
> On Sat, Aug 1, 2020 at 7:22 AM James Coleman <[hidden email]> wrote:
> > [v2 patch set]
>
> I ran it through pgindent which insisted on adding some newlines, I
> manually replaced some spaces with tabs to match nearby lines, I added
> some asterisks in your example function prototypes where <element> is
> returned because they seemed to be missing, and I pushed this.
> Thanks!


Thanks for reviewing and committing!

James


Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

David Rowley
In reply to this post by Thomas Munro-5
On Sat, 1 Aug 2020 at 12:17, Thomas Munro <[hidden email]> wrote:
>
> On Sat, Aug 1, 2020 at 7:22 AM James Coleman <[hidden email]> wrote:
> > [v2 patch set]
>
> I ran it through pgindent which insisted on adding some newlines, I
> manually replaced some spaces with tabs to match nearby lines, I added
> some asterisks in your example function prototypes where <element> is
> returned because they seemed to be missing, and I pushed this.
> Thanks!

I was just reading over this and wondered about the following:

+ *   The element type is required to contain a "uint32 status" member.

I see that PagetableEntry does not follow this and I also didn't
follow it when writing the Result Cache patch in [1]. I managed to
shrink the struct I was using for the hash table by 4 bytes by using a
char instead of an int. That sounds like a small amount of memory, but
it did result in much better cache hit ratios in the patch

Maybe it would be better just to get rid of the enum and just #define
the values. It seems unlikely that we're every going to need many more
states than what are there already, let along more than, say 127 of
them. It does look like manifest_file could be shrunk down a bit too
by making the status field a char.

[1] https://www.postgresql.org/message-id/flat/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@...


Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

David Rowley
On Sun, 2 Aug 2020 at 11:16, David Rowley <[hidden email]> wrote:
> Maybe it would be better just to get rid of the enum and just #define
> the values. It seems unlikely that we're every going to need many more
> states than what are there already, let along more than, say 127 of
> them. It does look like manifest_file could be shrunk down a bit too
> by making the status field a char.

This didn't turn out quite as pretty as I had imagined.  I needed to
leave the two statuses defined in simplehash.h so that callers could
make use of them. (Result Cache will do this).

The change here would be callers would need to use SH_STATUS_IN_USE
rather than <prefix>_SH_IN_USE.

I'm not really that sold on doing things this way. I really just don't
want to have to make my status field a uint32 in Result Cache per what
the new comment states we must do. If there's a nicer way, then
perhaps that's worth considering.

David

simplehash_fix.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

Thomas Munro-5
On Sun, Aug 2, 2020 at 1:54 PM David Rowley <[hidden email]> wrote:

> On Sun, 2 Aug 2020 at 11:16, David Rowley <[hidden email]> wrote:
> > Maybe it would be better just to get rid of the enum and just #define
> > the values. It seems unlikely that we're every going to need many more
> > states than what are there already, let along more than, say 127 of
> > them. It does look like manifest_file could be shrunk down a bit too
> > by making the status field a char.
>
> This didn't turn out quite as pretty as I had imagined.  I needed to
> leave the two statuses defined in simplehash.h so that callers could
> make use of them. (Result Cache will do this).
>
> The change here would be callers would need to use SH_STATUS_IN_USE
> rather than <prefix>_SH_IN_USE.
>
> I'm not really that sold on doing things this way. I really just don't
> want to have to make my status field a uint32 in Result Cache per what
> the new comment states we must do. If there's a nicer way, then
> perhaps that's worth considering.

Agreed that the new comment is wrong and should be changed.

I think you can probably go further, though, and make it require no
storage at all by making it optionally "intrusive", by using a special
value in an existing member, and supplying an expression to set and
test for that value.  For example, maybe some users have a pointer but
never want to use NULL, and maybe some users already have a field
holding various flags that are one bit wide and can spare a bit.


Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

David Rowley
On Mon, 3 Aug 2020 at 11:10, Thomas Munro <[hidden email]> wrote:
> I think you can probably go further, though, and make it require no
> storage at all by making it optionally "intrusive", by using a special
> value in an existing member, and supplying an expression to set and
> test for that value.  For example, maybe some users have a pointer but
> never want to use NULL, and maybe some users already have a field
> holding various flags that are one bit wide and can spare a bit.

I agree that it would be good to allow users of simplehash.h that
additional flexibility. It may allow additional memory savings.
However, it would mean we'd need to do some additional work when we
create and grow the hash table to ensure that we mark new buckets as
empty. For now, we get that for free with the zeroing of the newly
allocated memory, but we couldn't rely on that if we allowed users to
define their own macro.

It looks like none of the current callers could gain from this

1. TupleHashEntryData does not have any reusable fields. The status
should fit in the padding on a 64-bit machine anyway.
2. PagetableEntry already has a status that fits into the padding.
3. manifest_file could have its status moved to the end of the struct
and made into a char and the struct would be the same size as if the
field did not exist.

So, with the current users, we'd stand to lose more than we'd gain
from doing it that way.

David


Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

David Rowley
On Mon, 3 Aug 2020 at 11:36, David Rowley <[hidden email]> wrote:
> So, with the current users, we'd stand to lose more than we'd gain
> from doing it that way.

FWIW, I'd be ok with just:

- *       The element type is required to contain a "uint32 status" member.
+ *       The element type is required to contain an integer-based
"status" member
+ *       which can store the range of values defined in the SH_STATUS enum.

David


Reply | Threaded
Open this post in threaded view
|

Re: Comment simplehash/dynahash trade-offs

Thomas Munro-5
On Mon, Aug 3, 2020 at 11:42 AM David Rowley <[hidden email]> wrote:

> On Mon, 3 Aug 2020 at 11:36, David Rowley <[hidden email]> wrote:
> > So, with the current users, we'd stand to lose more than we'd gain
> > from doing it that way.
>
> FWIW, I'd be ok with just:
>
> - *       The element type is required to contain a "uint32 status" member.
> + *       The element type is required to contain an integer-based
> "status" member
> + *       which can store the range of values defined in the SH_STATUS enum.

Thanks for the correction.  Pushed.