allow frontend use of the backend's core hashing functions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

allow frontend use of the backend's core hashing functions

Robert Haas
Late last year, I did some work to make it possible to use simplehash
in frontend code.[1] However, a hash table is not much good unless one
also has some hash functions that one can use to hash the keys that
need to be inserted into that hash table. I initially thought that
solving this problem was going to be pretty annoying, but when I
looked at it again today I found what I think is a pretty simple way
to adapt things so that the hashing routines we use in the backend are
easily available to frontend code.

Here are some patches for that. It may make sense to combine some of
these in terms of actually getting this committed, but I have
separated them here so that it is, hopefully, easy to see what I did
and why I did it. There are three basic problems which are solved by
the three preparatory patches:

0001 - hashfn.c has a couple of routines that depend on bitmapsets,
and bitmapset.c is currently backend-only. Fix by moving this code
near related code in bitmapset.c.

0002 - some of the prototypes for functions in hashfn.c are in
hsearch.h, mixed with the dynahash stuff, and others are in
hashutils.c. Fix by making hashutils.h the one true header for
hashfn.c.

0003 - Some of hashfn.c's routines return Datum, but that's
backend-only. Fix by renaming the functions and changing the return
types to uint32 and uint64, and add static inline wrappers with the
old names that convert to Datum. Just changing the return types of the
existing functions seemed like it would've required a lot more code
churn, and also seems like it could cause silent breakage in the
future.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/CA+Tgmob8oyh02NrZW=xCScB+5GyJ-jVowE3+TWTUmPF=FsGWTA@...

0001-Move-bitmap_hash-and-bitmap_match-to-bitmapset.c.patch (5K) Download Attachment
0004-Move-src-backend-utils-hash-hashfn.c-to-src-common.patch (29K) Download Attachment
0003-Adapt-hashfn.c-and-hashutils.h-for-frontend-use.patch (9K) Download Attachment
0002-Put-all-the-prototypes-for-hashfn.c-into-the-same-he.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

Suraj Kharage-2
Hi,

I have spent some time reviewing the patches and overall it looks good to me.

However, I have few cosmetic review comments for 0003 patch as below;

1:
+++ b/src/backend/utils/hash/hashfn.c
@@ -16,15 +16,14 @@
  *  It is expected that every bit of a hash function's 32-bit result is
  *  as random as every other; failure to ensure this is likely to lead
  *  to poor performance of hash tables.  In most cases a hash
- *  function should use hash_any() or its variant hash_uint32().
+ *  function should use hash_bytes() or its variant hash_bytes_uint32(),
+ *  or the wrappers hash_any() and hash_any_uint32 defined in hashfn.h.

Here, indicated function name should be hash_uint32.

2: I can see renamed functions are declared twice in hashutils.c. I think duplicate declarations after #endif can be removed,

+extern uint32 hash_bytes(const unsigned char *k, int keylen);
+extern uint64 hash_bytes_extended(const unsigned char *k,
+  int keylen, uint64 seed);
+extern uint32 hash_bytes_uint32(uint32 k);
+extern uint64 hash_bytes_uint32_extended(uint32 k, uint64 seed);
+
+#ifndef FRONTEND
..
Wrapper functions
..
+#endif
+
+extern uint32 hash_bytes(const unsigned char *k, int keylen);
+extern uint64 hash_bytes_extended(const unsigned char *k,
+  int keylen, uint64 seed);
+extern uint32 hash_bytes_uint32(uint32 k);
+extern uint64 hash_bytes_uint32_extended(uint32 k, uint64 seed);


3: The first line of the commit message has one typo.
defiend => defined.

On Fri, Feb 7, 2020 at 11:00 PM Robert Haas <[hidden email]> wrote:
Late last year, I did some work to make it possible to use simplehash
in frontend code.[1] However, a hash table is not much good unless one
also has some hash functions that one can use to hash the keys that
need to be inserted into that hash table. I initially thought that
solving this problem was going to be pretty annoying, but when I
looked at it again today I found what I think is a pretty simple way
to adapt things so that the hashing routines we use in the backend are
easily available to frontend code.

Here are some patches for that. It may make sense to combine some of
these in terms of actually getting this committed, but I have
separated them here so that it is, hopefully, easy to see what I did
and why I did it. There are three basic problems which are solved by
the three preparatory patches:

0001 - hashfn.c has a couple of routines that depend on bitmapsets,
and bitmapset.c is currently backend-only. Fix by moving this code
near related code in bitmapset.c.

0002 - some of the prototypes for functions in hashfn.c are in
hsearch.h, mixed with the dynahash stuff, and others are in
hashutils.c. Fix by making hashutils.h the one true header for
hashfn.c.

0003 - Some of hashfn.c's routines return Datum, but that's
backend-only. Fix by renaming the functions and changing the return
types to uint32 and uint64, and add static inline wrappers with the
old names that convert to Datum. Just changing the return types of the
existing functions seemed like it would've required a lot more code
churn, and also seems like it could cause silent breakage in the
future.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/CA+Tgmob8oyh02NrZW=xCScB+5GyJ-jVowE3+TWTUmPF=FsGWTA@...


--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.
Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

Mark Dilger-5


> On Feb 13, 2020, at 3:44 AM, Suraj Kharage <[hidden email]> wrote:
>
> Hi,
>
> I have spent some time reviewing the patches and overall it looks good to me.
>
> However, I have few cosmetic review comments for 0003 patch as below;
>
> 1:
> +++ b/src/backend/utils/hash/hashfn.c
> @@ -16,15 +16,14 @@
>   * It is expected that every bit of a hash function's 32-bit result is
>   * as random as every other; failure to ensure this is likely to lead
>   * to poor performance of hash tables.  In most cases a hash
> - * function should use hash_any() or its variant hash_uint32().
> + * function should use hash_bytes() or its variant hash_bytes_uint32(),
> + * or the wrappers hash_any() and hash_any_uint32 defined in hashfn.h.
>
> Here, indicated function name should be hash_uint32.
+1

> 2: I can see renamed functions are declared twice in hashutils.c. I think duplicate declarations after #endif can be removed,
>
> +extern uint32 hash_bytes(const unsigned char *k, int keylen);
> +extern uint64 hash_bytes_extended(const unsigned char *k,
> + int keylen, uint64 seed);
> +extern uint32 hash_bytes_uint32(uint32 k);
> +extern uint64 hash_bytes_uint32_extended(uint32 k, uint64 seed);
> +
> +#ifndef FRONTEND
> ..
> Wrapper functions
> ..
> +#endif
> +
> +extern uint32 hash_bytes(const unsigned char *k, int keylen);
> +extern uint64 hash_bytes_extended(const unsigned char *k,
> + int keylen, uint64 seed);
> +extern uint32 hash_bytes_uint32(uint32 k);
> +extern uint64 hash_bytes_uint32_extended(uint32 k, uint64 seed);
+1

> 3: The first line of the commit message has one typo.
> defiend => defined.

+1

I have made these changes and rebased Robert’s patches but otherwise changed nothing.  Here they are:





Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




v2-0001-Move-bitmap_hash-and-bitmap_match-to-bitmapset.c.patch (5K) Download Attachment
v2-0002-Put-all-the-prototypes-for-hashfn.c-into-the-same.patch (4K) Download Attachment
v2-0003-Adapt-hashfn.c-and-hashutils.h-for-frontend-use.patch (9K) Download Attachment
v2-0004-Move-src-backend-utils-hash-hashfn.c-to-src-commo.patch (28K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

Robert Haas
On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
<[hidden email]> wrote:
> I have made these changes and rebased Robert’s patches but otherwise changed nothing.  Here they are:

Thanks. Anyone else have comments? I think this is pretty
straightforward and unobjectionable work so I'm inclined to press
forward with committing it fairly soon, but if someone feels
otherwise, please speak up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

David Fetter
On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
> <[hidden email]> wrote:
> > I have made these changes and rebased Robert’s patches but
> > otherwise changed nothing.  Here they are:
>
> Thanks. Anyone else have comments? I think this is pretty
> straightforward and unobjectionable work so I'm inclined to press
> forward with committing it fairly soon, but if someone feels
> otherwise, please speak up.

One question. It might be possible to make these functions faster
using compiler intrinsics. Would those still be available to front-end
code?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

Mark Dilger-5


> On Feb 14, 2020, at 8:15 AM, David Fetter <[hidden email]> wrote:
>
> On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
>> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
>> <[hidden email]> wrote:
>>> I have made these changes and rebased Robert’s patches but
>>> otherwise changed nothing.  Here they are:
>>
>> Thanks. Anyone else have comments? I think this is pretty
>> straightforward and unobjectionable work so I'm inclined to press
>> forward with committing it fairly soon, but if someone feels
>> otherwise, please speak up.
>
> One question. It might be possible to make these functions faster
> using compiler intrinsics. Would those still be available to front-end
> code?

Do you have a specific proposal that would preserve on-disk compatibility?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

Robert Haas
In reply to this post by David Fetter
On Fri, Feb 14, 2020 at 11:15 AM David Fetter <[hidden email]> wrote:
> One question. It might be possible to make these functions faster
> using compiler intrinsics. Would those still be available to front-end
> code?

Why not? We use the same compiler for frontend code as we do for
backend code. Possibly you might need some more header-file
rejiggering someplace, but I don't know of anything specific or see
any reason why it would be particularly painful if we had to do
something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

David Fetter
In reply to this post by Mark Dilger-5
On Fri, Feb 14, 2020 at 08:16:47AM -0800, Mark Dilger wrote:

> > On Feb 14, 2020, at 8:15 AM, David Fetter <[hidden email]> wrote:
> >
> > On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
> >> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
> >> <[hidden email]> wrote:
> >>> I have made these changes and rebased Robert’s patches but
> >>> otherwise changed nothing.  Here they are:
> >>
> >> Thanks. Anyone else have comments? I think this is pretty
> >> straightforward and unobjectionable work so I'm inclined to press
> >> forward with committing it fairly soon, but if someone feels
> >> otherwise, please speak up.
> >
> > One question. It might be possible to make these functions faster
> > using compiler intrinsics. Would those still be available to front-end
> > code?
>
> Do you have a specific proposal that would preserve on-disk compatibility?

I hadn't planned on changing the representation, just cutting
instructions out of the calculation of same.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

Mark Dilger-5


> On Feb 14, 2020, at 8:29 AM, David Fetter <[hidden email]> wrote:
>
> On Fri, Feb 14, 2020 at 08:16:47AM -0800, Mark Dilger wrote:
>>> On Feb 14, 2020, at 8:15 AM, David Fetter <[hidden email]> wrote:
>>>
>>> On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
>>>> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
>>>> <[hidden email]> wrote:
>>>>> I have made these changes and rebased Robert’s patches but
>>>>> otherwise changed nothing.  Here they are:
>>>>
>>>> Thanks. Anyone else have comments? I think this is pretty
>>>> straightforward and unobjectionable work so I'm inclined to press
>>>> forward with committing it fairly soon, but if someone feels
>>>> otherwise, please speak up.
>>>
>>> One question. It might be possible to make these functions faster
>>> using compiler intrinsics. Would those still be available to front-end
>>> code?
>>
>> Do you have a specific proposal that would preserve on-disk compatibility?
>
> I hadn't planned on changing the representation, just cutting
> instructions out of the calculation of same.

Ok, I misunderstood.

I thought the question was about using compiler intrinsics to implement an alltogether different hashing algorithm than the one currently in use and whether exposing the hash function to frontend code would lock down the algorithm in a way that would make it harder to change in the future.  That lead me to the question of whether we had sufficient flexibility to entertain changing the hashing algorithm anyway.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

Robert Haas
In reply to this post by Robert Haas
On Fri, Feb 14, 2020 at 9:03 PM Robert Haas <[hidden email]> wrote:
> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
> <[hidden email]> wrote:
> > I have made these changes and rebased Robert’s patches but otherwise changed nothing.  Here they are:
>
> Thanks. Anyone else have comments? I think this is pretty
> straightforward and unobjectionable work so I'm inclined to press
> forward with committing it fairly soon, but if someone feels
> otherwise, please speak up.

I've committed 0001 through 0003 as revised by Mark in accordance with
the comments from Suraj. Here's the last patch again with a tweak to
try not to break the Windows build, per some off-list advice I
received on how not to break the Windows build. Barring complaints
from the buildfarm or otherwise, I'll commit this one too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v3-0001-Move-src-backend-utils-hash-hashfn.c-to-src-commo.patch (31K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow frontend use of the backend's core hashing functions

Robert Haas
On Mon, Feb 24, 2020 at 5:32 PM Robert Haas <[hidden email]> wrote:
> I've committed 0001 through 0003 as revised by Mark in accordance with
> the comments from Suraj. Here's the last patch again with a tweak to
> try not to break the Windows build, per some off-list advice I
> received on how not to break the Windows build. Barring complaints
> from the buildfarm or otherwise, I'll commit this one too.

Done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company