cryptohash: missing locking functions for OpenSSL <= 1.0.2?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

cryptohash: missing locking functions for OpenSSL <= 1.0.2?

Jacob Champion-2
While reviewing the NSS patch [1], I noticed that the cryptohash
implementation for OpenSSL doesn't set up any locking callbacks in
frontend code. I think there has to be a call to
OPENSSL_set_locking_callback() before libpq starts reaching into the
EVP_* API, if ENABLE_THREAD_SAFETY and HAVE_CRYPTO_LOCK are both true.

This would only affect threaded libpq clients running OpenSSL 1.0.2 and
below, and it looks like the most likely code path to be affected is
the OpenSSL error stack. So if anything went wrong with one of those
hash calls, it's possible that libpq would crash (or worse, silently
misbehave somewhere in the TLS stack) instead of gracefully reporting
an error. [2] is an example of this in the wild.

--Jacob

[1] https://www.postgresql.org/message-id/40095f48c3c6d556293cb0ecf80ea10cdf7d26b3.camel%40vmware.com
[2] https://github.com/openssl/openssl/issues/4690
Reply | Threaded
Open this post in threaded view
|

Re: cryptohash: missing locking functions for OpenSSL <= 1.0.2?

Michael Paquier-2
On Wed, Feb 17, 2021 at 06:34:36PM +0000, Jacob Champion wrote:
> This would only affect threaded libpq clients running OpenSSL 1.0.2 and
> below, and it looks like the most likely code path to be affected is
> the OpenSSL error stack. So if anything went wrong with one of those
> hash calls, it's possible that libpq would crash (or worse, silently
> misbehave somewhere in the TLS stack) instead of gracefully reporting
> an error. [2] is an example of this in the wild.

I have been discussing a bit this issue with Jacob, and that's a
problem we would need to address on HEAD.  First, I have been looking
at this stuff in older versions with MD5 and SHA256 used by SCRAM when
it comes to ~13 with libpq:
- MD5 is based on the internal implementation of Postgres even when
building libpq with OpenSSL, so that would not be an issue.
- SHA256 is a different story though, because when building with
OpenSSL we would go through SHA256_{Init,Update,Final} for SCRAM
authentication.  In the context of a SSL connection, the crypto part
is initialized.  But that would *not* be the case of a connection in a
non-SSL context.  Fortunately, and after looking at the OpenSSL code
(fips_md_init_ctx, HASH_UPDATE, etc.), there is no sign of locking
handling or errors, so I think that we are safe there.

Now comes the case of HEAD that uses EVP for MD5 and SHA256.  A SSL
connection would initialize the crypto part, but that does not happen
for a non-SSL connection.  So, logically, one could run into issues if
using MD5 or SCRAM with OpenSSL <= 1.0.2 (pgbench with a high number
of threads does not complain by the way), and we are not yet in a
stage where we should drop this version either, even if it has been
EOL'd by upstream at the end of 2019.

We have the code in place to properly initialize the crypto locking in
libpq with ENABLE_THREAD_SAFETY, but the root of the issue is that the
SSL and crypto initializations are grouped together.  What we need to
do here is to split those phases of the initialization so as non-SSL
connections can use the crypto part properly, as pqsecure_initialize
gets only called now when libpq negotiates SSL with the postmaster.
It seems to me that we should make sure of a proper reset of the
crypto part within pqDropConnection(), while the initialization needs
to happen in PQconnectPoll().
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: cryptohash: missing locking functions for OpenSSL <= 1.0.2?

Michael Paquier-2
On Thu, Feb 18, 2021 at 11:04:05AM +0900, Michael Paquier wrote:
> We have the code in place to properly initialize the crypto locking in
> libpq with ENABLE_THREAD_SAFETY, but the root of the issue is that the
> SSL and crypto initializations are grouped together.  What we need to
> do here is to split those phases of the initialization so as non-SSL
> connections can use the crypto part properly, as pqsecure_initialize
> gets only called now when libpq negotiates SSL with the postmaster.
> It seems to me that we should make sure of a proper reset of the
> crypto part within pqDropConnection(), while the initialization needs
> to happen in PQconnectPoll().

So, I have tried a couple of things with a debug build of OpenSSL
1.0.2 at hand (two locks for the crypto and SSL initializations but
SSL_new() grabs some random bytes that need the same callback to be
set or the state of the threads is messed up, some global states to
control the load), and the simplest solution I have come up with is to
control in each pg_conn if the crypto callbacks have been initialized
or not so as we avoid multiple inits and/or drops of the state for a
single connection.  I have arrived at this conclusion after hunting
down cases with pqDropConnection() which would could be called
multiple times, particularly if there are connection attempts to
multiple hosts.

The attached patch implements things this way, and initializes the
crypto callbacks before sending the startup packet, before deciding if
SSL needs to be requested or not.  I have played with several
threading scenarios with this patch, with and without OpenSSL, and the
numbers match in terms of callback loading and unloading (the global
counter used in fe-secure-openssl.c gets to zero).
--
Michael

cryptohash-libpq.patch (8K) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: cryptohash: missing locking functions for OpenSSL <= 1.0.2?

Michael Paquier-2
On Fri, Feb 19, 2021 at 02:37:06PM +0900, Michael Paquier wrote:
> The attached patch implements things this way, and initializes the
> crypto callbacks before sending the startup packet, before deciding if
> SSL needs to be requested or not.  I have played with several
> threading scenarios with this patch, with and without OpenSSL, and the
> numbers match in terms of callback loading and unloading (the global
> counter used in fe-secure-openssl.c gets to zero).

I have done more work and much more tests with this patch, polishing
things as of the attached v2.  First, I don't see any performance
impact or concurrency issues, using up to 200 threads with pgbench -C
-n -j N -c N -f blah.sql where the SQL file includes a single
meta-command like that for instance:
\set a 1

This ensures that connection requests happen a maximum in concurrency,
and libpq stays close to the maximum for the number of open threads.
Attached is a second, simple program that I have used to stress the
case of threads using both SSL and non-SSL connections in parallel to
check for the consistency of the callbacks and their release, mainly
across MD5 and SCRAM.

Extra eyes are welcome here, though I feel comfortable with the
approach taken here.
--
Michael

pthread_libpq.c (1K) Download Attachment
cryptohash-libpq-v2.patch (8K) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: cryptohash: missing locking functions for OpenSSL <= 1.0.2?

Jacob Champion-2
On Wed, 2021-03-03 at 15:30 +0900, Michael Paquier wrote:
> Extra eyes are welcome here, though I feel comfortable with the
> approach taken here.

I have one suggestion for the new logic:

>        else
>        {
>                /*
>                 * In the non-SSL case, just remove the crypto callbacks.  This code
>                 * path has no dependency on any pending SSL calls.
>                 */
>                destroy_needed = true;
>        }
>        [...]
>        if (destroy_needed && conn->crypto_loaded)
>        {
>                destroy_ssl_system();
>                conn->crypto_loaded = false;
>        }

I had to convince myself that this logic is correct -- we set
destroy_needed even if crypto is not enabled, but then check later to
make sure that crypto_loaded is true before doing anything. What would
you think about moving the conn->crypto_loaded check to the else
branch, so that destroy_needed is only set if we actually need it?

Either way, the patch looks good to me and behaves nicely in testing.
Thanks!

--Jacob
Reply | Threaded
Open this post in threaded view
|

Re: cryptohash: missing locking functions for OpenSSL <= 1.0.2?

Michael Paquier-2
On Mon, Mar 08, 2021 at 06:06:32PM +0000, Jacob Champion wrote:
> I had to convince myself that this logic is correct -- we set
> destroy_needed even if crypto is not enabled, but then check later to
> make sure that crypto_loaded is true before doing anything. What would
> you think about moving the conn->crypto_loaded check to the else
> branch, so that destroy_needed is only set if we actually need it?

Do you mean something like the attached?  If I recall my mood from the
moment, I think that I did that to be more careful with the case where
the client has its own set of callbacks set (pq_init_crypto_lib as
false) but that does not matter as this is double-checked in
destroy_ssl_system().  I have adjusted some comments after more
review.
--
Michael

cryptohash-libpq-v3.patch (8K) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: cryptohash: missing locking functions for OpenSSL <= 1.0.2?

Jacob Champion-2
On Wed, 2021-03-10 at 17:21 +0900, Michael Paquier wrote:
> Do you mean something like the attached?

Yes! Patch LGTM.

--Jacob
Reply | Threaded
Open this post in threaded view
|

Re: cryptohash: missing locking functions for OpenSSL <= 1.0.2?

Michael Paquier-2
On Wed, Mar 10, 2021 at 05:05:38PM +0000, Jacob Champion wrote:
> On Wed, 2021-03-10 at 17:21 +0900, Michael Paquier wrote:
> > Do you mean something like the attached?
>
> Yes! Patch LGTM.

Thanks Jacob for double-checking.  I have looked at that again slowly
today, and applied it after some light adjustments in the comments.
--
Michael

signature.asc (849 bytes) Download Attachment