Support for NSS as a libpq TLS backend

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
53 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
On Nov 11, 2020, at 10:17 AM, Jacob Champion <[hidden email]> wrote:
>
> (Two failures left on my machine.)

False alarm -- the stderr debugging I'd added in to track down the
assertion tripped up the "no stderr" tests. Zero failing tests now.

--Jacob


Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
On Nov 11, 2020, at 10:57 AM, Jacob Champion <[hidden email]> wrote:
>
> False alarm -- the stderr debugging I'd added in to track down the
> assertion tripped up the "no stderr" tests. Zero failing tests now.

I took a look at the OpenSSL interop problems you mentioned upthread. I
don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during
connection.

I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes
were read, in order to satisfy its API. There was some discussion on
this upthread:

On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <[hidden email]> wrote:

>
> On 20 Oct 2020, at 21:15, Andres Freund <[hidden email]> wrote:
>>
>>> + case PR_IO_TIMEOUT_ERROR:
>>> + break;
>>
>> What does this mean? We'll return with a 0 errno here, right? When is
>> this case reachable?
>
> It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
> we don't do.  It mentioned somewhere that it had happened in no-wait calls due
> to a bug, but I fail to find that reference now.  Either way, I've removed it
> to fall into the default error handling which now sets errno correctly as that
> was a paddle short here.
PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my
machine. It doesn't look like the PR_Recv() API has a choice -- if
there's no data, it can't return a positive integer, and returning zero
means that the socket has been disconnected. So -1 with a timeout error
is the only option.

I'm not completely sure why this is exposed so easily with an OpenSSL
server -- I'm guessing the implementation slices up its packets
differently on the wire, causing a read event before NSS is able to
decrypt a full record -- but it's worth noting that this case also shows
up during NSS-to-NSS psql connections, when handling notifications at
the end of every query. PQconsumeInput() reports a hard failure with the
current implementation, but its return value is ignored by
PrintNotifications(). Otherwise this probably would have showed up
earlier.

(What's the best way to test this case? Are there lower-level tests for
the protocol/network layer somewhere that I'm missing?)

While patching this case, I also noticed that pgtls_read() doesn't call
SOCK_ERRNO_SET() for the disconnection case. That is also in the
attached patch.

--Jacob


nss-handle-timeouts-and-disconnections-in-pgtls_read.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 12 Nov 2020, at 23:12, Jacob Champion <[hidden email]> wrote:
>
> On Nov 11, 2020, at 10:57 AM, Jacob Champion <[hidden email]> wrote:
>>
>> False alarm -- the stderr debugging I'd added in to track down the
>> assertion tripped up the "no stderr" tests. Zero failing tests now.
>
> I took a look at the OpenSSL interop problems you mentioned upthread.

Great, thanks!

> I don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during
> connection.
>
> I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes
> were read, in order to satisfy its API. There was some discussion on
> this upthread:
>
> On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <[hidden email]> wrote:
>>
>> On 20 Oct 2020, at 21:15, Andres Freund <[hidden email]> wrote:
>>>
>>>> + case PR_IO_TIMEOUT_ERROR:
>>>> + break;
>>>
>>> What does this mean? We'll return with a 0 errno here, right? When is
>>> this case reachable?
>>
>> It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
>> we don't do.  It mentioned somewhere that it had happened in no-wait calls due
>> to a bug, but I fail to find that reference now.  Either way, I've removed it
>> to fall into the default error handling which now sets errno correctly as that
>> was a paddle short here.
>
> PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my
> machine. It doesn't look like the PR_Recv() API has a choice -- if
> there's no data, it can't return a positive integer, and returning zero
> means that the socket has been disconnected. So -1 with a timeout error
> is the only option.
Right, that makes sense.

> I'm not completely sure why this is exposed so easily with an OpenSSL
> server -- I'm guessing the implementation slices up its packets
> differently on the wire, causing a read event before NSS is able to
> decrypt a full record -- but it's worth noting that this case also shows
> up during NSS-to-NSS psql connections, when handling notifications at
> the end of every query. PQconsumeInput() reports a hard failure with the
> current implementation, but its return value is ignored by
> PrintNotifications(). Otherwise this probably would have showed up
> earlier.

Should there perhaps be an Assert there to catch those?

> (What's the best way to test this case? Are there lower-level tests for
> the protocol/network layer somewhere that I'm missing?)

Not AFAIK.  Having been knee-deep now, do you have any ideas on how to
implement?

> While patching this case, I also noticed that pgtls_read() doesn't call
> SOCK_ERRNO_SET() for the disconnection case. That is also in the
> attached patch.

Ah yes, nice catch.

I've incorporated this patch as well as the previous patch for the assertion
failure on private callback data into the attached v19 patchset.  I also did a
spellcheck and pgindent run on it for ease of review.

cheers ./daniel


v19-0001-NSS-Frontend-Backend-and-build-infrastructure.patch (152K) Download Attachment
v19-0002-NSS-Testharness-updates.patch (65K) Download Attachment
v19-0003-NSS-pg_strong_random-support.patch (6K) Download Attachment
v19-0004-NSS-Documentation.patch (21K) Download Attachment
v19-0005-NSS-contrib-modules.patch (42K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
On Nov 13, 2020, at 4:14 AM, Daniel Gustafsson <[hidden email]> wrote:

>> On 12 Nov 2020, at 23:12, Jacob Champion <[hidden email]> wrote:
>>
>> I'm not completely sure why this is exposed so easily with an OpenSSL
>> server -- I'm guessing the implementation slices up its packets
>> differently on the wire, causing a read event before NSS is able to
>> decrypt a full record -- but it's worth noting that this case also shows
>> up during NSS-to-NSS psql connections, when handling notifications at
>> the end of every query. PQconsumeInput() reports a hard failure with the
>> current implementation, but its return value is ignored by
>> PrintNotifications(). Otherwise this probably would have showed up
>> earlier.
>
> Should there perhaps be an Assert there to catch those?
Hm. From the perspective of helping developers out, perhaps, but from
the standpoint of "don't crash when an endpoint outside our control does
something strange", I think that's a harder sell. Should the error be
bubbled all the way up instead? Or perhaps, if psql isn't supposed to
treat notification errors as "hard" failures, it should at least warn
the user that something is fishy?

>> (What's the best way to test this case? Are there lower-level tests for
>> the protocol/network layer somewhere that I'm missing?)
>
> Not AFAIK.  Having been knee-deep now, do you have any ideas on how to
> implement?

I think that testing these sorts of important edge cases needs a
friendly DSL -- something that doesn't want to make devs tear their hair
out while building tests. I've been playing a little bit with Scapy [1]
to understand more of the libpq v3 protocol; I'll see if that can be
adapted for pieces of the TLS handshake in a way that's easy to
maintain. If it can be, maybe that'd be a good starting example.

> I've incorporated this patch as well as the previous patch for the assertion
> failure on private callback data into the attached v19 patchset.  I also did a
> spellcheck and pgindent run on it for ease of review.

Commit 6be725e70 got rid of some psql error messaging that the tests
were keying off of, so there are a few new failures after a rebase onto
latest master.

I've attached a patch that gets the SCRAM tests a little further
(certificate hashing was caught in an infinite loop). I also added error
checks to those loops, along the lines of the existing OpenSSL
implementation: if a suitable digest can't be found, the user will see
an error like

    psql: error: could not find digest for OID 'PKCS #1 SHA-256 With RSA Encryption'

It's a little verbose but I don't think this case should come up in
normal practice.

--Jacob

[1] https://scapy.net/


nss-fix-hang-when-hashing-certificates.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 16 Nov 2020, at 21:00, Jacob Champion <[hidden email]> wrote:
> On Nov 13, 2020, at 4:14 AM, Daniel Gustafsson <[hidden email]> wrote:

>> I've incorporated this patch as well as the previous patch for the assertion
>> failure on private callback data into the attached v19 patchset.  I also did a
>> spellcheck and pgindent run on it for ease of review.
>
> Commit 6be725e70 got rid of some psql error messaging that the tests
> were keying off of, so there are a few new failures after a rebase onto
> latest master.
>
> I've attached a patch that gets the SCRAM tests a little further
> (certificate hashing was caught in an infinite loop). I also added error
> checks to those loops, along the lines of the existing OpenSSL
> implementation: if a suitable digest can't be found, the user will see
> an error like
>
>    psql: error: could not find digest for OID 'PKCS #1 SHA-256 With RSA Encryption'
>
> It's a little verbose but I don't think this case should come up in
> normal practice.
Nice, thanks for the fix!  I've incorporated your patch into the attached v20
which also fixes client side error reporting to be more readable.  The SCRAM
tests are now also hooked up, albeit with SKIP blocks for NSS, so they can
start getting fixed.

cheers ./daniel


v20-0001-NSS-Frontend-Backend-and-build-infrastructure.patch (152K) Download Attachment
v20-0002-NSS-Testharness-updates.patch (71K) Download Attachment
v20-0003-NSS-pg_strong_random-support.patch (6K) Download Attachment
v20-0004-NSS-Documentation.patch (21K) Download Attachment
v20-0005-NSS-contrib-modules.patch (42K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
On Nov 17, 2020, at 7:00 AM, Daniel Gustafsson <[hidden email]> wrote:
>
> Nice, thanks for the fix!  I've incorporated your patch into the attached v20
> which also fixes client side error reporting to be more readable.

I was testing handshake failure modes and noticed that some FATAL
messages are being sent through to the client in cleartext. The OpenSSL
implementation doesn't do this, because it logs handshake problems at
COMMERROR level. Should we switch all those ereport() calls in the NSS
be_tls_open_server() to COMMERROR as well (and return explicitly), to
avoid this? Or was there a reason for logging at FATAL/ERROR level?

Related note, at the end of be_tls_open_server():

>     ...
>     port->ssl_in_use = true;
>     return 0;
>
> error:
>     return 1;
> }

This needs to return -1 in the error case; the only caller of
secure_open_server() does a direct `result == -1` comparison rather than
checking `result != 0`.

--Jacob

Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
In reply to this post by Daniel Gustafsson
On Tue, 2020-10-27 at 21:07 +0100, Daniel Gustafsson wrote:

> > On 20 Oct 2020, at 21:15, Andres Freund <[hidden email]> wrote:
> >
> > > +static SECStatus
> > > +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
> > > +{
> > > + SECStatus status;
> > > + Port   *port = (Port *) arg;
> > > + CERTCertificate *cert;
> > > + char   *peer_cn;
> > > + int len;
> > > +
> > > + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
> > > + if (status == SECSuccess)
> > > + {
> > > + cert = SSL_PeerCertificate(port->pr_fd);
> > > + len = strlen(cert->subjectName);
> > > + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
> > > + if (strncmp(cert->subjectName, "CN=", 3) == 0)
> > > + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
> > > + else
> > > + strlcpy(peer_cn, cert->subjectName, len + 1);
> > > + CERT_DestroyCertificate(cert);
> > > +
> > > + port->peer_cn = peer_cn;
> > > + port->peer_cert_valid = true;
> >
> > Hm. We either should have something similar to
> >
> > /*
> > * Reject embedded NULLs in certificate common name to prevent
> > * attacks like CVE-2009-4034.
> > */
> > if (len != strlen(peer_cn))
> > {
> > ereport(COMMERROR,
> > (errcode(ERRCODE_PROTOCOL_VIOLATION),
> > errmsg("SSL certificate's common name contains embedded null")));
> > pfree(peer_cn);
> > return -1;
> > }
> > here, or a comment explaining why not.
>
> We should, but it's proving rather difficult as there is no equivalent API call
> to get the string as well as the expected length of it.

I'm going to try to tackle this part next. It looks like NSS uses RFC
4514 (or something like it) backslash-quoting, which this code either
needs to undo or bypass before performing a comparison.

--Jacob
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Michael Paquier-2
In reply to this post by Daniel Gustafsson
On Tue, Nov 17, 2020 at 04:00:53PM +0100, Daniel Gustafsson wrote:
> Nice, thanks for the fix!  I've incorporated your patch into the attached v20
> which also fixes client side error reporting to be more readable.  The SCRAM
> tests are now also hooked up, albeit with SKIP blocks for NSS, so they can
> start getting fixed.

On top of the set of TODO items mentioned in the logs of the patches,
this patch set needs a rebase because it does not apply.  In order to
move on with this set, I would suggest to extract some parts of the
patch set independently of the others and have two buildfarm members
for the MSVC and non-MSVC cases to stress the parts that can be
committed.  Just seeing the size, we could move on with:
- The ./configure set, with the change to introduce --with-ssl=openssl.
- 0004 for strong randoms.
- Support for cryptohashes.

+/*
+ * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef
+ * our version to avoid compiler warnings on redefinition.
+ */
+#define pg_BITS_PER_BYTE BITS_PER_BYTE
+#undef BITS_PER_BYTE
This could be done separately.

src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 18 Jan 2021, at 08:08, Michael Paquier <[hidden email]> wrote:
>
> On Tue, Nov 17, 2020 at 04:00:53PM +0100, Daniel Gustafsson wrote:
>> Nice, thanks for the fix!  I've incorporated your patch into the attached v20
>> which also fixes client side error reporting to be more readable.  The SCRAM
>> tests are now also hooked up, albeit with SKIP blocks for NSS, so they can
>> start getting fixed.
>
> On top of the set of TODO items mentioned in the logs of the patches,
> this patch set needs a rebase because it does not apply.

Fixed in the attached, which also addresses the points raised earlier by Jacob
as well as adds certificates created entirely by NSS tooling as well as initial
cryptohash support.  There is something iffy with these certs (the test fails
on mismatching ciphers and/or signature algorithms) that I haven't been able to
pin down, but to get more eyes on this I'm posting the patch with the test
enabled.  The NSS toolchain requires interactive input which makes the Makefile
a bit hacky, ideas on cleaning that up are appreciated.

> In order to
> move on with this set, I would suggest to extract some parts of the
> patch set independently of the others and have two buildfarm members
> for the MSVC and non-MSVC cases to stress the parts that can be
> committed.  Just seeing the size, we could move on with:
> - The ./configure set, with the change to introduce --with-ssl=openssl.
> - 0004 for strong randoms.
> - Support for cryptohashes.

I will leave it to others to decide the feasibility of this, I'm happy to slice
and dice the commits into smaller bits to for example separate out the
--with-ssl autoconf change into a non NSS dependent commit, if that's wanted.

> +/*
> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef
> + * our version to avoid compiler warnings on redefinition.
> + */
> +#define pg_BITS_PER_BYTE BITS_PER_BYTE
> +#undef BITS_PER_BYTE
> This could be done separately.

Based on an offlist discussion I believe this was a misunderstanding, but if I
instead misunderstood that feel free to correct me with how you think this
should be done.

> src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no?

Good point, fixed.

cheers ./daniel


v21-0001-NSS-Frontend-Backend-and-build-infrastructure.patch (159K) Download Attachment
v21-0002-NSS-Testharness-updates.patch (75K) Download Attachment
v21-0003-NSS-pg_strong_random-support.patch (5K) Download Attachment
v21-0004-NSS-Documentation.patch (29K) Download Attachment
v21-0005-NSS-contrib-modules.patch (42K) Download Attachment
v21-0006-NSS-cryptohash-support.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
In reply to this post by Jacob Champion-2
> On 4 Dec 2020, at 01:57, Jacob Champion <[hidden email]> wrote:
>
> On Nov 17, 2020, at 7:00 AM, Daniel Gustafsson <[hidden email]> wrote:
>>
>> Nice, thanks for the fix!  I've incorporated your patch into the attached v20
>> which also fixes client side error reporting to be more readable.
>
> I was testing handshake failure modes and noticed that some FATAL
> messages are being sent through to the client in cleartext. The OpenSSL
> implementation doesn't do this, because it logs handshake problems at
> COMMERROR level. Should we switch all those ereport() calls in the NSS
> be_tls_open_server() to COMMERROR as well (and return explicitly), to
> avoid this? Or was there a reason for logging at FATAL/ERROR level?

The ERROR logging made early development easier but then stuck around, I've
changed them to COMMERROR returning an error instead in the v21 patch just
sent to the list.

> Related note, at the end of be_tls_open_server():
>
>>    ...
>>    port->ssl_in_use = true;
>>    return 0;
>>
>> error:
>>    return 1;
>> }
>
> This needs to return -1 in the error case; the only caller of
> secure_open_server() does a direct `result == -1` comparison rather than
> checking `result != 0`.

Fixed.

cheers ./daniel

Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
In reply to this post by Daniel Gustafsson
On Tue, 2021-01-19 at 21:21 +0100, Daniel Gustafsson wrote:
> There is something iffy with these certs (the test fails
> on mismatching ciphers and/or signature algorithms) that I haven't been able to
> pin down, but to get more eyes on this I'm posting the patch with the test
> enabled.

Removing `--keyUsage keyEncipherment` from the native_server-* CSR
generation seems to let the tests pass for me, but I'm wary of just
pushing that as a solution because I don't understand why that would
have anything to do with the failure mode
(SSL_ERROR_NO_SUPPORTED_SIGNATURE_ALGORITHM).

> The NSS toolchain requires interactive input which makes the Makefile
> a bit hacky, ideas on cleaning that up are appreciated.

Hm. I got nothing, short of a feature request to NSS...

--Jacob
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 20 Jan 2021, at 01:40, Jacob Champion <[hidden email]> wrote:

>
> On Tue, 2021-01-19 at 21:21 +0100, Daniel Gustafsson wrote:
>> There is something iffy with these certs (the test fails
>> on mismatching ciphers and/or signature algorithms) that I haven't been able to
>> pin down, but to get more eyes on this I'm posting the patch with the test
>> enabled.
>
> Removing `--keyUsage keyEncipherment` from the native_server-* CSR
> generation seems to let the tests pass for me, but I'm wary of just
> pushing that as a solution because I don't understand why that would
> have anything to do with the failure mode
> (SSL_ERROR_NO_SUPPORTED_SIGNATURE_ALGORITHM).
Aha, that was a good clue, I had overlooked the required extensions in the CSR.
Re-reading RFC 5280 it seems we need keyEncipherment, dataEncipherment and
digitalSignature to create a valid SSL Server certificate.  Adding those indeed
make the test pass.  Skimming the certutil code *I think* removing it as you
did cause a set of defaults to kick in that made it work based on the parameter
"--nsCertType sslServer", but it's not entirely easy to make out.  Either way,
relying on defaults in a test suite seems less than good, so I've extended the
Makefile to be explicit about the extensions.

The attached v22 rebase incorporates the fixup to the test Makefile, with not
further changes on top of that.

cheers ./daniel


v22-0001-NSS-Frontend-Backend-and-build-infrastructure.patch (159K) Download Attachment
v22-0002-NSS-Testharness-updates.patch (75K) Download Attachment
v22-0003-NSS-pg_strong_random-support.patch (5K) Download Attachment
v22-0004-NSS-Documentation.patch (29K) Download Attachment
v22-0005-NSS-contrib-modules.patch (42K) Download Attachment
v22-0006-NSS-cryptohash-support.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
On Wed, 2021-01-20 at 12:58 +0100, Daniel Gustafsson wrote:
> Aha, that was a good clue, I had overlooked the required extensions in the CSR.
> Re-reading RFC 5280 it seems we need keyEncipherment, dataEncipherment and
> digitalSignature to create a valid SSL Server certificate.  Adding those indeed
> make the test pass.  Skimming the certutil code *I think* removing it as you
> did cause a set of defaults to kick in that made it work based on the parameter
> "--nsCertType sslServer", but it's not entirely easy to make out.

Lovely. I didn't expect *removing* an extension to effectively *add*
more, but I'm glad it works now.

==

To continue the Subject Common Name discussion [1] from a different
part of the thread:

Attached is a v23 version of the patchset that peels the raw Common
Name out from a client cert's Subject. This allows the following cases
that the OpenSSL implementation currently handles:

- subjects that don't begin with a CN
- subjects with quotable characters
- subjects that have no CN at all
Embedded NULLs are now handled in a similar manner to the OpenSSL side,
though because this failure happens during the certificate
authentication callback, it results in a TLS alert rather than simply
closing the connection.

For easier review of just the parts I've changed, I've also attached a
since-v22.diff, which is part of the 0001 patch.

--Jacob

[1]
https://www.postgresql.org/message-id/7d6a23a7e30540b486abc823f7ced7a93e1da1e8.camel%40vmware.com

since-v22.diff (6K) Download Attachment
v23-0001-NSS-Frontend-Backend-and-build-infrastructure.patch (159K) Download Attachment
v23-0002-NSS-Testharness-updates.patch (77K) Download Attachment
v23-0003-NSS-pg_strong_random-support.patch (5K) Download Attachment
v23-0004-NSS-Documentation.patch (27K) Download Attachment
v23-0005-NSS-contrib-modules.patch (40K) Download Attachment
v23-0006-NSS-cryptohash-support.patch (8K) Download Attachment
123