Support for NSS as a libpq TLS backend

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
53 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Michael Paquier-2
On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote:
> Attached is a v10 rebased to apply on top of HEAD.

I am afraid that this needs a new rebase.  The patch is failing to
apply, per the CF bot. :/
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 29 Sep 2020, at 07:59, Michael Paquier <[hidden email]> wrote:
>
> On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote:
>> Attached is a v10 rebased to apply on top of HEAD.
>
> I am afraid that this needs a new rebase.  The patch is failing to
> apply, per the CF bot. :/

It's failing on binary diffs due to the NSS certificate databases being
included to make hacking on the patch easier:

  File src/test/ssl/ssl/nss/server.crl: git binary diffs are not supported.

This is a limitation of the CFBot patch tester, the text portions of the patch
still applies with a tiny but of fuzz.

cheers ./daniel

Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 29 Sep 2020, at 09:52, Daniel Gustafsson <[hidden email]> wrote:

>
>> On 29 Sep 2020, at 07:59, Michael Paquier <[hidden email]> wrote:
>>
>> On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote:
>>> Attached is a v10 rebased to apply on top of HEAD.
>>
>> I am afraid that this needs a new rebase.  The patch is failing to
>> apply, per the CF bot. :/
>
> It's failing on binary diffs due to the NSS certificate databases being
> included to make hacking on the patch easier:
>
>  File src/test/ssl/ssl/nss/server.crl: git binary diffs are not supported.
>
> This is a limitation of the CFBot patch tester, the text portions of the patch
> still applies with a tiny but of fuzz.
Attached is a new version which doesn't contain the NSS certificate databases
to keep the CFBot happy.

It also implements server-side passphrase callbacks as well as re-enables the
tests for those.  The callback works a bit differently from the OpenSSL one as
it must run in the forked process, so it can't run on server reload.  There's
also no default fallback reading from a TTY like in OpenSSL, so if no callback
it set the always-failing dummy is set.

cheers ./daniel


0001-Support-for-NSS-as-a-TLS-backend-v11.patch (232K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
The attached v12 adds support for pgcrypto as well as pg_strong_random, which I
believe completes the required subsystems where we have OpenSSL support today.
I opted for not adding code to handle the internal shaXXX implementations until
the dust settles around the proposal to change the API there.

Blowfish is not supported by NSS AFAICT, even though the cipher mechanism is
defined, so the internal implementation is used there instead.  CAST5 is
supported, but segfaults inside NSS on most inputs so support for that is not
included for now.

cheers ./daniel


0001-Support-for-NSS-as-a-TLS-backend-v12.patch (275K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Andres Freund
Hi,

On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote:

> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001
> From: Daniel Gustafsson <[hidden email]>
> Date: Thu, 8 Oct 2020 18:44:28 +0200
> Subject: [PATCH] Support for NSS as a TLS backend v12
> ---
>  configure                                     |  223 +++-
>  configure.ac                                  |   39 +-
>  contrib/Makefile                              |    2 +-
>  contrib/pgcrypto/Makefile                     |    5 +
>  contrib/pgcrypto/nss.c                        |  773 +++++++++++
>  contrib/pgcrypto/openssl.c                    |    2 +-
>  contrib/pgcrypto/px.c                         |    1 +
>  contrib/pgcrypto/px.h                         |    1 +

Personally I'd like to see this patch broken up a bit - it's quite
large. Several of the changes could easily be committed separately, no?


>  if test "$with_openssl" = yes ; then
> +  if test x"$with_nss" = x"yes" ; then
> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
> +  fi

Based on a quick look there's no similar error check for the msvc
build. Should there be?

>  
> +if test "$with_nss" = yes ; then
> +  if test x"$with_openssl" = x"yes" ; then
> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
> +  fi

Isn't this a repetition of the earlier check?


> +  CLEANLDFLAGS="$LDFLAGS"
> +  # TODO: document this set of LDFLAGS
> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"

Shouldn't this use nss-config or such?


> +if test "$with_nss" = yes ; then
> +  AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file <ssl.h> is required for NSS])])
> +  AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file <nss.h> is required for NSS])])
> +fi

Hm. For me, on debian, these headers are not directly in the default
include search path, but would be as nss/ssl.h. I don't see you adding
nss/ to CFLAGS anywhere? How does this work currently?

I think it'd also be better if we could include these files as nss/ssl.h
etc - ssl.h is a name way too likely to conflict imo.

> +++ b/src/backend/libpq/be-secure-nss.c
> @@ -0,0 +1,1158 @@
> +/*
> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef
> + * our version to avoid compiler warnings on redefinition.
> + */
> +#define pg_BITS_PER_BYTE BITS_PER_BYTE
> +#undef BITS_PER_BYTE

Most compilers/preprocessors don't warn about redefinitions when they
would result in the same value (IIRC we have some cases of redefinitions
in tree even). Does nspr's differ?


> +/*
> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
> + * colliding definitions from ours, causing a much expected compiler error.
> + * The definitions are however not actually used in NSPR at all, and are only
> + * intended for what seems to be backwards compatibility for apps written
> + * against old versions of NSPR.  The following comment is in the referenced
> + * file, and was added in 1998:
> + *
> + * This section typedefs the old 'native' types to the new PR<type>s.
> + * These definitions are scheduled to be eliminated at the earliest
> + * possible time. The NSPR API is implemented and documented using
> + * the new definitions.
> + *
> + * As there is no opt-out from pulling in these typedefs, we define the guard
> + * for the file to exclude it. This is incredibly ugly, but seems to be about
> + * the only way around it.
> + */
> +#define PROTYPES_H
> +#include <nspr.h>
> +#undef PROTYPES_H

Yuck :(.


> +int
> +be_tls_init(bool isServerStart)
> +{
> + SECStatus status;
> + SSLVersionRange supported_sslver;
> +
> + /*
> + * Set up the connection cache for multi-processing application behavior.

Hm. Do we necessarily want that? Session resumption is not exactly
unproblematic... Or does this do something else?


> + * If we are in ServerStart then we initialize the cache. If the server is
> + * already started, we inherit the cache such that it can be used for
> + * connections. Calling SSL_ConfigMPServerSIDCache sets an environment
> + * variable which contains enough information for the forked child to know
> + * how to access it.  Passing NULL to SSL_InheritMPServerSIDCache will
> + * make the forked child look it up by the default name SSL_INHERITANCE,
> + * if env vars aren't inherited then the contents of the variable can be
> + * passed instead.
> + */

Does this stuff work on windows / EXEC_BACKEND?


> + * The below parameters are what the implicit initialization would've done
> + * for us, and should work even for older versions where it might not be
> + * done automatically. The last parameter, maxPTDs, is set to various
> + * values in other codebases, but has been unused since NSPR 2.1 which was
> + * released sometime in 1998.
> + */
> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ );

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_Init
says that currently all parameters are ignored?




> + /*
> + * Import the already opened socket as we don't want to use NSPR functions
> + * for opening the network socket due to how the PostgreSQL protocol works
> + * with TLS connections. This function is not part of the NSPR public API,
> + * see the comment at the top of the file for the rationale of still using
> + * it.
> + */
> + pr_fd = PR_ImportTCPSocket(port->sock);
> + if (!pr_fd)
> + ereport(ERROR,
> + (errmsg("unable to connect to socket")));

I don't see the comment you're referring to?


> + /*
> + * Most of the documentation available, and implementations of, NSS/NSPR
> + * use the PR_NewTCPSocket() function here, which has the drawback that it
> + * can only create IPv4 sockets. Instead use PR_OpenTCPSocket() which
> + * copes with IPv6 as well.
> + */
> + model = PR_OpenTCPSocket(port->laddr.addr.ss_family);
> + if (!model)
> + ereport(ERROR,
> + (errmsg("unable to open socket")));
> +
> + /*
> + * Convert the NSPR socket to an SSL socket. Ensuring the success of this
> + * operation is critical as NSS SSL_* functions may return SECSuccess on
> + * the socket even though SSL hasn't been enabled, which introduce a risk
> + * of silent downgrades.
> + */
> + model = SSL_ImportFD(NULL, model);
> + if (!model)
> + ereport(ERROR,
> + (errmsg("unable to enable TLS on socket")));

It's confusing that these functions do not actually reference the socket
via some handle :(. What does opening a socket do here?


> + /*
> + * Configure the allowed cipher. If there are no user preferred suites,

*ciphers?

> +
> + port->pr_fd = SSL_ImportFD(model, pr_fd);
> + if (!port->pr_fd)
> + ereport(ERROR,
> + (errmsg("unable to initialize")));
> +
> + PR_Close(model);

A comment explaining why we first import a NULL into the model, and then
release the model, and import the real fd would be good.


> +ssize_t
> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
> +{
> + ssize_t n_read;
> + PRErrorCode err;
> +
> + n_read = PR_Read(port->pr_fd, ptr, len);
> +
> + if (n_read < 0)
> + {
> + err = PR_GetError();

Yay, more thread global state :(.

> + /* XXX: This logic seems potentially bogus? */
> + if (err == PR_WOULD_BLOCK_ERROR)
> + *waitfor = WL_SOCKET_READABLE;
> + else
> + *waitfor = WL_SOCKET_WRITEABLE;

Don't we need to handle failed connections somewhere here? secure_read()
won't know about PR_GetError() etc? How would SSL errors be signalled
upwards here?

Also, as you XXX, it's not clear to me that your mapping would always
result in waiting for the right event? A tls write could e.g. very well
require receiving data etc?

> + /*
> + * At least one byte with password content was returned, and NSS requires
> + * that we return it allocated in NSS controlled memory. If we fail to
> + * allocate then abort without passing back NULL and bubble up the error
> + * on the PG side.
> + */
> + password = (char *) PR_Malloc(len + 1);
> + if (!password)
> + ereport(ERROR,
> + (errcode(ERRCODE_OUT_OF_MEMORY),
> + errmsg("out of memory")));
>
> + strlcpy(password, buf, sizeof(password));
> + explicit_bzero(buf, sizeof(buf));
> +

In case of error you're not bzero'ing out the password!

Separately, I wonder if we should introduce a function for throwing OOM
errors - which then e.g. could print the memory context stats in those
places too...


> +static SECStatus
> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
> +{
> + SECStatus status;
> + Port   *port = (Port *) arg;
> + CERTCertificate *cert;
> + char   *peer_cn;
> + int len;
> +
> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
> + if (status == SECSuccess)
> + {
> + cert = SSL_PeerCertificate(port->pr_fd);
> + len = strlen(cert->subjectName);
> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
> + if (strncmp(cert->subjectName, "CN=", 3) == 0)
> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
> + else
> + strlcpy(peer_cn, cert->subjectName, len + 1);
> + CERT_DestroyCertificate(cert);
> +
> + port->peer_cn = peer_cn;
> + port->peer_cert_valid = true;

Hm. We either should have something similar to

                        /*
                         * Reject embedded NULLs in certificate common name to prevent
                         * attacks like CVE-2009-4034.
                         */
                        if (len != strlen(peer_cn))
                        {
                                ereport(COMMERROR,
                                                (errcode(ERRCODE_PROTOCOL_VIOLATION),
                                                 errmsg("SSL certificate's common name contains embedded null")));
                                pfree(peer_cn);
                                return -1;
                        }
here, or a comment explaining why not.

Also, what's up with the CN= bit? Why is that needed here, but not for
openssl?


> +static PRFileDesc *
> +init_iolayer(Port *port, int loglevel)
> +{
> + const PRIOMethods *default_methods;
> + PRFileDesc *layer;
> +
> + /*
> + * Start by initializing our layer with all the default methods so that we
> + * can selectively override the ones we want while still ensuring that we
> + * have a complete layer specification.
> + */
> + default_methods = PR_GetDefaultIOMethods();
> + memcpy(&pr_iomethods, default_methods, sizeof(PRIOMethods));
> +
> + pr_iomethods.recv = pg_ssl_read;
> + pr_iomethods.send = pg_ssl_write;
> +
> + /*
> + * Each IO layer must be identified by a unique name, where uniqueness is
> + * per connection. Each connection in a postgres cluster can generate the
> + * identity from the same string as they will create their IO layers on
> + * different sockets. Only one layer per socket can have the same name.
> + */
> + pr_id = PR_GetUniqueIdentity("PostgreSQL");

Seems like it might not be a bad idea to append Server or something?


> +
> + /*
> + * Create the actual IO layer as a stub such that it can be pushed onto
> + * the layer stack. The step via a stub is required as we define custom
> + * callbacks.
> + */
> + layer = PR_CreateIOLayerStub(pr_id, &pr_iomethods);
> + if (!layer)
> + {
> + ereport(loglevel,
> + (errmsg("unable to create NSS I/O layer")));
> + return NULL;
> + }

Why is this accepting a variable log level? The only caller passes ERROR?


> +/*
> + * pg_SSLerrmessage
> + * Create and return a human readable error message given
> + * the specified error code
> + *
> + * PR_ErrorToName only converts the enum identifier of the error to string,
> + * but that can be quite useful for debugging (and in case PR_ErrorToString is
> + * unable to render a message then we at least have something).
> + */
> +static char *
> +pg_SSLerrmessage(PRErrorCode errcode)
> +{
> + char error[128];
> + int ret;
> +
> + /* TODO: this should perhaps use a StringInfo instead.. */
> + ret = pg_snprintf(error, sizeof(error), "%s (%s)",
> +  PR_ErrorToString(errcode, PR_LANGUAGE_I_DEFAULT),
> +  PR_ErrorToName(errcode));
> + if (ret)
> + return pstrdup(error);

> + return pstrdup(_("unknown TLS error"));
> +}

Why not use psrintf() here?



> +++ b/src/include/common/pg_nss.h
> @@ -0,0 +1,141 @@
> +/*-------------------------------------------------------------------------
> + *
> + * pg_nss.h
> + *  Support for NSS as a TLS backend
> + *
> + * These definitions are used by both frontend and backend code.
> + *
> + * Copyright (c) 2020, PostgreSQL Global Development Group
> + *
> + * IDENTIFICATION
> + *        src/include/common/pg_nss.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +#ifndef PG_NSS_H
> +#define PG_NSS_H
> +
> +#ifdef USE_NSS
> +
> +#include <sslproto.h>
> +
> +PRUint16 pg_find_cipher(char *name);
> +
> +typedef struct
> +{
> + const char *name;
> + PRUint16 number;
> +} NSSCiphers;
> +
> +#define INVALID_CIPHER 0xFFFF
> +
> +/*
> + * This list is a partial copy of the ciphers in NSS files lib/ssl/sslproto.h
> + * in order to provide a human readable version of the ciphers. It would be
> + * nice to not have to have this, but NSS doesn't provide any API addressing
> + * the ciphers by name. TODO: do we want more of the ciphers, or perhaps less?
> + */
> +static const NSSCiphers NSS_CipherList[] = {
> +
> + {"TLS_NULL_WITH_NULL_NULL", TLS_NULL_WITH_NULL_NULL},

Hm. Is this whole business of defining array constants in a header just
done to avoid having a .c file that needs to be compiled both in
frontend and backend code?


> +/*
> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
> + * colliding definitions from ours, causing a much expected compiler error.
> + * The definitions are however not actually used in NSPR at all, and are only
> + * intended for what seems to be backwards compatibility for apps written
> + * against old versions of NSPR.  The following comment is in the referenced
> + * file, and was added in 1998:
> + *
> + * This section typedefs the old 'native' types to the new PR<type>s.
> + * These definitions are scheduled to be eliminated at the earliest
> + * possible time. The NSPR API is implemented and documented using
> + * the new definitions.
> + *
> + * As there is no opt-out from pulling in these typedefs, we define the guard
> + * for the file to exclude it. This is incredibly ugly, but seems to be about
> + * the only way around it.
> + */

There's a lot of duplicated comments here. Could we move either of the
files to reference the other for longer ones?



> +/*
> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
> + * only way to make NSS use an already set up POSIX file descriptor rather
> + * than opening one itself. To quote the NSS documentation:
> + *
> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
> + * implementation changes. In practice, this is unlikely to happen because
> + * NSPR's implementation has been stable for years and because of NSPR's
> + * strong commitment to backward compatibility."
> + *
> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
> + *
> + * The function is declared in <private/pprio.h>, but as it is a header marked
> + * private we declare it here rather than including it.
> + */
> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);

Ugh. This is really the way to do this? How do other applications deal
with this problem?


> +#if defined(WIN32)
> +static const char *ca_trust_name = "nssckbi.dll";
> +#elif defined(__darwin__)
> +static const char *ca_trust_name = "libnssckbi.dylib";
> +#else
> +static const char *ca_trust_name = "libnssckbi.so";
> +#endif

There's really no pre-existing handling for this in nss???


> + /*
> + * The original design of NSS was for a single application to use a single
> + * copy of it, initialized with NSS_Initialize() which isn't returning any
> + * handle with which to refer to NSS. NSS initialization and shutdown are
> + * global for the application, so a shutdown in another NSS enabled
> + * library would cause NSS to be stopped for libpq as well.  The fix has
> + * been to introduce NSS_InitContext which returns a context handle to
> + * pass to NSS_ShutdownContext.  NSS_InitContext was introduced in NSS
> + * 3.12, but the use of it is not very well documented.
> + * https://bugzilla.redhat.com/show_bug.cgi?id=738456
> + *
> + * The InitParameters struct passed can be used to override internal
> + * values in NSS, but the usage is not documented at all. When using
> + * NSS_Init initializations, the values are instead set via PK11_Configure
> + * calls so the PK11_Configure documentation can be used to glean some
> + * details on these.
> + *
> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Specs

> +
> + if (!nss_context)
> + {
> + char   *err = pg_SSLerrmessage(PR_GetError());
> +
> + printfPQExpBuffer(&conn->errorMessage,
> +  libpq_gettext("unable to %s certificate database: %s"),
> +  conn->cert_database ? "open" : "create",
> +  err);
> + free(err);
> + return PGRES_POLLING_FAILED;
> + }
> +
> + /*
> + * Configure cipher policy.
> + */
> + status = NSS_SetDomesticPolicy();

Why is "domestic" the right thing here?


> +
> + PK11_SetPasswordFunc(PQssl_passwd_cb);

Is it actually OK to do stuff like this when other users of NSS might be
present? That's obviously more likely in the libpq case, compared to the
backend case (where it's also possible, of course). What prevents us
from overriding another user's callback?


> +ssize_t
> +pgtls_read(PGconn *conn, void *ptr, size_t len)
> +{
> + PRInt32 nread;
> + PRErrorCode status;
> + int read_errno = 0;
> +
> + nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT);
> +
> + /*
> + * PR_Recv blocks until there is data to read or the timeout expires. Zero
> + * is returned for closed connections, while -1 indicates an error within
> + * the ongoing connection.
> + */
> + if (nread == 0)
> + {
> + read_errno = ECONNRESET;
> + return -1;
> + }

It's a bit confusing to talk about blocking when the socket presumably
is in non-blocking mode, and you're also asking to never wait?


> + if (nread == -1)
> + {
> + status = PR_GetError();
> +
> + switch (status)
> + {
> + case PR_WOULD_BLOCK_ERROR:
> + read_errno = EINTR;
> + break;

Uh, isn't this going to cause a busy-loop by the caller? EINTR isn't the
same as EAGAIN/EWOULDBLOCK?


> + case PR_IO_TIMEOUT_ERROR:
> + break;

What does this mean? We'll return with a 0 errno here, right? When is
this case reachable?

E.g. the comment in fe-misc.c:
                                /* pqsecure_read set the error message for us */
for this case doesn't seem to be fulfilled by this.


> +/*
> + * Verify that the server certificate matches the hostname we connected to.
> + *
> + * The certificate's Common Name and Subject Alternative Names are considered.
> + */
> +int
> +pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn,
> + int *names_examined,
> + char **first_name)
> +{
> + return 1;
> +}

Uh, huh? Certainly doesn't verify anything...


> +/* ------------------------------------------------------------ */
> +/* PostgreSQL specific TLS support functions */
> +/* ------------------------------------------------------------ */
> +
> +/*
> + * TODO: this a 99% copy of the same function in the backend, make these share
> + * a single implementation instead.
> + */
> +static char *
> +pg_SSLerrmessage(PRErrorCode errcode)
> +{
> + const char *error;
> +
> + error = PR_ErrorToName(errcode);
> + if (error)
> + return strdup(error);
> +
> + return strdup("unknown TLS error");
> +}

Btw, why does this need to duplicate strings, instead of returning a
const char*?



Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 20 Oct 2020, at 21:15, Andres Freund <[hidden email]> wrote:
>
> Hi,

Thanks for your review, much appreciated!

> On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote:
>> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001
>> From: Daniel Gustafsson <[hidden email]>
>> Date: Thu, 8 Oct 2020 18:44:28 +0200
>> Subject: [PATCH] Support for NSS as a TLS backend v12
>> ---
>> configure                                     |  223 +++-
>> configure.ac                                  |   39 +-
>> contrib/Makefile                              |    2 +-
>> contrib/pgcrypto/Makefile                     |    5 +
>> contrib/pgcrypto/nss.c                        |  773 +++++++++++
>> contrib/pgcrypto/openssl.c                    |    2 +-
>> contrib/pgcrypto/px.c                         |    1 +
>> contrib/pgcrypto/px.h                         |    1 +
>
> Personally I'd like to see this patch broken up a bit - it's quite
> large. Several of the changes could easily be committed separately, no?
Not sure how much of this makes sense committed separately (unless separately
means in quick succession), but it could certainly be broken up for the sake of
making review easier.  I will take a stab at that, but in a follow-up email as
I would like the split to be a version just doing the split and not also
introducing/fixing things.

>> if test "$with_openssl" = yes ; then
>> +  if test x"$with_nss" = x"yes" ; then
>> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>> +  fi
>
> Based on a quick look there's no similar error check for the msvc
> build. Should there be?

Thats a good question.  When embarking on this is seemed quite natural to me
that it should be, but now I'm not so sure.  Maybe there should be a
--with-openssl-preferred like how we handle readline/libedit or just allow
multiple and let the last one win?  Do you have any input on what would make
sense?

The only thing I think makes no sense is to allow multiple ones at the same
time given the current autoconf switches, even if it would just be to pick say
pg_strong_random from one and libpq TLS from another.

>> +if test "$with_nss" = yes ; then
>> +  if test x"$with_openssl" = x"yes" ; then
>> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>> +  fi
>
> Isn't this a repetition of the earlier check?

It is, and it we want to keep such a check it should be broken out into a
separate step performed before all library specific checks IMO.

>> +  CLEANLDFLAGS="$LDFLAGS"
>> +  # TODO: document this set of LDFLAGS
>> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
>
> Shouldn't this use nss-config or such?

Indeed it should, where available.  I've added rudimentary support for that
without a fallback as of now.

>> +if test "$with_nss" = yes ; then
>> +  AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file <ssl.h> is required for NSS])])
>> +  AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file <nss.h> is required for NSS])])
>> +fi
>
> Hm. For me, on debian, these headers are not directly in the default
> include search path, but would be as nss/ssl.h. I don't see you adding
> nss/ to CFLAGS anywhere? How does this work currently?

I had Stockholm-syndromed myself into passing --with-includes and hadn't really
realized. Sometimes the obvious is too obvious in a 4000+ LOC patch.

> I think it'd also be better if we could include these files as nss/ssl.h
> etc - ssl.h is a name way too likely to conflict imo.

I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
will still need the direct path to the headers (from autoconf) since nss.h
includes NSPR headers as #include <nspr.h> and so on.

>> +++ b/src/backend/libpq/be-secure-nss.c
>> @@ -0,0 +1,1158 @@
>> +/*
>> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef
>> + * our version to avoid compiler warnings on redefinition.
>> + */
>> +#define pg_BITS_PER_BYTE BITS_PER_BYTE
>> +#undef BITS_PER_BYTE
>
> Most compilers/preprocessors don't warn about redefinitions when they
> would result in the same value (IIRC we have some cases of redefinitions
> in tree even). Does nspr's differ?
GCC 8.3 in my Debian installation throws the below warning:

    In file included from /usr/include/nspr/prtypes.h:26,
                     from /usr/include/nspr/pratom.h:14,
                     from /usr/include/nspr/nspr.h:9,
                     from be-secure-nss.c:45:
    /usr/include/nspr/prcpucfg.h:1143: warning: "BITS_PER_BYTE" redefined
     #define BITS_PER_BYTE  PR_BITS_PER_BYTE

    In file included from ../../../src/include/c.h:55,
                     from ../../../src/include/postgres.h:46,
                     from be-secure-nss.c:16:
    ../../../src/include/pg_config_manual.h:115: note: this is the location of the previous definition
     #define BITS_PER_BYTE  8

PR_BITS_PER_BYTE is defined per platform in pr/include/md/_<platform>.cfg and
is as expected 8. I assume it's that indirection which cause the warning?

>> +/*
>> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
>> + * colliding definitions from ours, causing a much expected compiler error.
>> + * The definitions are however not actually used in NSPR at all, and are only
>> + * intended for what seems to be backwards compatibility for apps written
>> + * against old versions of NSPR.  The following comment is in the referenced
>> + * file, and was added in 1998:
>> + *
>> + * This section typedefs the old 'native' types to the new PR<type>s.
>> + * These definitions are scheduled to be eliminated at the earliest
>> + * possible time. The NSPR API is implemented and documented using
>> + * the new definitions.
>> + *
>> + * As there is no opt-out from pulling in these typedefs, we define the guard
>> + * for the file to exclude it. This is incredibly ugly, but seems to be about
>> + * the only way around it.
>> + */
>> +#define PROTYPES_H
>> +#include <nspr.h>
>> +#undef PROTYPES_H
>
> Yuck :(.
Thats not an understatement.  Taking another dive into the NSPR code I did
however find a proper way to deal with this.  Defining NO_NSPR_10_SUPPORT stops
NSPR from using the files in obsolete/. So fixed, yay!

>> +int
>> +be_tls_init(bool isServerStart)
>> +{
>> + SECStatus status;
>> + SSLVersionRange supported_sslver;
>> +
>> + /*
>> + * Set up the connection cache for multi-processing application behavior.
>
> Hm. Do we necessarily want that? Session resumption is not exactly
> unproblematic... Or does this do something else?
From my reading of the docs, and experience with the code, a server application
must set up a connection cache in order to accept connections.  Not entirely
sure, and the docs aren't terribly clear for non SSLv2/v3 environments (it
seems to only cache for SSLv2/3 and not TLSv+) but it seems like it may have
other uses internally.  I will hunt down some more information on the NSS
mailing list.

>> + * If we are in ServerStart then we initialize the cache. If the server is
>> + * already started, we inherit the cache such that it can be used for
>> + * connections. Calling SSL_ConfigMPServerSIDCache sets an environment
>> + * variable which contains enough information for the forked child to know
>> + * how to access it.  Passing NULL to SSL_InheritMPServerSIDCache will
>> + * make the forked child look it up by the default name SSL_INHERITANCE,
>> + * if env vars aren't inherited then the contents of the variable can be
>> + * passed instead.
>> + */
>
> Does this stuff work on windows
According to the documentation it does, and Andrew had this working on Windows
in an earlier version of the patch.  I need to get a proper Windows env for
testing/dev up and running as mine has bitrotted to nothingness.

> / EXEC_BACKEND?

That's a good point, maybe we need to do a SSL_ConfigServerSessionIDCache
rather than the MP version for EXEC_BACKEND? Not sure.

>> + * The below parameters are what the implicit initialization would've done
>> + * for us, and should work even for older versions where it might not be
>> + * done automatically. The last parameter, maxPTDs, is set to various
>> + * values in other codebases, but has been unused since NSPR 2.1 which was
>> + * released sometime in 1998.
>> + */
>> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ );
>
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_Init
> says that currently all parameters are ignored?
Right, my comment didn't reflect that they're all dead these days, only that
one of them has been unused since RUN DMC topped the charts with "It's like
that". Comment updated.

>> + /*
>> + * Import the already opened socket as we don't want to use NSPR functions
>> + * for opening the network socket due to how the PostgreSQL protocol works
>> + * with TLS connections. This function is not part of the NSPR public API,
>> + * see the comment at the top of the file for the rationale of still using
>> + * it.
>> + */
>> + pr_fd = PR_ImportTCPSocket(port->sock);
>> + if (!pr_fd)
>> + ereport(ERROR,
>> + (errmsg("unable to connect to socket")));
>
> I don't see the comment you're referring to?
It's referring to the comment discussing PR_ImportTCPSocket being a private API
call, yet still used by everyone (which is also discussed later in this review).

>> + /*
>> + * Most of the documentation available, and implementations of, NSS/NSPR
>> + * use the PR_NewTCPSocket() function here, which has the drawback that it
>> + * can only create IPv4 sockets. Instead use PR_OpenTCPSocket() which
>> + * copes with IPv6 as well.
>> + */
>> + model = PR_OpenTCPSocket(port->laddr.addr.ss_family);
>> + if (!model)
>> + ereport(ERROR,
>> + (errmsg("unable to open socket")));
>> +
>> + /*
>> + * Convert the NSPR socket to an SSL socket. Ensuring the success of this
>> + * operation is critical as NSS SSL_* functions may return SECSuccess on
>> + * the socket even though SSL hasn't been enabled, which introduce a risk
>> + * of silent downgrades.
>> + */
>> + model = SSL_ImportFD(NULL, model);
>> + if (!model)
>> + ereport(ERROR,
>> + (errmsg("unable to enable TLS on socket")));
>
> It's confusing that these functions do not actually reference the socket
> via some handle :(. What does opening a socket do here?
This specific call converts the socket from a plain NSPR socket to an SSL/TLS
capable socket which NSS will work with.  This is a required step for
"activating" NSS on the socket.

>> + /*
>> + * Configure the allowed cipher. If there are no user preferred suites,
>
> *ciphers?

Yes, fixed.

>> +
>> + port->pr_fd = SSL_ImportFD(model, pr_fd);
>> + if (!port->pr_fd)
>> + ereport(ERROR,
>> + (errmsg("unable to initialize")));
>> +
>> + PR_Close(model);
>
> A comment explaining why we first import a NULL into the model, and then
> release the model, and import the real fd would be good.
I've added a small comment to explain how the model is a configuration template
for the actual socket.  This part of NSS/NSPR is a bit overcomplicated for how
we have connections, it's more geared towards having many open sockets in the
same process.

>> +ssize_t
>> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
>> +{
>> + ssize_t n_read;
>> + PRErrorCode err;
>> +
>> + n_read = PR_Read(port->pr_fd, ptr, len);
>> +
>> + if (n_read < 0)
>> + {
>> + err = PR_GetError();
>
> Yay, more thread global state :(.
Sorry about that.

>> + /* XXX: This logic seems potentially bogus? */
>> + if (err == PR_WOULD_BLOCK_ERROR)
>> + *waitfor = WL_SOCKET_READABLE;
>> + else
>> + *waitfor = WL_SOCKET_WRITEABLE;
>
> Don't we need to handle failed connections somewhere here? secure_read()
> won't know about PR_GetError() etc? How would SSL errors be signalled
> upwards here?
>
> Also, as you XXX, it's not clear to me that your mapping would always
> result in waiting for the right event? A tls write could e.g. very well
> require receiving data etc?
Fixed, but there might be more to be done here.

>> + /*
>> + * At least one byte with password content was returned, and NSS requires
>> + * that we return it allocated in NSS controlled memory. If we fail to
>> + * allocate then abort without passing back NULL and bubble up the error
>> + * on the PG side.
>> + */
>> + password = (char *) PR_Malloc(len + 1);
>> + if (!password)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_OUT_OF_MEMORY),
>> + errmsg("out of memory")));
>>
>> + strlcpy(password, buf, sizeof(password));
>> + explicit_bzero(buf, sizeof(buf));
>> +
>
> In case of error you're not bzero'ing out the password!
Fixed.

> Separately, I wonder if we should introduce a function for throwing OOM
> errors - which then e.g. could print the memory context stats in those
> places too...

+1. I'd be happy to review such a patch.

>> +static SECStatus
>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
>> +{
>> + SECStatus status;
>> + Port   *port = (Port *) arg;
>> + CERTCertificate *cert;
>> + char   *peer_cn;
>> + int len;
>> +
>> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
>> + if (status == SECSuccess)
>> + {
>> + cert = SSL_PeerCertificate(port->pr_fd);
>> + len = strlen(cert->subjectName);
>> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
>> + if (strncmp(cert->subjectName, "CN=", 3) == 0)
>> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
>> + else
>> + strlcpy(peer_cn, cert->subjectName, len + 1);
>> + CERT_DestroyCertificate(cert);
>> +
>> + port->peer_cn = peer_cn;
>> + port->peer_cert_valid = true;
>
> Hm. We either should have something similar to
>
> /*
> * Reject embedded NULLs in certificate common name to prevent
> * attacks like CVE-2009-4034.
> */
> if (len != strlen(peer_cn))
> {
> ereport(COMMERROR,
> (errcode(ERRCODE_PROTOCOL_VIOLATION),
> errmsg("SSL certificate's common name contains embedded null")));
> pfree(peer_cn);
> return -1;
> }
> here, or a comment explaining why not.
We should, but it's proving rather difficult as there is no equivalent API call
to get the string as well as the expected length of it.

> Also, what's up with the CN= bit? Why is that needed here, but not for
> openssl?

OpenSSL returns only the value portion, whereas NSS returns key=value so we
need to skip over the key= part.

>> +static PRFileDesc *
>> +init_iolayer(Port *port, int loglevel)
>> +{
>> + const PRIOMethods *default_methods;
>> + PRFileDesc *layer;
>> +
>> + /*
>> + * Start by initializing our layer with all the default methods so that we
>> + * can selectively override the ones we want while still ensuring that we
>> + * have a complete layer specification.
>> + */
>> + default_methods = PR_GetDefaultIOMethods();
>> + memcpy(&pr_iomethods, default_methods, sizeof(PRIOMethods));
>> +
>> + pr_iomethods.recv = pg_ssl_read;
>> + pr_iomethods.send = pg_ssl_write;
>> +
>> + /*
>> + * Each IO layer must be identified by a unique name, where uniqueness is
>> + * per connection. Each connection in a postgres cluster can generate the
>> + * identity from the same string as they will create their IO layers on
>> + * different sockets. Only one layer per socket can have the same name.
>> + */
>> + pr_id = PR_GetUniqueIdentity("PostgreSQL");
>
> Seems like it might not be a bad idea to append Server or something?
Fixed.

>> + /*
>> + * Create the actual IO layer as a stub such that it can be pushed onto
>> + * the layer stack. The step via a stub is required as we define custom
>> + * callbacks.
>> + */
>> + layer = PR_CreateIOLayerStub(pr_id, &pr_iomethods);
>> + if (!layer)
>> + {
>> + ereport(loglevel,
>> + (errmsg("unable to create NSS I/O layer")));
>> + return NULL;
>> + }
>
> Why is this accepting a variable log level? The only caller passes ERROR?
Good catch, that's a leftover from a previous version which no longer makes
sense.  loglevel param removed.

>> +/*
>> + * pg_SSLerrmessage
>> + * Create and return a human readable error message given
>> + * the specified error code
>> + *
>> + * PR_ErrorToName only converts the enum identifier of the error to string,
>> + * but that can be quite useful for debugging (and in case PR_ErrorToString is
>> + * unable to render a message then we at least have something).
>> + */
>> +static char *
>> +pg_SSLerrmessage(PRErrorCode errcode)
>> +{
>> + char error[128];
>> + int ret;
>> +
>> + /* TODO: this should perhaps use a StringInfo instead.. */
>> + ret = pg_snprintf(error, sizeof(error), "%s (%s)",
>> +  PR_ErrorToString(errcode, PR_LANGUAGE_I_DEFAULT),
>> +  PR_ErrorToName(errcode));
>> + if (ret)
>> + return pstrdup(error);
>
>> + return pstrdup(_("unknown TLS error"));
>> +}
>
> Why not use psrintf() here?
Thats a good question to which I don't have a good answer.  Changed to doing
just that.

>> +++ b/src/include/common/pg_nss.h
>> @@ -0,0 +1,141 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * pg_nss.h
>> + *  Support for NSS as a TLS backend
>> + *
>> + * These definitions are used by both frontend and backend code.
>> + *
>> + * Copyright (c) 2020, PostgreSQL Global Development Group
>> + *
>> + * IDENTIFICATION
>> + *        src/include/common/pg_nss.h
>> + *
>> + *-------------------------------------------------------------------------
>> + */
>> +#ifndef PG_NSS_H
>> +#define PG_NSS_H
>> +
>> +#ifdef USE_NSS
>> +
>> +#include <sslproto.h>
>> +
>> +PRUint16 pg_find_cipher(char *name);
>> +
>> +typedef struct
>> +{
>> + const char *name;
>> + PRUint16 number;
>> +} NSSCiphers;
>> +
>> +#define INVALID_CIPHER 0xFFFF
>> +
>> +/*
>> + * This list is a partial copy of the ciphers in NSS files lib/ssl/sslproto.h
>> + * in order to provide a human readable version of the ciphers. It would be
>> + * nice to not have to have this, but NSS doesn't provide any API addressing
>> + * the ciphers by name. TODO: do we want more of the ciphers, or perhaps less?
>> + */
>> +static const NSSCiphers NSS_CipherList[] = {
>> +
>> + {"TLS_NULL_WITH_NULL_NULL", TLS_NULL_WITH_NULL_NULL},
>
> Hm. Is this whole business of defining array constants in a header just
> done to avoid having a .c file that needs to be compiled both in
> frontend and backend code?
That was the original motivation, but I guess I should just bit the bullet and
make it a .c compiled in both frontend and backend?

>> +/*
>> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
>> + * colliding definitions from ours, causing a much expected compiler error.
>> + * The definitions are however not actually used in NSPR at all, and are only
>> + * intended for what seems to be backwards compatibility for apps written
>> + * against old versions of NSPR.  The following comment is in the referenced
>> + * file, and was added in 1998:
>> + *
>> + * This section typedefs the old 'native' types to the new PR<type>s.
>> + * These definitions are scheduled to be eliminated at the earliest
>> + * possible time. The NSPR API is implemented and documented using
>> + * the new definitions.
>> + *
>> + * As there is no opt-out from pulling in these typedefs, we define the guard
>> + * for the file to exclude it. This is incredibly ugly, but seems to be about
>> + * the only way around it.
>> + */
>
> There's a lot of duplicated comments here. Could we move either of the
> files to reference the other for longer ones?
I took a stab at this in the attached version.  The code is perhaps over-
commented in parts but I tried to encode my understanding of NSS into the
comments where documentation is lacking, since I assume I'm not the only one
who is new to NSS.  There might be a need to pare back to keep it focused in
case this patch goes futher.

>> +/*
>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
>> + * only way to make NSS use an already set up POSIX file descriptor rather
>> + * than opening one itself. To quote the NSS documentation:
>> + *
>> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
>> + * implementation changes. In practice, this is unlikely to happen because
>> + * NSPR's implementation has been stable for years and because of NSPR's
>> + * strong commitment to backward compatibility."
>> + *
>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
>> + *
>> + * The function is declared in <private/pprio.h>, but as it is a header marked
>> + * private we declare it here rather than including it.
>> + */
>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);
>
> Ugh. This is really the way to do this? How do other applications deal
> with this problem?
They either #include <private/pprio.h> or they do it like this (or vendor NSPR
which makes calling private APIs less problematic).  It sure is ugly, but there
is no alternative to using this function.

>> +#if defined(WIN32)
>> +static const char *ca_trust_name = "nssckbi.dll";
>> +#elif defined(__darwin__)
>> +static const char *ca_trust_name = "libnssckbi.dylib";
>> +#else
>> +static const char *ca_trust_name = "libnssckbi.so";
>> +#endif
>
> There's really no pre-existing handling for this in nss???

NSS_Init does have more or less the above logic (see snippet below), but only
when there is a cert database defined.

    /*
     * The following code is an attempt to automagically find the external root
     * module.
     * Note: Keep the #if-defined chunks in order. HPUX must select before UNIX.
     */

    static const char *dllname =
    #if defined(XP_WIN32) || defined(XP_OS2)
        "nssckbi.dll";
    #elif defined(HPUX) && !defined(__ia64) /* HP-UX PA-RISC */
        "libnssckbi.sl";
    #elif defined(DARWIN)
        "libnssckbi.dylib";
    #elif defined(XP_UNIX) || defined(XP_BEOS)
        "libnssckbi.so";
    #else
    #error "Uh! Oh! I don't know about this platform."
    #endif

In the NSS_INIT_NOCERTDB case there is no such handling of the libname provided
by NSS so we need to do that ourselves.

>> + /*
>> + * The original design of NSS was for a single application to use a single
>> + * copy of it, initialized with NSS_Initialize() which isn't returning any
>> + * handle with which to refer to NSS. NSS initialization and shutdown are
>> + * global for the application, so a shutdown in another NSS enabled
>> + * library would cause NSS to be stopped for libpq as well.  The fix has
>> + * been to introduce NSS_InitContext which returns a context handle to
>> + * pass to NSS_ShutdownContext.  NSS_InitContext was introduced in NSS
>> + * 3.12, but the use of it is not very well documented.
>> + * https://bugzilla.redhat.com/show_bug.cgi?id=738456
>> + *
>> + * The InitParameters struct passed can be used to override internal
>> + * values in NSS, but the usage is not documented at all. When using
>> + * NSS_Init initializations, the values are instead set via PK11_Configure
>> + * calls so the PK11_Configure documentation can be used to glean some
>> + * details on these.
>> + *
>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Specs
>
>> +
>> + if (!nss_context)
>> + {
>> + char   *err = pg_SSLerrmessage(PR_GetError());
>> +
>> + printfPQExpBuffer(&conn->errorMessage,
>> +  libpq_gettext("unable to %s certificate database: %s"),
>> +  conn->cert_database ? "open" : "create",
>> +  err);
>> + free(err);
>> + return PGRES_POLLING_FAILED;
>> + }
>> +
>> + /*
>> + * Configure cipher policy.
>> + */
>> + status = NSS_SetDomesticPolicy();
>
> Why is "domestic" the right thing here?
Historically there are three cipher policies in NSS: Domestic, Export and
France.  These would enable a set of ciphers based on US export restrictions
(domest/export) or French import restrictions.  All ciphers would start
disabled and then the ciphers belonging to the chosen set would be enabled.
Long ago, that was however removed and they now all get enabled by calling
either of these three functions.  NSS_SetDomesticPolicy enables all implemented
ciphers, and the other calls just call NSS_SetDomesticPolicy, I guess that API
was kept for backwards compatibility.  The below bugzilla entry has a bit more
information on this:

   https://bugzilla.mozilla.org/show_bug.cgi?id=848384

That being said, the comment in the code did not reflect that, so I've reworded
it hoping it will be clearer now.

>> +
>> + PK11_SetPasswordFunc(PQssl_passwd_cb);
>
> Is it actually OK to do stuff like this when other users of NSS might be
> present? That's obviously more likely in the libpq case, compared to the
> backend case (where it's also possible, of course). What prevents us
> from overriding another user's callback?

The password callback pointer is stored in a static variable in NSS (in the
file lib/pk11wrap/pk11auth.c).

>> +ssize_t
>> +pgtls_read(PGconn *conn, void *ptr, size_t len)
>> +{
>> + PRInt32 nread;
>> + PRErrorCode status;
>> + int read_errno = 0;
>> +
>> + nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT);
>> +
>> + /*
>> + * PR_Recv blocks until there is data to read or the timeout expires. Zero
>> + * is returned for closed connections, while -1 indicates an error within
>> + * the ongoing connection.
>> + */
>> + if (nread == 0)
>> + {
>> + read_errno = ECONNRESET;
>> + return -1;
>> + }
>
> It's a bit confusing to talk about blocking when the socket presumably
> is in non-blocking mode, and you're also asking to never wait?
Fair enough, I can agree that the wording isn't spot on. The socket is
non-blocking while PR_Recv can block (which is what we ask it not to).  I've
reworded and moved the comment around to hopefully make it clearer.

>> + if (nread == -1)
>> + {
>> + status = PR_GetError();
>> +
>> + switch (status)
>> + {
>> + case PR_WOULD_BLOCK_ERROR:
>> + read_errno = EINTR;
>> + break;
>
> Uh, isn't this going to cause a busy-loop by the caller? EINTR isn't the
> same as EAGAIN/EWOULDBLOCK?
Right, that's clearly not right.

>> + case PR_IO_TIMEOUT_ERROR:
>> + break;
>
> What does this mean? We'll return with a 0 errno here, right? When is
> this case reachable?

It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
we don't do.  It mentioned somewhere that it had happened in no-wait calls due
to a bug, but I fail to find that reference now.  Either way, I've removed it
to fall into the default error handling which now sets errno correctly as that
was a paddle short here.

> E.g. the comment in fe-misc.c:
> /* pqsecure_read set the error message for us */
> for this case doesn't seem to be fulfilled by this.

Fixed, I hope.

>> +/*
>> + * Verify that the server certificate matches the hostname we connected to.
>> + *
>> + * The certificate's Common Name and Subject Alternative Names are considered.
>> + */
>> +int
>> +pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn,
>> + int *names_examined,
>> + char **first_name)
>> +{
>> + return 1;
>> +}
>
> Uh, huh? Certainly doesn't verify anything...
Doh, the verification was done as part of the cert validation callback and I
had missed moving it to the stub.  Fixed and also expanded to closer match how
it's done in the OpenSSL implementation.

>> +/* ------------------------------------------------------------ */
>> +/* PostgreSQL specific TLS support functions */
>> +/* ------------------------------------------------------------ */
>> +
>> +/*
>> + * TODO: this a 99% copy of the same function in the backend, make these share
>> + * a single implementation instead.
>> + */
>> +static char *
>> +pg_SSLerrmessage(PRErrorCode errcode)
>> +{
>> + const char *error;
>> +
>> + error = PR_ErrorToName(errcode);
>> + if (error)
>> + return strdup(error);
>> +
>> + return strdup("unknown TLS error");
>> +}
>
> Btw, why does this need to duplicate strings, instead of returning a
> const char*?
No, it doesn't, and no longer does.

The attached includes fixes for the above mentioned issues (and a few small
other ones I stumbled across), hopefully without introducing too many new.  As
mentioned, I'll perform the split into multiple patches in a separate version
which only performs a split to make it easier to diff the individual patchfile
versions.

cheers ./daniel


0001-Support-for-NSS-as-a-TLS-backend-v13.patch (282K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Heikki Linnakangas
On 27/10/2020 22:07, Daniel Gustafsson wrote:
> /*
>  * Track whether the NSS database has a password set or not. There is no API
>  * function for retrieving password status, so we simply flip this to true in
>  * case NSS invoked the password callback - as that will only happen in case
>  * there is a password. The reason for tracking this is that there are calls
>  * which require a password parameter, but doesn't use the callbacks provided,
>  * so we must call the callback on behalf of these.
>  */
> static bool has_password = false;

This is set in PQssl_passwd_cb function, but never reset. That seems
wrong. The NSS database used in one connection might have a password,
while another one might not. Or have I completely misunderstood this?

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Andres Freund
In reply to this post by Daniel Gustafsson
Hi,

On 2020-10-27 21:07:01 +0100, Daniel Gustafsson wrote:

> > On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote:
> >> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001
> >> From: Daniel Gustafsson <[hidden email]>
> >> Date: Thu, 8 Oct 2020 18:44:28 +0200
> >> Subject: [PATCH] Support for NSS as a TLS backend v12
> >> ---
> >> configure                                     |  223 +++-
> >> configure.ac                                  |   39 +-
> >> contrib/Makefile                              |    2 +-
> >> contrib/pgcrypto/Makefile                     |    5 +
> >> contrib/pgcrypto/nss.c                        |  773 +++++++++++
> >> contrib/pgcrypto/openssl.c                    |    2 +-
> >> contrib/pgcrypto/px.c                         |    1 +
> >> contrib/pgcrypto/px.h                         |    1 +
> >
> > Personally I'd like to see this patch broken up a bit - it's quite
> > large. Several of the changes could easily be committed separately, no?
>
> Not sure how much of this makes sense committed separately (unless separately
> means in quick succession), but it could certainly be broken up for the sake of
> making review easier.

Committing e.g. the pgcrypto pieces separately from the backend code
seems unproblematic. But yes, I would expect them to go in close to each
other. I'm mainly concerned with smaller review-able units.

Have you done testing to ensure that NSS PG cooperates correctly with
openssl PG? Is there a way we can make that easier to do? E.g. allowing
to build frontend with NSS and backend with openssl and vice versa?


> >> if test "$with_openssl" = yes ; then
> >> +  if test x"$with_nss" = x"yes" ; then
> >> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
> >> +  fi
> >
> > Based on a quick look there's no similar error check for the msvc
> > build. Should there be?
>
> Thats a good question.  When embarking on this is seemed quite natural to me
> that it should be, but now I'm not so sure.  Maybe there should be a
> --with-openssl-preferred like how we handle readline/libedit or just allow
> multiple and let the last one win?  Do you have any input on what would make
> sense?
>
> The only thing I think makes no sense is to allow multiple ones at the same
> time given the current autoconf switches, even if it would just be to pick say
> pg_strong_random from one and libpq TLS from another.

Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing
to check for errors.

Even better, of course, would be to allow switching of the SSL backend
based on config options (PGC_POSTMASTER GUC for backend, connection
string for frontend). Mainly because that would make testing of
interoperability so much easier.  Obviously still a few places like
pgcrypto, randomness, etc, where only a compile time decision seems to
make sense.


> >> +  CLEANLDFLAGS="$LDFLAGS"
> >> +  # TODO: document this set of LDFLAGS
> >> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
> >
> > Shouldn't this use nss-config or such?
>
> Indeed it should, where available.  I've added rudimentary support for that
> without a fallback as of now.

When would we need a fallback?


> > I think it'd also be better if we could include these files as nss/ssl.h
> > etc - ssl.h is a name way too likely to conflict imo.
>
> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
> will still need the direct path to the headers (from autoconf) since nss.h
> includes NSPR headers as #include <nspr.h> and so on.

Hm. Then it's probably not worth going there...


> >> +static SECStatus
> >> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
> >> +{
> >> + SECStatus status;
> >> + Port   *port = (Port *) arg;
> >> + CERTCertificate *cert;
> >> + char   *peer_cn;
> >> + int len;
> >> +
> >> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
> >> + if (status == SECSuccess)
> >> + {
> >> + cert = SSL_PeerCertificate(port->pr_fd);
> >> + len = strlen(cert->subjectName);
> >> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
> >> + if (strncmp(cert->subjectName, "CN=", 3) == 0)
> >> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
> >> + else
> >> + strlcpy(peer_cn, cert->subjectName, len + 1);
> >> + CERT_DestroyCertificate(cert);
> >> +
> >> + port->peer_cn = peer_cn;
> >> + port->peer_cert_valid = true;
> >
> > Hm. We either should have something similar to
> >
> > /*
> > * Reject embedded NULLs in certificate common name to prevent
> > * attacks like CVE-2009-4034.
> > */
> > if (len != strlen(peer_cn))
> > {
> > ereport(COMMERROR,
> > (errcode(ERRCODE_PROTOCOL_VIOLATION),
> > errmsg("SSL certificate's common name contains embedded null")));
> > pfree(peer_cn);
> > return -1;
> > }
> > here, or a comment explaining why not.
>
> We should, but it's proving rather difficult as there is no equivalent API call
> to get the string as well as the expected length of it.

Hm. Should at least have a test to ensure that's not a problem then. I
hope/assume NSS rejects this somewhere internally...


> > Also, what's up with the CN= bit? Why is that needed here, but not for
> > openssl?
>
> OpenSSL returns only the value portion, whereas NSS returns key=value so we
> need to skip over the key= part.

Why is it a conditional path though?




> >> +/*
> >> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
> >> + * only way to make NSS use an already set up POSIX file descriptor rather
> >> + * than opening one itself. To quote the NSS documentation:
> >> + *
> >> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
> >> + * implementation changes. In practice, this is unlikely to happen because
> >> + * NSPR's implementation has been stable for years and because of NSPR's
> >> + * strong commitment to backward compatibility."
> >> + *
> >> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
> >> + *
> >> + * The function is declared in <private/pprio.h>, but as it is a header marked
> >> + * private we declare it here rather than including it.
> >> + */
> >> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);
> >
> > Ugh. This is really the way to do this? How do other applications deal
> > with this problem?
>
> They either #include <private/pprio.h> or they do it like this (or vendor NSPR
> which makes calling private APIs less problematic).  It sure is ugly, but there
> is no alternative to using this function.

Hm - in debian unstable's NSS this function appears to be in nss/ssl.h,
not pprio.h:

/*
** Imports fd into SSL, returning a new socket.  Copies SSL configuration
** from model.
*/
SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd);

and ssl.h starts with:
/*
 * This file contains prototypes for the public SSL functions.


> >> +
> >> + PK11_SetPasswordFunc(PQssl_passwd_cb);
> >
> > Is it actually OK to do stuff like this when other users of NSS might be
> > present? That's obviously more likely in the libpq case, compared to the
> > backend case (where it's also possible, of course). What prevents us
> > from overriding another user's callback?
>
> The password callback pointer is stored in a static variable in NSS (in the
> file lib/pk11wrap/pk11auth.c).

But, uh, how is that not a problem? What happens if a backend imports
libpq? What if plpython imports curl which then also uses nss?


> + /*
> + * Finally we must configure the socket for being a server by setting the
> + * certificate and key.
> + */
> + status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa);
> + if (status != SECSuccess)
> + ereport(ERROR,
> + (errmsg("unable to configure secure server: %s",
> + pg_SSLerrmessage(PR_GetError()))));
> + status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0);
> + if (status != SECSuccess)
> + ereport(ERROR,
> + (errmsg("unable to configure server for TLS server connections: %s",
> + pg_SSLerrmessage(PR_GetError()))));

Why do both of these need to get called? The NSS docs say:

/*
** Deprecated variant of SSL_ConfigServerCert.
**
...
SSL_IMPORT SECStatus SSL_ConfigSecureServer(
    PRFileDesc *fd, CERTCertificate *cert,
    SECKEYPrivateKey *key, SSLKEAType kea);


Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
>>> Personally I'd like to see this patch broken up a bit - it's quite
>>> large. Several of the changes could easily be committed separately, no?
>>
>> Not sure how much of this makes sense committed separately (unless separately
>> means in quick succession), but it could certainly be broken up for the sake of
>> making review easier.
>
> Committing e.g. the pgcrypto pieces separately from the backend code
> seems unproblematic. But yes, I would expect them to go in close to each
> other. I'm mainly concerned with smaller review-able units.

Attached is a v14 where the logical units are separated into individual
commits.  I hope this split makes it easier to read.

The 0006 commit were things not really related to NSS at all that can be
submitted to -hackers independently of this work, but they're still there since
this version wasn't supposed to change anything.

Most of the changes to sslinfo in 0005 are really only needed in case OpenSSL
isn't the only TLS library, but I would argue that they should be considered
regardless.  There we are still accessing the ->ssl member directly and passing
it to OpenSSL rather than using the be_tls_* API that we have.  I can extract
that portion as a separate patch submission unless there are objections.

cheers ./daniel


v14-0001-NSS-Frontend-Backend-and-build-infra.patch (139K) Download Attachment
v14-0002-NSS-Testharness-updates.patch (63K) Download Attachment
v14-0003-NSS-pg_strong_random-support.patch (7K) Download Attachment
v14-0004-NSS-Documentation.patch (21K) Download Attachment
v14-0005-NSS-contrib-modules.patch (50K) Download Attachment
v14-0006-NSS-to-be-submitted-separately.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
In reply to this post by Andres Freund
> On 28 Oct 2020, at 07:39, Andres Freund <[hidden email]> wrote:

> Have you done testing to ensure that NSS PG cooperates correctly with
> openssl PG? Is there a way we can make that easier to do? E.g. allowing
> to build frontend with NSS and backend with openssl and vice versa?

When I wrote the Secure Transport patch I had a patch against PostgresNode
which allowed for overriding the server binaries like so:

   SSLTEST_SERVER_BIN=/path/bin/ make -C src/test/ssl/ check

I've used that coupled with manual testing so far to make sure that an openssl
client can talk to an NSS backend and so on.  Before any other backend is added
we clearly need *a* way of doing this, one which no doubt will need to be
improved upon to suit more workflows.

This is sort of the same situation as pg_upgrade, where two trees is needed to
really test it.

I can clean that patch up and post as a starting point for discussions.

>>>> if test "$with_openssl" = yes ; then
>>>> +  if test x"$with_nss" = x"yes" ; then
>>>> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>>>> +  fi
>>>
>>> Based on a quick look there's no similar error check for the msvc
>>> build. Should there be?
>>
>> Thats a good question.  When embarking on this is seemed quite natural to me
>> that it should be, but now I'm not so sure.  Maybe there should be a
>> --with-openssl-preferred like how we handle readline/libedit or just allow
>> multiple and let the last one win?  Do you have any input on what would make
>> sense?
>>
>> The only thing I think makes no sense is to allow multiple ones at the same
>> time given the current autoconf switches, even if it would just be to pick say
>> pg_strong_random from one and libpq TLS from another.
>
> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing
> to check for errors.
Thats another option, with --with-openssl being an alias for --with-ssl=openssl.

After another round of thinking I like this even better as it makes the build
infra cleaner, so the attached patch has this implemented.

> Even better, of course, would be to allow switching of the SSL backend
> based on config options (PGC_POSTMASTER GUC for backend, connection
> string for frontend). Mainly because that would make testing of
> interoperability so much easier.  Obviously still a few places like
> pgcrypto, randomness, etc, where only a compile time decision seems to
> make sense.

It would make testing easier, but the expense seems potentially rather high.
How would a GUC switch be allowed to operate, would we have mixed backends or
would be require all openssl connectins to be dropped before serving nss ones?

>>>> +  CLEANLDFLAGS="$LDFLAGS"
>>>> +  # TODO: document this set of LDFLAGS
>>>> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
>>>
>>> Shouldn't this use nss-config or such?
>>
>> Indeed it should, where available.  I've added rudimentary support for that
>> without a fallback as of now.
>
> When would we need a fallback?
One one of my boxes I have NSS/NSPR installed via homebrew and they don't ship
an nss-config AFAICT. I wouldn't be surprised if there are other cases.

>>> I think it'd also be better if we could include these files as nss/ssl.h
>>> etc - ssl.h is a name way too likely to conflict imo.
>>
>> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
>> will still need the direct path to the headers (from autoconf) since nss.h
>> includes NSPR headers as #include <nspr.h> and so on.
>
> Hm. Then it's probably not worth going there...

It does however make visual parsing of the source files easer since it's clear
which ssl.h is being referred to.  I'm in favor of keeping it.

>>>> +static SECStatus
>>>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
>>>> +{
>>>> + SECStatus status;
>>>> + Port   *port = (Port *) arg;
>>>> + CERTCertificate *cert;
>>>> + char   *peer_cn;
>>>> + int len;
>>>> +
>>>> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
>>>> + if (status == SECSuccess)
>>>> + {
>>>> + cert = SSL_PeerCertificate(port->pr_fd);
>>>> + len = strlen(cert->subjectName);
>>>> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
>>>> + if (strncmp(cert->subjectName, "CN=", 3) == 0)
>>>> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
>>>> + else
>>>> + strlcpy(peer_cn, cert->subjectName, len + 1);
>>>> + CERT_DestroyCertificate(cert);
>>>> +
>>>> + port->peer_cn = peer_cn;
>>>> + port->peer_cert_valid = true;
>>>
>>> Hm. We either should have something similar to
>>>
>>> /*
>>> * Reject embedded NULLs in certificate common name to prevent
>>> * attacks like CVE-2009-4034.
>>> */
>>> if (len != strlen(peer_cn))
>>> {
>>> ereport(COMMERROR,
>>> (errcode(ERRCODE_PROTOCOL_VIOLATION),
>>> errmsg("SSL certificate's common name contains embedded null")));
>>> pfree(peer_cn);
>>> return -1;
>>> }
>>> here, or a comment explaining why not.
>>
>> We should, but it's proving rather difficult as there is no equivalent API call
>> to get the string as well as the expected length of it.
>
> Hm. Should at least have a test to ensure that's not a problem then. I
> hope/assume NSS rejects this somewhere internally...
Agreed, I'll try to hack up a testcase.

>>> Also, what's up with the CN= bit? Why is that needed here, but not for
>>> openssl?
>>
>> OpenSSL returns only the value portion, whereas NSS returns key=value so we
>> need to skip over the key= part.
>
> Why is it a conditional path though?

It was mostly just a belts-and-suspenders thing, I don't have any hard evidence
that it's been a thing in any modern NSS version so it can be removed.

>>>> +/*
>>>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
>>>> + * only way to make NSS use an already set up POSIX file descriptor rather
>>>> + * than opening one itself. To quote the NSS documentation:
>>>> + *
>>>> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
>>>> + * implementation changes. In practice, this is unlikely to happen because
>>>> + * NSPR's implementation has been stable for years and because of NSPR's
>>>> + * strong commitment to backward compatibility."
>>>> + *
>>>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
>>>> + *
>>>> + * The function is declared in <private/pprio.h>, but as it is a header marked
>>>> + * private we declare it here rather than including it.
>>>> + */
>>>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);
>>>
>>> Ugh. This is really the way to do this? How do other applications deal
>>> with this problem?
>>
>> They either #include <private/pprio.h> or they do it like this (or vendor NSPR
>> which makes calling private APIs less problematic).  It sure is ugly, but there
>> is no alternative to using this function.
>
> Hm - in debian unstable's NSS this function appears to be in nss/ssl.h,
> not pprio.h:
>
> /*
> ** Imports fd into SSL, returning a new socket.  Copies SSL configuration
> ** from model.
> */
> SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd);
>
> and ssl.h starts with:
> /*
> * This file contains prototypes for the public SSL functions.
Right, but that's Import*FD*, not Import*TCPSocket*.  We use ImportFD as well
since it's the API for importing an NSPR socket into NSS and enabling SSL/TLS
on it.  Thats been a public API for a long time.  ImportTCPSocket is used to
import an already opened socket into NSPR, else NSPR must open the socket
itself.  That part has been kept private for reasons unknown, as it's
incredibly useful.

>>>> + PK11_SetPasswordFunc(PQssl_passwd_cb);
>>>
>>> Is it actually OK to do stuff like this when other users of NSS might be
>>> present? That's obviously more likely in the libpq case, compared to the
>>> backend case (where it's also possible, of course). What prevents us
>>> from overriding another user's callback?
>>
>> The password callback pointer is stored in a static variable in NSS (in the
>> file lib/pk11wrap/pk11auth.c).
>
> But, uh, how is that not a problem? What happens if a backend imports
> libpq? What if plpython imports curl which then also uses nss?
Sorry, that sentence wasn't really finished.  What I meant to write was that I
don't really have good answers here.  The available implementation is via the
static var, and there are no alternative APIs.  I've tried googling for
insights but haven't come across any.

The only datapoint I have is that I can't recall there ever being a complaint
against libcurl doing this exact thing.  That of course doesn't mean it cannot
happen or cause problems.

>> + /*
>> + * Finally we must configure the socket for being a server by setting the
>> + * certificate and key.
>> + */
>> + status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa);
>> + if (status != SECSuccess)
>> + ereport(ERROR,
>> + (errmsg("unable to configure secure server: %s",
>> + pg_SSLerrmessage(PR_GetError()))));
>> + status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0);
>> + if (status != SECSuccess)
>> + ereport(ERROR,
>> + (errmsg("unable to configure server for TLS server connections: %s",
>> + pg_SSLerrmessage(PR_GetError()))));
>
> Why do both of these need to get called? The NSS docs say:
>
> /*
> ** Deprecated variant of SSL_ConfigServerCert.
> **
> ...
> SSL_IMPORT SECStatus SSL_ConfigSecureServer(
>    PRFileDesc *fd, CERTCertificate *ce rt,
>    SECKEYPrivateKey *key, SSLKEAType kea);
They don't, I had missed the deprecation warning as it's not mentioned at all
in the online documentation:

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/SSL_functions/sslfnc.html

(SSL_ConfigServerCert isn't at all mentioned there which dates it to before
this went it obsoleting SSL_ConfigSecureServer.)

Fixed by removing the superfluous call.

Thanks again for reviewing!

cheers ./daniel


v15-0001-NSS-Frontend-Backend-and-build-infra.patch (145K) Download Attachment
v15-0002-NSS-Testharness-updates.patch (65K) Download Attachment
v15-0003-NSS-pg_strong_random-support.patch (5K) Download Attachment
v15-0004-NSS-Documentation.patch (21K) Download Attachment
v15-0005-NSS-contrib-modules.patch (50K) Download Attachment
v15-0006-NSS-to-be-submitted-separately.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Andrew Dunstan

On 10/29/20 11:20 AM, Daniel Gustafsson wrote:

>> On 28 Oct 2020, at 07:39, Andres Freund <[hidden email]> wrote:
>> Have you done testing to ensure that NSS PG cooperates correctly with
>> openssl PG? Is there a way we can make that easier to do? E.g. allowing
>> to build frontend with NSS and backend with openssl and vice versa?
> When I wrote the Secure Transport patch I had a patch against PostgresNode
> which allowed for overriding the server binaries like so:
>
>    SSLTEST_SERVER_BIN=/path/bin/ make -C src/test/ssl/ check
>
> I've used that coupled with manual testing so far to make sure that an openssl
> client can talk to an NSS backend and so on.  Before any other backend is added
> we clearly need *a* way of doing this, one which no doubt will need to be
> improved upon to suit more workflows.
>
> This is sort of the same situation as pg_upgrade, where two trees is needed to
> really test it.
>
> I can clean that patch up and post as a starting point for discussions.
>
>>>>> if test "$with_openssl" = yes ; then
>>>>> +  if test x"$with_nss" = x"yes" ; then
>>>>> +    AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>>>>> +  fi
>>>> Based on a quick look there's no similar error check for the msvc
>>>> build. Should there be?
>>> Thats a good question.  When embarking on this is seemed quite natural to me
>>> that it should be, but now I'm not so sure.  Maybe there should be a
>>> --with-openssl-preferred like how we handle readline/libedit or just allow
>>> multiple and let the last one win?  Do you have any input on what would make
>>> sense?
>>>
>>> The only thing I think makes no sense is to allow multiple ones at the same
>>> time given the current autoconf switches, even if it would just be to pick say
>>> pg_strong_random from one and libpq TLS from another.
>> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing
>> to check for errors.
> Thats another option, with --with-openssl being an alias for --with-ssl=openssl.
>
> After another round of thinking I like this even better as it makes the build
> infra cleaner, so the attached patch has this implemented.
>
>> Even better, of course, would be to allow switching of the SSL backend
>> based on config options (PGC_POSTMASTER GUC for backend, connection
>> string for frontend). Mainly because that would make testing of
>> interoperability so much easier.  Obviously still a few places like
>> pgcrypto, randomness, etc, where only a compile time decision seems to
>> make sense.
> It would make testing easier, but the expense seems potentially rather high.
> How would a GUC switch be allowed to operate, would we have mixed backends or
> would be require all openssl connectins to be dropped before serving nss ones?
>
>>>>> +  CLEANLDFLAGS="$LDFLAGS"
>>>>> +  # TODO: document this set of LDFLAGS
>>>>> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
>>>> Shouldn't this use nss-config or such?
>>> Indeed it should, where available.  I've added rudimentary support for that
>>> without a fallback as of now.
>> When would we need a fallback?
> One one of my boxes I have NSS/NSPR installed via homebrew and they don't ship
> an nss-config AFAICT. I wouldn't be surprised if there are other cases.
>
>>>> I think it'd also be better if we could include these files as nss/ssl.h
>>>> etc - ssl.h is a name way too likely to conflict imo.
>>> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
>>> will still need the direct path to the headers (from autoconf) since nss.h
>>> includes NSPR headers as #include <nspr.h> and so on.
>> Hm. Then it's probably not worth going there...
> It does however make visual parsing of the source files easer since it's clear
> which ssl.h is being referred to.  I'm in favor of keeping it.
>
>>>>> +static SECStatus
>>>>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
>>>>> +{
>>>>> + SECStatus status;
>>>>> + Port   *port = (Port *) arg;
>>>>> + CERTCertificate *cert;
>>>>> + char   *peer_cn;
>>>>> + int len;
>>>>> +
>>>>> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
>>>>> + if (status == SECSuccess)
>>>>> + {
>>>>> + cert = SSL_PeerCertificate(port->pr_fd);
>>>>> + len = strlen(cert->subjectName);
>>>>> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
>>>>> + if (strncmp(cert->subjectName, "CN=", 3) == 0)
>>>>> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
>>>>> + else
>>>>> + strlcpy(peer_cn, cert->subjectName, len + 1);
>>>>> + CERT_DestroyCertificate(cert);
>>>>> +
>>>>> + port->peer_cn = peer_cn;
>>>>> + port->peer_cert_valid = true;
>>>> Hm. We either should have something similar to
>>>>
>>>> /*
>>>> * Reject embedded NULLs in certificate common name to prevent
>>>> * attacks like CVE-2009-4034.
>>>> */
>>>> if (len != strlen(peer_cn))
>>>> {
>>>> ereport(COMMERROR,
>>>> (errcode(ERRCODE_PROTOCOL_VIOLATION),
>>>> errmsg("SSL certificate's common name contains embedded null")));
>>>> pfree(peer_cn);
>>>> return -1;
>>>> }
>>>> here, or a comment explaining why not.
>>> We should, but it's proving rather difficult as there is no equivalent API call
>>> to get the string as well as the expected length of it.
>> Hm. Should at least have a test to ensure that's not a problem then. I
>> hope/assume NSS rejects this somewhere internally...
> Agreed, I'll try to hack up a testcase.
>
>>>> Also, what's up with the CN= bit? Why is that needed here, but not for
>>>> openssl?
>>> OpenSSL returns only the value portion, whereas NSS returns key=value so we
>>> need to skip over the key= part.
>> Why is it a conditional path though?
> It was mostly just a belts-and-suspenders thing, I don't have any hard evidence
> that it's been a thing in any modern NSS version so it can be removed.
>
>>>>> +/*
>>>>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
>>>>> + * only way to make NSS use an already set up POSIX file descriptor rather
>>>>> + * than opening one itself. To quote the NSS documentation:
>>>>> + *
>>>>> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
>>>>> + * implementation changes. In practice, this is unlikely to happen because
>>>>> + * NSPR's implementation has been stable for years and because of NSPR's
>>>>> + * strong commitment to backward compatibility."
>>>>> + *
>>>>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
>>>>> + *
>>>>> + * The function is declared in <private/pprio.h>, but as it is a header marked
>>>>> + * private we declare it here rather than including it.
>>>>> + */
>>>>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);
>>>> Ugh. This is really the way to do this? How do other applications deal
>>>> with this problem?
>>> They either #include <private/pprio.h> or they do it like this (or vendor NSPR
>>> which makes calling private APIs less problematic).  It sure is ugly, but there
>>> is no alternative to using this function.
>> Hm - in debian unstable's NSS this function appears to be in nss/ssl.h,
>> not pprio.h:
>>
>> /*
>> ** Imports fd into SSL, returning a new socket.  Copies SSL configuration
>> ** from model.
>> */
>> SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd);
>>
>> and ssl.h starts with:
>> /*
>> * This file contains prototypes for the public SSL functions.
> Right, but that's Import*FD*, not Import*TCPSocket*.  We use ImportFD as well
> since it's the API for importing an NSPR socket into NSS and enabling SSL/TLS
> on it.  Thats been a public API for a long time.  ImportTCPSocket is used to
> import an already opened socket into NSPR, else NSPR must open the socket
> itself.  That part has been kept private for reasons unknown, as it's
> incredibly useful.
>
>>>>> + PK11_SetPasswordFunc(PQssl_passwd_cb);
>>>> Is it actually OK to do stuff like this when other users of NSS might be
>>>> present? That's obviously more likely in the libpq case, compared to the
>>>> backend case (where it's also possible, of course). What prevents us
>>>> from overriding another user's callback?
>>> The password callback pointer is stored in a static variable in NSS (in the
>>> file lib/pk11wrap/pk11auth.c).
>> But, uh, how is that not a problem? What happens if a backend imports
>> libpq? What if plpython imports curl which then also uses nss?
> Sorry, that sentence wasn't really finished.  What I meant to write was that I
> don't really have good answers here.  The available implementation is via the
> static var, and there are no alternative APIs.  I've tried googling for
> insights but haven't come across any.
>
> The only datapoint I have is that I can't recall there ever being a complaint
> against libcurl doing this exact thing.  That of course doesn't mean it cannot
> happen or cause problems.
>
>>> + /*
>>> + * Finally we must configure the socket for being a server by setting the
>>> + * certificate and key.
>>> + */
>>> + status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa);
>>> + if (status != SECSuccess)
>>> + ereport(ERROR,
>>> + (errmsg("unable to configure secure server: %s",
>>> + pg_SSLerrmessage(PR_GetError()))));
>>> + status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0);
>>> + if (status != SECSuccess)
>>> + ereport(ERROR,
>>> + (errmsg("unable to configure server for TLS server connections: %s",
>>> + pg_SSLerrmessage(PR_GetError()))));
>> Why do both of these need to get called? The NSS docs say:
>>
>> /*
>> ** Deprecated variant of SSL_ConfigServerCert.
>> **
>> ...
>> SSL_IMPORT SECStatus SSL_ConfigSecureServer(
>>    PRFileDesc *fd, CERTCertificate *ce rt,
>>    SECKEYPrivateKey *key, SSLKEAType kea);
> They don't, I had missed the deprecation warning as it's not mentioned at all
> in the online documentation:
>
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/SSL_functions/sslfnc.html
>
> (SSL_ConfigServerCert isn't at all mentioned there which dates it to before
> this went it obsoleting SSL_ConfigSecureServer.)
>
> Fixed by removing the superfluous call.
>



I've been looking through the new patch set, in particular the testing
setup.

The way it seems to proceed is to use the existing openssl generated
certificates and imports them into NSS certificate databases. That seems
fine to bootstrap testing, but it seems to me it would be more sound not
to rely on openssl at all. I'd rather see the Makefile containing
commands to create these from scratch, which mirror the openssl
variants. IOW you should be able to build and test this from scratch,
including certificate generation, without having openssl installed at all.

I also notice that the invocations to pk12util don't contain the "sql:"
prefix to the -d option, even though the database was created with that
prefix a few lines above. That seems like a mistake from my reading of
the pk12util man page.


cheers


andrew






Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 1 Nov 2020, at 14:13, Andrew Dunstan <[hidden email]> wrote:

> I've been looking through the new patch set, in particular the testing
> setup.

Thanks!

> The way it seems to proceed is to use the existing openssl generated
> certificates and imports them into NSS certificate databases. That seems
> fine to bootstrap testing,

That's pretty much why I opted for using the existing certs: to bootstrap the
patch and ensure OpenSSL-backend compatibility.

> but it seems to me it would be more sound not
> to rely on openssl at all. I'd rather see the Makefile containing
> commands to create these from scratch, which mirror the openssl
> variants. IOW you should be able to build and test this from scratch,
> including certificate generation, without having openssl installed at all.

I don't disagree with this, but I do also believe there is value in testing all
TLS backends with exactly the same certificates to act as a baseline.  The
nssfiles target should definitely be able to generate from scratch, but maybe a
combination is the best option?

Being well versed in the buildfarm code, do you have an off-the-cuff idea on
how to do cross library testing such that OpenSSL/NSS compatibility can be
ensured?  Andres was floating the idea of making a single sourcetree be able to
have both for testing but more discussion is needed to settle on a way forward.

> I also notice that the invocations to pk12util don't contain the "sql:"
> prefix to the -d option, even though the database was created with that
> prefix a few lines above. That seems like a mistake from my reading of
> the pk12util man page.

Fixed in the attached v16, which also drops the parts of the patchset which
have been submitted separately to -hackers (the sslinfo patch hunks are still
there are they are required).

cheers ./daniel


v16-0001-NSS-Frontend-Backend-and-build-infra.patch (145K) Download Attachment
v16-0002-NSS-Testharness-updates.patch (65K) Download Attachment
v16-0003-NSS-pg_strong_random-support.patch (5K) Download Attachment
v16-0004-NSS-Documentation.patch (21K) Download Attachment
v16-0005-NSS-contrib-modules.patch (49K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Andrew Dunstan

On 11/1/20 5:04 PM, Daniel Gustafsson wrote:

>> On 1 Nov 2020, at 14:13, Andrew Dunstan <[hidden email]> wrote:
>> I've been looking through the new patch set, in particular the testing
>> setup.
> Thanks!
>
>> The way it seems to proceed is to use the existing openssl generated
>> certificates and imports them into NSS certificate databases. That seems
>> fine to bootstrap testing,
> That's pretty much why I opted for using the existing certs: to bootstrap the
> patch and ensure OpenSSL-backend compatibility.
>
>> but it seems to me it would be more sound not
>> to rely on openssl at all. I'd rather see the Makefile containing
>> commands to create these from scratch, which mirror the openssl
>> variants. IOW you should be able to build and test this from scratch,
>> including certificate generation, without having openssl installed at all.
> I don't disagree with this, but I do also believe there is value in testing all
> TLS backends with exactly the same certificates to act as a baseline.  The
> nssfiles target should definitely be able to generate from scratch, but maybe a
> combination is the best option?


Yeah. I certainly think we need something that should how we would
generate them from scratch using nss. That said, the importation code is
also useful.



>
> Being well versed in the buildfarm code, do you have an off-the-cuff idea onIU
> how to do cross library testing such that OpenSSL/NSS compatibility can be
> ensured?  Andres was floating the idea of making a single sourcetree be able to
> have both for testing but more discussion is needed to settle on a way forward.


Well, I'd probably try to leverage the knowledge we have in doing
cross-version upgrade testing. It works like this: After the
install-check-C stage each branch saves its binaries and data files in a
special location, adjusting things like library locations to match. then
to test that version it uses that against all the older versions
similarly saved.


We could generalize that saving mechanism and do it if any module
required it. But instead of testing against a different branch, we'd
test against a different animal. So we'd have two animals, one building
with openssl and one with nss, and they would test against each other
(i.e. one as the client and one as the sever, and vice versa).


This would involve a deal of work on my part, but it's very doable, I
believe.


We'd need a way to run tests where we could specify the client and
server binary locations.


Anyway, those are my thoughts. Comments welcome.



cheers


andrew


--
Andrew Dunstan
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
In reply to this post by Heikki Linnakangas
> On 27 Oct 2020, at 21:18, Heikki Linnakangas <[hidden email]> wrote:

>
> On 27/10/2020 22:07, Daniel Gustafsson wrote:
>> /*
>> * Track whether the NSS database has a password set or not. There is no API
>> * function for retrieving password status, so we simply flip this to true in
>> * case NSS invoked the password callback - as that will only happen in case
>> * there is a password. The reason for tracking this is that there are calls
>> * which require a password parameter, but doesn't use the callbacks provided,
>> * so we must call the callback on behalf of these.
>> */
>> static bool has_password = false;
>
> This is set in PQssl_passwd_cb function, but never reset. That seems wrong. The NSS database used in one connection might have a password, while another one might not. Or have I completely misunderstood this?
(sorry for slow response).  You are absolutely right, the has_password flag
must be tracked per connection in PGconn.  The attached v17 implements this as
well a frontend bugfix which caused dropped connections and some smaller fixups
to make strings more translateable.

I've also included a WIP version of SCRAM channel binding in the attached
patch, it's currently failing to connect but someone here might spot the bug
before I do so I figured it's better to include it.

The 0005 patch is now, thanks to the sslinfo patch going in on master, only
containing NSS specific code.

cheers ./daniel


v17-0001-NSS-Frontend-Backend-and-build-infrastructure.patch (153K) Download Attachment
v17-0002-NSS-Testharness-updates.patch (65K) Download Attachment
v17-0003-NSS-pg_strong_random-support.patch (5K) Download Attachment
v17-0004-NSS-Documentation.patch (21K) Download Attachment
v17-0005-NSS-contrib-modules.patch (42K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
In reply to this post by Andrew Dunstan
> On 2 Nov 2020, at 15:17, Andrew Dunstan <[hidden email]> wrote:

> We could generalize that saving mechanism and do it if any module
> required it. But instead of testing against a different branch, we'd
> test against a different animal. So we'd have two animals, one building
> with openssl and one with nss, and they would test against each other
> (i.e. one as the client and one as the sever, and vice versa).

That seems like a very good plan.  It would also allow us to test a backend
compiled with OpenSSL 1.0.2 against a frontend with OpenSSL 1.1.1 which might
come in handy when OpenSSL 3.0.0 lands.

> This would involve a deal of work on my part, but it's very doable, I
> believe.

I have no experience with the buildfarm code, but I'm happy to help if theres
anything I can do.

cheers ./daniel


Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
In reply to this post by Daniel Gustafsson
On Nov 4, 2020, at 5:09 AM, Daniel Gustafsson <[hidden email]> wrote:

> (sorry for slow response).  You are absolutely right, the has_password flag
> must be tracked per connection in PGconn.  The attached v17 implements this as
> well a frontend bugfix which caused dropped connections and some smaller fixups
> to make strings more translateable.

Some initial notes from building and testing on macOS Mojave. I'm working with
both a brew-packaged NSS/NSPR (which includes basic nss-/nspr-config) and a
hand-built NSS/NSPR (which does not).

1. In configure.ac:

> +  LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS"
> +  CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS"
> +
> +  AC_CHECK_LIB(nss3, SSL_VersionRangeSet, [], [AC_MSG_ERROR([library 'nss3' is required for NSS])])

Looks like SSL_VersionRangeSet is part of libssl3, not libnss3. So this fails
with the hand-built stack, where there is no nss-config to populate LDFLAGS. I
changed the function to NSS_InitContext and that seems to work nicely.

2. Among the things to eventually think about when it comes to configuring, it
looks like some platforms [1] install the headers under <nspr4/...> and
<nss3/...> instead of <nspr/...> and <nss/...>. It's unfortunate that the NSS
maintainers never chose an official installation layout.

3. I need two more `#define NO_NSPR_10_SUPPORT` guards added in both

  src/include/common/pg_nss.h
  src/port/pg_strong_random.c

before the tree will compile for me. Both of those files include NSS headers.

4. be_tls_init() refuses to run correctly for me; I end up getting an NSPR
assertion that looks like

  sslMutex_Init not implemented for multi-process applications !

With assertions disabled, this ends up showing a somewhat unhelpful

  FATAL:  unable to set up TLS connection cache: security library failure. (SEC_ERROR_LIBRARY_FAILURE)

It looks like cross-process locking isn't actually enabled on macOS, which is a
long-standing bug in NSPR [2, 3]. So calls to SSL_ConfigMPServerSIDCache()
error out.

--Jacob

[1] https://github.com/erthink/ReOpenLDAP/issues/112
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=538680
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1192500



Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 6 Nov 2020, at 21:37, Jacob Champion <[hidden email]> wrote:

> Some initial notes from building and testing on macOS Mojave. I'm working with
> both a brew-packaged NSS/NSPR (which includes basic nss-/nspr-config) and a
> hand-built NSS/NSPR (which does not).

Thanks for looking!

> 1. In configure.ac:
>
>> +  LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS"
>> +  CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS"
>> +
>> +  AC_CHECK_LIB(nss3, SSL_VersionRangeSet, [], [AC_MSG_ERROR([library 'nss3' is required for NSS])])
>
> Looks like SSL_VersionRangeSet is part of libssl3, not libnss3. So this fails
> with the hand-built stack, where there is no nss-config to populate LDFLAGS. I
> changed the function to NSS_InitContext and that seems to work nicely.
Ah yes, fixed.

> 2. Among the things to eventually think about when it comes to configuring, it
> looks like some platforms [1] install the headers under <nspr4/...> and
> <nss3/...> instead of <nspr/...> and <nss/...>. It's unfortunate that the NSS
> maintainers never chose an official installation layout.

Yeah, maybe we need to start with the most common path and have fallbacks in
case not found?

> 3. I need two more `#define NO_NSPR_10_SUPPORT` guards added in both
>
>  src/include/common/pg_nss.h
>  src/port/pg_strong_random.c
>
> before the tree will compile for me. Both of those files include NSS headers.

Odd that I was able to compile on Linux, but I've added these.

> 4. be_tls_init() refuses to run correctly for me; I end up getting an NSPR
> assertion that looks like
>
>  sslMutex_Init not implemented for multi-process applications !
>
> With assertions disabled, this ends up showing a somewhat unhelpful
>
>  FATAL:  unable to set up TLS connection cache: security library failure. (SEC_ERROR_LIBRARY_FAILURE)
>
> It looks like cross-process locking isn't actually enabled on macOS, which is a
> long-standing bug in NSPR [2, 3]. So calls to SSL_ConfigMPServerSIDCache()
> error out.
Thats unfortunate since the session cache is required for a server application
backed by NSS.  The attached switches to SSL_ConfigServerSessionIDCacheWithOpt
with which one can explicitly make the cache non-shared, which in turn backs
the mutexes with NSPR locks rather than the missing sem_init.  Can you test
this version and see if that makes it work?

This version also contains a channel binding bug that Heikki pointed out off-
list (sadly not The bug) and a few very minor cleanups as well as a rebase to
handle the new pg_strong_random_init.  Actually performing the context init
there is yet a TODO, but I wanted a version out that at all compiled.

cheers ./daniel


v18-0001-NSS-Frontend-Backend-and-build-infrastructure.patch (151K) Download Attachment
v18-0002-NSS-Testharness-updates.patch (65K) Download Attachment
v18-0003-NSS-pg_strong_random-support.patch (6K) Download Attachment
v18-0004-NSS-Documentation.patch (21K) Download Attachment
v18-0005-NSS-contrib-modules.patch (42K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
On Nov 6, 2020, at 3:11 PM, Daniel Gustafsson <[hidden email]> wrote:
>
> The attached switches to SSL_ConfigServerSessionIDCacheWithOpt
> with which one can explicitly make the cache non-shared, which in turn backs
> the mutexes with NSPR locks rather than the missing sem_init.  Can you test
> this version and see if that makes it work?

Yep, I get much farther through the tests with that patch. I'm currently
diving into another assertion failure during socket disconnection:

    Assertion failure: fd->secret == NULL, at prlayer.c:45

cURL has some ominously vague references to this [1], though I'm not
sure that we should work around it in the same way without knowing what
the cause is...

--Jacob

[1] https://github.com/curl/curl/blob/4d2f800/lib/vtls/nss.c#L1266



Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Daniel Gustafsson
> On 10 Nov 2020, at 21:11, Jacob Champion <[hidden email]> wrote:
> On Nov 6, 2020, at 3:11 PM, Daniel Gustafsson <[hidden email]> wrote:

>> The attached switches to SSL_ConfigServerSessionIDCacheWithOpt
>> with which one can explicitly make the cache non-shared, which in turn backs
>> the mutexes with NSPR locks rather than the missing sem_init.  Can you test
>> this version and see if that makes it work?
>
> Yep, I get much farther through the tests with that patch.

Great, thanks for confirming.

> I'm currently
> diving into another assertion failure during socket disconnection:
>
>    Assertion failure: fd->secret == NULL, at prlayer.c:45
>
> cURL has some ominously vague references to this [1], though I'm not
> sure that we should work around it in the same way without knowing what
> the cause is...

Digging through the archives from when this landed in curl, the assertion
failure was never fully identified back then but happened spuriously.  Which
version of NSPR is this happening with?

cheers ./daniel

Reply | Threaded
Open this post in threaded view
|

Re: Support for NSS as a libpq TLS backend

Jacob Champion-2
On Nov 10, 2020, at 2:28 PM, Daniel Gustafsson <[hidden email]> wrote:
>
> Digging through the archives from when this landed in curl, the assertion
> failure was never fully identified back then but happened spuriously.  Which
> version of NSPR is this happening with?

This is NSPR 4.29, with debugging enabled. The fd that causes the
assertion is the custom layer that's added during be_tls_open_server(),
which connects a Port as the layer secret. It looks like NSPR is trying
to help surface potential memory leaks by asserting if the secret is
non-NULL at the time the stack is being closed.

In this case, it doesn't matter since the Port lifetime is managed
elsewhere, but it looks easy enough to add a custom close in the way
that cURL and the NSPR test programs [1] do. Sample patch attached,
which gets me to the end of the tests without any assertions. (Two
failures left on my machine.)

--Jacob

[1] https://hg.mozilla.org/projects/nspr/file/bf6620c143/pr/tests/nblayer.c#l354

123