On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote:
> Attached is a v10 rebased to apply on top of HEAD. I am afraid that this needs a new rebase. The patch is failing to apply, per the CF bot. :/ -- Michael |
> On 29 Sep 2020, at 07:59, Michael Paquier <[hidden email]> wrote:
> > On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote: >> Attached is a v10 rebased to apply on top of HEAD. > > I am afraid that this needs a new rebase. The patch is failing to > apply, per the CF bot. :/ It's failing on binary diffs due to the NSS certificate databases being included to make hacking on the patch easier: File src/test/ssl/ssl/nss/server.crl: git binary diffs are not supported. This is a limitation of the CFBot patch tester, the text portions of the patch still applies with a tiny but of fuzz. cheers ./daniel |
> On 29 Sep 2020, at 09:52, Daniel Gustafsson <[hidden email]> wrote:
Attached is a new version which doesn't contain the NSS certificate databases
> >> On 29 Sep 2020, at 07:59, Michael Paquier <[hidden email]> wrote: >> >> On Thu, Sep 17, 2020 at 11:41:28AM +0200, Daniel Gustafsson wrote: >>> Attached is a v10 rebased to apply on top of HEAD. >> >> I am afraid that this needs a new rebase. The patch is failing to >> apply, per the CF bot. :/ > > It's failing on binary diffs due to the NSS certificate databases being > included to make hacking on the patch easier: > > File src/test/ssl/ssl/nss/server.crl: git binary diffs are not supported. > > This is a limitation of the CFBot patch tester, the text portions of the patch > still applies with a tiny but of fuzz. to keep the CFBot happy. It also implements server-side passphrase callbacks as well as re-enables the tests for those. The callback works a bit differently from the OpenSSL one as it must run in the forked process, so it can't run on server reload. There's also no default fallback reading from a TTY like in OpenSSL, so if no callback it set the always-failing dummy is set. cheers ./daniel |
The attached v12 adds support for pgcrypto as well as pg_strong_random, which I
believe completes the required subsystems where we have OpenSSL support today. I opted for not adding code to handle the internal shaXXX implementations until the dust settles around the proposal to change the API there. Blowfish is not supported by NSS AFAICT, even though the cipher mechanism is defined, so the internal implementation is used there instead. CAST5 is supported, but segfaults inside NSS on most inputs so support for that is not included for now. cheers ./daniel |
Hi,
On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote: > From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001 > From: Daniel Gustafsson <[hidden email]> > Date: Thu, 8 Oct 2020 18:44:28 +0200 > Subject: [PATCH] Support for NSS as a TLS backend v12 > --- > configure | 223 +++- > configure.ac | 39 +- > contrib/Makefile | 2 +- > contrib/pgcrypto/Makefile | 5 + > contrib/pgcrypto/nss.c | 773 +++++++++++ > contrib/pgcrypto/openssl.c | 2 +- > contrib/pgcrypto/px.c | 1 + > contrib/pgcrypto/px.h | 1 + Personally I'd like to see this patch broken up a bit - it's quite large. Several of the changes could easily be committed separately, no? > if test "$with_openssl" = yes ; then > + if test x"$with_nss" = x"yes" ; then > + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) > + fi Based on a quick look there's no similar error check for the msvc build. Should there be? > > +if test "$with_nss" = yes ; then > + if test x"$with_openssl" = x"yes" ; then > + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) > + fi Isn't this a repetition of the earlier check? > + CLEANLDFLAGS="$LDFLAGS" > + # TODO: document this set of LDFLAGS > + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS" Shouldn't this use nss-config or such? > +if test "$with_nss" = yes ; then > + AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file <ssl.h> is required for NSS])]) > + AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file <nss.h> is required for NSS])]) > +fi Hm. For me, on debian, these headers are not directly in the default include search path, but would be as nss/ssl.h. I don't see you adding nss/ to CFLAGS anywhere? How does this work currently? I think it'd also be better if we could include these files as nss/ssl.h etc - ssl.h is a name way too likely to conflict imo. > +++ b/src/backend/libpq/be-secure-nss.c > @@ -0,0 +1,1158 @@ > +/* > + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef > + * our version to avoid compiler warnings on redefinition. > + */ > +#define pg_BITS_PER_BYTE BITS_PER_BYTE > +#undef BITS_PER_BYTE Most compilers/preprocessors don't warn about redefinitions when they would result in the same value (IIRC we have some cases of redefinitions in tree even). Does nspr's differ? > +/* > + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with > + * colliding definitions from ours, causing a much expected compiler error. > + * The definitions are however not actually used in NSPR at all, and are only > + * intended for what seems to be backwards compatibility for apps written > + * against old versions of NSPR. The following comment is in the referenced > + * file, and was added in 1998: > + * > + * This section typedefs the old 'native' types to the new PR<type>s. > + * These definitions are scheduled to be eliminated at the earliest > + * possible time. The NSPR API is implemented and documented using > + * the new definitions. > + * > + * As there is no opt-out from pulling in these typedefs, we define the guard > + * for the file to exclude it. This is incredibly ugly, but seems to be about > + * the only way around it. > + */ > +#define PROTYPES_H > +#include <nspr.h> > +#undef PROTYPES_H Yuck :(. > +int > +be_tls_init(bool isServerStart) > +{ > + SECStatus status; > + SSLVersionRange supported_sslver; > + > + /* > + * Set up the connection cache for multi-processing application behavior. Hm. Do we necessarily want that? Session resumption is not exactly unproblematic... Or does this do something else? > + * If we are in ServerStart then we initialize the cache. If the server is > + * already started, we inherit the cache such that it can be used for > + * connections. Calling SSL_ConfigMPServerSIDCache sets an environment > + * variable which contains enough information for the forked child to know > + * how to access it. Passing NULL to SSL_InheritMPServerSIDCache will > + * make the forked child look it up by the default name SSL_INHERITANCE, > + * if env vars aren't inherited then the contents of the variable can be > + * passed instead. > + */ Does this stuff work on windows / EXEC_BACKEND? > + * The below parameters are what the implicit initialization would've done > + * for us, and should work even for older versions where it might not be > + * done automatically. The last parameter, maxPTDs, is set to various > + * values in other codebases, but has been unused since NSPR 2.1 which was > + * released sometime in 1998. > + */ > + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ ); https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_Init says that currently all parameters are ignored? > + /* > + * Import the already opened socket as we don't want to use NSPR functions > + * for opening the network socket due to how the PostgreSQL protocol works > + * with TLS connections. This function is not part of the NSPR public API, > + * see the comment at the top of the file for the rationale of still using > + * it. > + */ > + pr_fd = PR_ImportTCPSocket(port->sock); > + if (!pr_fd) > + ereport(ERROR, > + (errmsg("unable to connect to socket"))); I don't see the comment you're referring to? > + /* > + * Most of the documentation available, and implementations of, NSS/NSPR > + * use the PR_NewTCPSocket() function here, which has the drawback that it > + * can only create IPv4 sockets. Instead use PR_OpenTCPSocket() which > + * copes with IPv6 as well. > + */ > + model = PR_OpenTCPSocket(port->laddr.addr.ss_family); > + if (!model) > + ereport(ERROR, > + (errmsg("unable to open socket"))); > + > + /* > + * Convert the NSPR socket to an SSL socket. Ensuring the success of this > + * operation is critical as NSS SSL_* functions may return SECSuccess on > + * the socket even though SSL hasn't been enabled, which introduce a risk > + * of silent downgrades. > + */ > + model = SSL_ImportFD(NULL, model); > + if (!model) > + ereport(ERROR, > + (errmsg("unable to enable TLS on socket"))); It's confusing that these functions do not actually reference the socket via some handle :(. What does opening a socket do here? > + /* > + * Configure the allowed cipher. If there are no user preferred suites, *ciphers? > + > + port->pr_fd = SSL_ImportFD(model, pr_fd); > + if (!port->pr_fd) > + ereport(ERROR, > + (errmsg("unable to initialize"))); > + > + PR_Close(model); A comment explaining why we first import a NULL into the model, and then release the model, and import the real fd would be good. > +ssize_t > +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor) > +{ > + ssize_t n_read; > + PRErrorCode err; > + > + n_read = PR_Read(port->pr_fd, ptr, len); > + > + if (n_read < 0) > + { > + err = PR_GetError(); Yay, more thread global state :(. > + /* XXX: This logic seems potentially bogus? */ > + if (err == PR_WOULD_BLOCK_ERROR) > + *waitfor = WL_SOCKET_READABLE; > + else > + *waitfor = WL_SOCKET_WRITEABLE; Don't we need to handle failed connections somewhere here? secure_read() won't know about PR_GetError() etc? How would SSL errors be signalled upwards here? Also, as you XXX, it's not clear to me that your mapping would always result in waiting for the right event? A tls write could e.g. very well require receiving data etc? > + /* > + * At least one byte with password content was returned, and NSS requires > + * that we return it allocated in NSS controlled memory. If we fail to > + * allocate then abort without passing back NULL and bubble up the error > + * on the PG side. > + */ > + password = (char *) PR_Malloc(len + 1); > + if (!password) > + ereport(ERROR, > + (errcode(ERRCODE_OUT_OF_MEMORY), > + errmsg("out of memory"))); > > + strlcpy(password, buf, sizeof(password)); > + explicit_bzero(buf, sizeof(buf)); > + In case of error you're not bzero'ing out the password! Separately, I wonder if we should introduce a function for throwing OOM errors - which then e.g. could print the memory context stats in those places too... > +static SECStatus > +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) > +{ > + SECStatus status; > + Port *port = (Port *) arg; > + CERTCertificate *cert; > + char *peer_cn; > + int len; > + > + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); > + if (status == SECSuccess) > + { > + cert = SSL_PeerCertificate(port->pr_fd); > + len = strlen(cert->subjectName); > + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); > + if (strncmp(cert->subjectName, "CN=", 3) == 0) > + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); > + else > + strlcpy(peer_cn, cert->subjectName, len + 1); > + CERT_DestroyCertificate(cert); > + > + port->peer_cn = peer_cn; > + port->peer_cert_valid = true; Hm. We either should have something similar to /* * Reject embedded NULLs in certificate common name to prevent * attacks like CVE-2009-4034. */ if (len != strlen(peer_cn)) { ereport(COMMERROR, (errcode(ERRCODE_PROTOCOL_VIOLATION), errmsg("SSL certificate's common name contains embedded null"))); pfree(peer_cn); return -1; } here, or a comment explaining why not. Also, what's up with the CN= bit? Why is that needed here, but not for openssl? > +static PRFileDesc * > +init_iolayer(Port *port, int loglevel) > +{ > + const PRIOMethods *default_methods; > + PRFileDesc *layer; > + > + /* > + * Start by initializing our layer with all the default methods so that we > + * can selectively override the ones we want while still ensuring that we > + * have a complete layer specification. > + */ > + default_methods = PR_GetDefaultIOMethods(); > + memcpy(&pr_iomethods, default_methods, sizeof(PRIOMethods)); > + > + pr_iomethods.recv = pg_ssl_read; > + pr_iomethods.send = pg_ssl_write; > + > + /* > + * Each IO layer must be identified by a unique name, where uniqueness is > + * per connection. Each connection in a postgres cluster can generate the > + * identity from the same string as they will create their IO layers on > + * different sockets. Only one layer per socket can have the same name. > + */ > + pr_id = PR_GetUniqueIdentity("PostgreSQL"); Seems like it might not be a bad idea to append Server or something? > + > + /* > + * Create the actual IO layer as a stub such that it can be pushed onto > + * the layer stack. The step via a stub is required as we define custom > + * callbacks. > + */ > + layer = PR_CreateIOLayerStub(pr_id, &pr_iomethods); > + if (!layer) > + { > + ereport(loglevel, > + (errmsg("unable to create NSS I/O layer"))); > + return NULL; > + } Why is this accepting a variable log level? The only caller passes ERROR? > +/* > + * pg_SSLerrmessage > + * Create and return a human readable error message given > + * the specified error code > + * > + * PR_ErrorToName only converts the enum identifier of the error to string, > + * but that can be quite useful for debugging (and in case PR_ErrorToString is > + * unable to render a message then we at least have something). > + */ > +static char * > +pg_SSLerrmessage(PRErrorCode errcode) > +{ > + char error[128]; > + int ret; > + > + /* TODO: this should perhaps use a StringInfo instead.. */ > + ret = pg_snprintf(error, sizeof(error), "%s (%s)", > + PR_ErrorToString(errcode, PR_LANGUAGE_I_DEFAULT), > + PR_ErrorToName(errcode)); > + if (ret) > + return pstrdup(error); > + return pstrdup(_("unknown TLS error")); > +} Why not use psrintf() here? > +++ b/src/include/common/pg_nss.h > @@ -0,0 +1,141 @@ > +/*------------------------------------------------------------------------- > + * > + * pg_nss.h > + * Support for NSS as a TLS backend > + * > + * These definitions are used by both frontend and backend code. > + * > + * Copyright (c) 2020, PostgreSQL Global Development Group > + * > + * IDENTIFICATION > + * src/include/common/pg_nss.h > + * > + *------------------------------------------------------------------------- > + */ > +#ifndef PG_NSS_H > +#define PG_NSS_H > + > +#ifdef USE_NSS > + > +#include <sslproto.h> > + > +PRUint16 pg_find_cipher(char *name); > + > +typedef struct > +{ > + const char *name; > + PRUint16 number; > +} NSSCiphers; > + > +#define INVALID_CIPHER 0xFFFF > + > +/* > + * This list is a partial copy of the ciphers in NSS files lib/ssl/sslproto.h > + * in order to provide a human readable version of the ciphers. It would be > + * nice to not have to have this, but NSS doesn't provide any API addressing > + * the ciphers by name. TODO: do we want more of the ciphers, or perhaps less? > + */ > +static const NSSCiphers NSS_CipherList[] = { > + > + {"TLS_NULL_WITH_NULL_NULL", TLS_NULL_WITH_NULL_NULL}, Hm. Is this whole business of defining array constants in a header just done to avoid having a .c file that needs to be compiled both in frontend and backend code? > +/* > + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with > + * colliding definitions from ours, causing a much expected compiler error. > + * The definitions are however not actually used in NSPR at all, and are only > + * intended for what seems to be backwards compatibility for apps written > + * against old versions of NSPR. The following comment is in the referenced > + * file, and was added in 1998: > + * > + * This section typedefs the old 'native' types to the new PR<type>s. > + * These definitions are scheduled to be eliminated at the earliest > + * possible time. The NSPR API is implemented and documented using > + * the new definitions. > + * > + * As there is no opt-out from pulling in these typedefs, we define the guard > + * for the file to exclude it. This is incredibly ugly, but seems to be about > + * the only way around it. > + */ There's a lot of duplicated comments here. Could we move either of the files to reference the other for longer ones? > +/* > + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the > + * only way to make NSS use an already set up POSIX file descriptor rather > + * than opening one itself. To quote the NSS documentation: > + * > + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's > + * implementation changes. In practice, this is unlikely to happen because > + * NSPR's implementation has been stable for years and because of NSPR's > + * strong commitment to backward compatibility." > + * > + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket > + * > + * The function is declared in <private/pprio.h>, but as it is a header marked > + * private we declare it here rather than including it. > + */ > +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int); Ugh. This is really the way to do this? How do other applications deal with this problem? > +#if defined(WIN32) > +static const char *ca_trust_name = "nssckbi.dll"; > +#elif defined(__darwin__) > +static const char *ca_trust_name = "libnssckbi.dylib"; > +#else > +static const char *ca_trust_name = "libnssckbi.so"; > +#endif There's really no pre-existing handling for this in nss??? > + /* > + * The original design of NSS was for a single application to use a single > + * copy of it, initialized with NSS_Initialize() which isn't returning any > + * handle with which to refer to NSS. NSS initialization and shutdown are > + * global for the application, so a shutdown in another NSS enabled > + * library would cause NSS to be stopped for libpq as well. The fix has > + * been to introduce NSS_InitContext which returns a context handle to > + * pass to NSS_ShutdownContext. NSS_InitContext was introduced in NSS > + * 3.12, but the use of it is not very well documented. > + * https://bugzilla.redhat.com/show_bug.cgi?id=738456 > + * > + * The InitParameters struct passed can be used to override internal > + * values in NSS, but the usage is not documented at all. When using > + * NSS_Init initializations, the values are instead set via PK11_Configure > + * calls so the PK11_Configure documentation can be used to glean some > + * details on these. > + * > + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Specs > + > + if (!nss_context) > + { > + char *err = pg_SSLerrmessage(PR_GetError()); > + > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("unable to %s certificate database: %s"), > + conn->cert_database ? "open" : "create", > + err); > + free(err); > + return PGRES_POLLING_FAILED; > + } > + > + /* > + * Configure cipher policy. > + */ > + status = NSS_SetDomesticPolicy(); Why is "domestic" the right thing here? > + > + PK11_SetPasswordFunc(PQssl_passwd_cb); Is it actually OK to do stuff like this when other users of NSS might be present? That's obviously more likely in the libpq case, compared to the backend case (where it's also possible, of course). What prevents us from overriding another user's callback? > +ssize_t > +pgtls_read(PGconn *conn, void *ptr, size_t len) > +{ > + PRInt32 nread; > + PRErrorCode status; > + int read_errno = 0; > + > + nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT); > + > + /* > + * PR_Recv blocks until there is data to read or the timeout expires. Zero > + * is returned for closed connections, while -1 indicates an error within > + * the ongoing connection. > + */ > + if (nread == 0) > + { > + read_errno = ECONNRESET; > + return -1; > + } It's a bit confusing to talk about blocking when the socket presumably is in non-blocking mode, and you're also asking to never wait? > + if (nread == -1) > + { > + status = PR_GetError(); > + > + switch (status) > + { > + case PR_WOULD_BLOCK_ERROR: > + read_errno = EINTR; > + break; Uh, isn't this going to cause a busy-loop by the caller? EINTR isn't the same as EAGAIN/EWOULDBLOCK? > + case PR_IO_TIMEOUT_ERROR: > + break; What does this mean? We'll return with a 0 errno here, right? When is this case reachable? E.g. the comment in fe-misc.c: /* pqsecure_read set the error message for us */ for this case doesn't seem to be fulfilled by this. > +/* > + * Verify that the server certificate matches the hostname we connected to. > + * > + * The certificate's Common Name and Subject Alternative Names are considered. > + */ > +int > +pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn, > + int *names_examined, > + char **first_name) > +{ > + return 1; > +} Uh, huh? Certainly doesn't verify anything... > +/* ------------------------------------------------------------ */ > +/* PostgreSQL specific TLS support functions */ > +/* ------------------------------------------------------------ */ > + > +/* > + * TODO: this a 99% copy of the same function in the backend, make these share > + * a single implementation instead. > + */ > +static char * > +pg_SSLerrmessage(PRErrorCode errcode) > +{ > + const char *error; > + > + error = PR_ErrorToName(errcode); > + if (error) > + return strdup(error); > + > + return strdup("unknown TLS error"); > +} Btw, why does this need to duplicate strings, instead of returning a const char*? Greetings, Andres Freund |
> On 20 Oct 2020, at 21:15, Andres Freund <[hidden email]> wrote:
> > Hi, Thanks for your review, much appreciated! > On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote: >> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001 >> From: Daniel Gustafsson <[hidden email]> >> Date: Thu, 8 Oct 2020 18:44:28 +0200 >> Subject: [PATCH] Support for NSS as a TLS backend v12 >> --- >> configure | 223 +++- >> configure.ac | 39 +- >> contrib/Makefile | 2 +- >> contrib/pgcrypto/Makefile | 5 + >> contrib/pgcrypto/nss.c | 773 +++++++++++ >> contrib/pgcrypto/openssl.c | 2 +- >> contrib/pgcrypto/px.c | 1 + >> contrib/pgcrypto/px.h | 1 + > > Personally I'd like to see this patch broken up a bit - it's quite > large. Several of the changes could easily be committed separately, no? means in quick succession), but it could certainly be broken up for the sake of making review easier. I will take a stab at that, but in a follow-up email as I would like the split to be a version just doing the split and not also introducing/fixing things. >> if test "$with_openssl" = yes ; then >> + if test x"$with_nss" = x"yes" ; then >> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) >> + fi > > Based on a quick look there's no similar error check for the msvc > build. Should there be? Thats a good question. When embarking on this is seemed quite natural to me that it should be, but now I'm not so sure. Maybe there should be a --with-openssl-preferred like how we handle readline/libedit or just allow multiple and let the last one win? Do you have any input on what would make sense? The only thing I think makes no sense is to allow multiple ones at the same time given the current autoconf switches, even if it would just be to pick say pg_strong_random from one and libpq TLS from another. >> +if test "$with_nss" = yes ; then >> + if test x"$with_openssl" = x"yes" ; then >> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) >> + fi > > Isn't this a repetition of the earlier check? It is, and it we want to keep such a check it should be broken out into a separate step performed before all library specific checks IMO. >> + CLEANLDFLAGS="$LDFLAGS" >> + # TODO: document this set of LDFLAGS >> + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS" > > Shouldn't this use nss-config or such? Indeed it should, where available. I've added rudimentary support for that without a fallback as of now. >> +if test "$with_nss" = yes ; then >> + AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file <ssl.h> is required for NSS])]) >> + AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file <nss.h> is required for NSS])]) >> +fi > > Hm. For me, on debian, these headers are not directly in the default > include search path, but would be as nss/ssl.h. I don't see you adding > nss/ to CFLAGS anywhere? How does this work currently? I had Stockholm-syndromed myself into passing --with-includes and hadn't really realized. Sometimes the obvious is too obvious in a 4000+ LOC patch. > I think it'd also be better if we could include these files as nss/ssl.h > etc - ssl.h is a name way too likely to conflict imo. I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path will still need the direct path to the headers (from autoconf) since nss.h includes NSPR headers as #include <nspr.h> and so on. >> +++ b/src/backend/libpq/be-secure-nss.c >> @@ -0,0 +1,1158 @@ >> +/* >> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef >> + * our version to avoid compiler warnings on redefinition. >> + */ >> +#define pg_BITS_PER_BYTE BITS_PER_BYTE >> +#undef BITS_PER_BYTE > > Most compilers/preprocessors don't warn about redefinitions when they > would result in the same value (IIRC we have some cases of redefinitions > in tree even). Does nspr's differ? In file included from /usr/include/nspr/prtypes.h:26, from /usr/include/nspr/pratom.h:14, from /usr/include/nspr/nspr.h:9, from be-secure-nss.c:45: /usr/include/nspr/prcpucfg.h:1143: warning: "BITS_PER_BYTE" redefined #define BITS_PER_BYTE PR_BITS_PER_BYTE In file included from ../../../src/include/c.h:55, from ../../../src/include/postgres.h:46, from be-secure-nss.c:16: ../../../src/include/pg_config_manual.h:115: note: this is the location of the previous definition #define BITS_PER_BYTE 8 PR_BITS_PER_BYTE is defined per platform in pr/include/md/_<platform>.cfg and is as expected 8. I assume it's that indirection which cause the warning? >> +/* >> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with >> + * colliding definitions from ours, causing a much expected compiler error. >> + * The definitions are however not actually used in NSPR at all, and are only >> + * intended for what seems to be backwards compatibility for apps written >> + * against old versions of NSPR. The following comment is in the referenced >> + * file, and was added in 1998: >> + * >> + * This section typedefs the old 'native' types to the new PR<type>s. >> + * These definitions are scheduled to be eliminated at the earliest >> + * possible time. The NSPR API is implemented and documented using >> + * the new definitions. >> + * >> + * As there is no opt-out from pulling in these typedefs, we define the guard >> + * for the file to exclude it. This is incredibly ugly, but seems to be about >> + * the only way around it. >> + */ >> +#define PROTYPES_H >> +#include <nspr.h> >> +#undef PROTYPES_H > > Yuck :(. however find a proper way to deal with this. Defining NO_NSPR_10_SUPPORT stops NSPR from using the files in obsolete/. So fixed, yay! >> +int >> +be_tls_init(bool isServerStart) >> +{ >> + SECStatus status; >> + SSLVersionRange supported_sslver; >> + >> + /* >> + * Set up the connection cache for multi-processing application behavior. > > Hm. Do we necessarily want that? Session resumption is not exactly > unproblematic... Or does this do something else? must set up a connection cache in order to accept connections. Not entirely sure, and the docs aren't terribly clear for non SSLv2/v3 environments (it seems to only cache for SSLv2/3 and not TLSv+) but it seems like it may have other uses internally. I will hunt down some more information on the NSS mailing list. >> + * If we are in ServerStart then we initialize the cache. If the server is >> + * already started, we inherit the cache such that it can be used for >> + * connections. Calling SSL_ConfigMPServerSIDCache sets an environment >> + * variable which contains enough information for the forked child to know >> + * how to access it. Passing NULL to SSL_InheritMPServerSIDCache will >> + * make the forked child look it up by the default name SSL_INHERITANCE, >> + * if env vars aren't inherited then the contents of the variable can be >> + * passed instead. >> + */ > > Does this stuff work on windows in an earlier version of the patch. I need to get a proper Windows env for testing/dev up and running as mine has bitrotted to nothingness. > / EXEC_BACKEND? That's a good point, maybe we need to do a SSL_ConfigServerSessionIDCache rather than the MP version for EXEC_BACKEND? Not sure. >> + * The below parameters are what the implicit initialization would've done >> + * for us, and should work even for older versions where it might not be >> + * done automatically. The last parameter, maxPTDs, is set to various >> + * values in other codebases, but has been unused since NSPR 2.1 which was >> + * released sometime in 1998. >> + */ >> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ ); > > https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_Init > says that currently all parameters are ignored? one of them has been unused since RUN DMC topped the charts with "It's like that". Comment updated. >> + /* >> + * Import the already opened socket as we don't want to use NSPR functions >> + * for opening the network socket due to how the PostgreSQL protocol works >> + * with TLS connections. This function is not part of the NSPR public API, >> + * see the comment at the top of the file for the rationale of still using >> + * it. >> + */ >> + pr_fd = PR_ImportTCPSocket(port->sock); >> + if (!pr_fd) >> + ereport(ERROR, >> + (errmsg("unable to connect to socket"))); > > I don't see the comment you're referring to? call, yet still used by everyone (which is also discussed later in this review). >> + /* >> + * Most of the documentation available, and implementations of, NSS/NSPR >> + * use the PR_NewTCPSocket() function here, which has the drawback that it >> + * can only create IPv4 sockets. Instead use PR_OpenTCPSocket() which >> + * copes with IPv6 as well. >> + */ >> + model = PR_OpenTCPSocket(port->laddr.addr.ss_family); >> + if (!model) >> + ereport(ERROR, >> + (errmsg("unable to open socket"))); >> + >> + /* >> + * Convert the NSPR socket to an SSL socket. Ensuring the success of this >> + * operation is critical as NSS SSL_* functions may return SECSuccess on >> + * the socket even though SSL hasn't been enabled, which introduce a risk >> + * of silent downgrades. >> + */ >> + model = SSL_ImportFD(NULL, model); >> + if (!model) >> + ereport(ERROR, >> + (errmsg("unable to enable TLS on socket"))); > > It's confusing that these functions do not actually reference the socket > via some handle :(. What does opening a socket do here? capable socket which NSS will work with. This is a required step for "activating" NSS on the socket. >> + /* >> + * Configure the allowed cipher. If there are no user preferred suites, > > *ciphers? Yes, fixed. >> + >> + port->pr_fd = SSL_ImportFD(model, pr_fd); >> + if (!port->pr_fd) >> + ereport(ERROR, >> + (errmsg("unable to initialize"))); >> + >> + PR_Close(model); > > A comment explaining why we first import a NULL into the model, and then > release the model, and import the real fd would be good. for the actual socket. This part of NSS/NSPR is a bit overcomplicated for how we have connections, it's more geared towards having many open sockets in the same process. >> +ssize_t >> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor) >> +{ >> + ssize_t n_read; >> + PRErrorCode err; >> + >> + n_read = PR_Read(port->pr_fd, ptr, len); >> + >> + if (n_read < 0) >> + { >> + err = PR_GetError(); > > Yay, more thread global state :(. >> + /* XXX: This logic seems potentially bogus? */ >> + if (err == PR_WOULD_BLOCK_ERROR) >> + *waitfor = WL_SOCKET_READABLE; >> + else >> + *waitfor = WL_SOCKET_WRITEABLE; > > Don't we need to handle failed connections somewhere here? secure_read() > won't know about PR_GetError() etc? How would SSL errors be signalled > upwards here? > > Also, as you XXX, it's not clear to me that your mapping would always > result in waiting for the right event? A tls write could e.g. very well > require receiving data etc? >> + /* >> + * At least one byte with password content was returned, and NSS requires >> + * that we return it allocated in NSS controlled memory. If we fail to >> + * allocate then abort without passing back NULL and bubble up the error >> + * on the PG side. >> + */ >> + password = (char *) PR_Malloc(len + 1); >> + if (!password) >> + ereport(ERROR, >> + (errcode(ERRCODE_OUT_OF_MEMORY), >> + errmsg("out of memory"))); >> >> + strlcpy(password, buf, sizeof(password)); >> + explicit_bzero(buf, sizeof(buf)); >> + > > In case of error you're not bzero'ing out the password! > Separately, I wonder if we should introduce a function for throwing OOM > errors - which then e.g. could print the memory context stats in those > places too... +1. I'd be happy to review such a patch. >> +static SECStatus >> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) >> +{ >> + SECStatus status; >> + Port *port = (Port *) arg; >> + CERTCertificate *cert; >> + char *peer_cn; >> + int len; >> + >> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); >> + if (status == SECSuccess) >> + { >> + cert = SSL_PeerCertificate(port->pr_fd); >> + len = strlen(cert->subjectName); >> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); >> + if (strncmp(cert->subjectName, "CN=", 3) == 0) >> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); >> + else >> + strlcpy(peer_cn, cert->subjectName, len + 1); >> + CERT_DestroyCertificate(cert); >> + >> + port->peer_cn = peer_cn; >> + port->peer_cert_valid = true; > > Hm. We either should have something similar to > > /* > * Reject embedded NULLs in certificate common name to prevent > * attacks like CVE-2009-4034. > */ > if (len != strlen(peer_cn)) > { > ereport(COMMERROR, > (errcode(ERRCODE_PROTOCOL_VIOLATION), > errmsg("SSL certificate's common name contains embedded null"))); > pfree(peer_cn); > return -1; > } > here, or a comment explaining why not. to get the string as well as the expected length of it. > Also, what's up with the CN= bit? Why is that needed here, but not for > openssl? OpenSSL returns only the value portion, whereas NSS returns key=value so we need to skip over the key= part. >> +static PRFileDesc * >> +init_iolayer(Port *port, int loglevel) >> +{ >> + const PRIOMethods *default_methods; >> + PRFileDesc *layer; >> + >> + /* >> + * Start by initializing our layer with all the default methods so that we >> + * can selectively override the ones we want while still ensuring that we >> + * have a complete layer specification. >> + */ >> + default_methods = PR_GetDefaultIOMethods(); >> + memcpy(&pr_iomethods, default_methods, sizeof(PRIOMethods)); >> + >> + pr_iomethods.recv = pg_ssl_read; >> + pr_iomethods.send = pg_ssl_write; >> + >> + /* >> + * Each IO layer must be identified by a unique name, where uniqueness is >> + * per connection. Each connection in a postgres cluster can generate the >> + * identity from the same string as they will create their IO layers on >> + * different sockets. Only one layer per socket can have the same name. >> + */ >> + pr_id = PR_GetUniqueIdentity("PostgreSQL"); > > Seems like it might not be a bad idea to append Server or something? >> + /* >> + * Create the actual IO layer as a stub such that it can be pushed onto >> + * the layer stack. The step via a stub is required as we define custom >> + * callbacks. >> + */ >> + layer = PR_CreateIOLayerStub(pr_id, &pr_iomethods); >> + if (!layer) >> + { >> + ereport(loglevel, >> + (errmsg("unable to create NSS I/O layer"))); >> + return NULL; >> + } > > Why is this accepting a variable log level? The only caller passes ERROR? sense. loglevel param removed. >> +/* >> + * pg_SSLerrmessage >> + * Create and return a human readable error message given >> + * the specified error code >> + * >> + * PR_ErrorToName only converts the enum identifier of the error to string, >> + * but that can be quite useful for debugging (and in case PR_ErrorToString is >> + * unable to render a message then we at least have something). >> + */ >> +static char * >> +pg_SSLerrmessage(PRErrorCode errcode) >> +{ >> + char error[128]; >> + int ret; >> + >> + /* TODO: this should perhaps use a StringInfo instead.. */ >> + ret = pg_snprintf(error, sizeof(error), "%s (%s)", >> + PR_ErrorToString(errcode, PR_LANGUAGE_I_DEFAULT), >> + PR_ErrorToName(errcode)); >> + if (ret) >> + return pstrdup(error); > >> + return pstrdup(_("unknown TLS error")); >> +} > > Why not use psrintf() here? just that. >> +++ b/src/include/common/pg_nss.h >> @@ -0,0 +1,141 @@ >> +/*------------------------------------------------------------------------- >> + * >> + * pg_nss.h >> + * Support for NSS as a TLS backend >> + * >> + * These definitions are used by both frontend and backend code. >> + * >> + * Copyright (c) 2020, PostgreSQL Global Development Group >> + * >> + * IDENTIFICATION >> + * src/include/common/pg_nss.h >> + * >> + *------------------------------------------------------------------------- >> + */ >> +#ifndef PG_NSS_H >> +#define PG_NSS_H >> + >> +#ifdef USE_NSS >> + >> +#include <sslproto.h> >> + >> +PRUint16 pg_find_cipher(char *name); >> + >> +typedef struct >> +{ >> + const char *name; >> + PRUint16 number; >> +} NSSCiphers; >> + >> +#define INVALID_CIPHER 0xFFFF >> + >> +/* >> + * This list is a partial copy of the ciphers in NSS files lib/ssl/sslproto.h >> + * in order to provide a human readable version of the ciphers. It would be >> + * nice to not have to have this, but NSS doesn't provide any API addressing >> + * the ciphers by name. TODO: do we want more of the ciphers, or perhaps less? >> + */ >> +static const NSSCiphers NSS_CipherList[] = { >> + >> + {"TLS_NULL_WITH_NULL_NULL", TLS_NULL_WITH_NULL_NULL}, > > Hm. Is this whole business of defining array constants in a header just > done to avoid having a .c file that needs to be compiled both in > frontend and backend code? make it a .c compiled in both frontend and backend? >> +/* >> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with >> + * colliding definitions from ours, causing a much expected compiler error. >> + * The definitions are however not actually used in NSPR at all, and are only >> + * intended for what seems to be backwards compatibility for apps written >> + * against old versions of NSPR. The following comment is in the referenced >> + * file, and was added in 1998: >> + * >> + * This section typedefs the old 'native' types to the new PR<type>s. >> + * These definitions are scheduled to be eliminated at the earliest >> + * possible time. The NSPR API is implemented and documented using >> + * the new definitions. >> + * >> + * As there is no opt-out from pulling in these typedefs, we define the guard >> + * for the file to exclude it. This is incredibly ugly, but seems to be about >> + * the only way around it. >> + */ > > There's a lot of duplicated comments here. Could we move either of the > files to reference the other for longer ones? commented in parts but I tried to encode my understanding of NSS into the comments where documentation is lacking, since I assume I'm not the only one who is new to NSS. There might be a need to pare back to keep it focused in case this patch goes futher. >> +/* >> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the >> + * only way to make NSS use an already set up POSIX file descriptor rather >> + * than opening one itself. To quote the NSS documentation: >> + * >> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's >> + * implementation changes. In practice, this is unlikely to happen because >> + * NSPR's implementation has been stable for years and because of NSPR's >> + * strong commitment to backward compatibility." >> + * >> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket >> + * >> + * The function is declared in <private/pprio.h>, but as it is a header marked >> + * private we declare it here rather than including it. >> + */ >> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int); > > Ugh. This is really the way to do this? How do other applications deal > with this problem? which makes calling private APIs less problematic). It sure is ugly, but there is no alternative to using this function. >> +#if defined(WIN32) >> +static const char *ca_trust_name = "nssckbi.dll"; >> +#elif defined(__darwin__) >> +static const char *ca_trust_name = "libnssckbi.dylib"; >> +#else >> +static const char *ca_trust_name = "libnssckbi.so"; >> +#endif > > There's really no pre-existing handling for this in nss??? NSS_Init does have more or less the above logic (see snippet below), but only when there is a cert database defined. /* * The following code is an attempt to automagically find the external root * module. * Note: Keep the #if-defined chunks in order. HPUX must select before UNIX. */ static const char *dllname = #if defined(XP_WIN32) || defined(XP_OS2) "nssckbi.dll"; #elif defined(HPUX) && !defined(__ia64) /* HP-UX PA-RISC */ "libnssckbi.sl"; #elif defined(DARWIN) "libnssckbi.dylib"; #elif defined(XP_UNIX) || defined(XP_BEOS) "libnssckbi.so"; #else #error "Uh! Oh! I don't know about this platform." #endif In the NSS_INIT_NOCERTDB case there is no such handling of the libname provided by NSS so we need to do that ourselves. >> + /* >> + * The original design of NSS was for a single application to use a single >> + * copy of it, initialized with NSS_Initialize() which isn't returning any >> + * handle with which to refer to NSS. NSS initialization and shutdown are >> + * global for the application, so a shutdown in another NSS enabled >> + * library would cause NSS to be stopped for libpq as well. The fix has >> + * been to introduce NSS_InitContext which returns a context handle to >> + * pass to NSS_ShutdownContext. NSS_InitContext was introduced in NSS >> + * 3.12, but the use of it is not very well documented. >> + * https://bugzilla.redhat.com/show_bug.cgi?id=738456 >> + * >> + * The InitParameters struct passed can be used to override internal >> + * values in NSS, but the usage is not documented at all. When using >> + * NSS_Init initializations, the values are instead set via PK11_Configure >> + * calls so the PK11_Configure documentation can be used to glean some >> + * details on these. >> + * >> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Specs > >> + >> + if (!nss_context) >> + { >> + char *err = pg_SSLerrmessage(PR_GetError()); >> + >> + printfPQExpBuffer(&conn->errorMessage, >> + libpq_gettext("unable to %s certificate database: %s"), >> + conn->cert_database ? "open" : "create", >> + err); >> + free(err); >> + return PGRES_POLLING_FAILED; >> + } >> + >> + /* >> + * Configure cipher policy. >> + */ >> + status = NSS_SetDomesticPolicy(); > > Why is "domestic" the right thing here? France. These would enable a set of ciphers based on US export restrictions (domest/export) or French import restrictions. All ciphers would start disabled and then the ciphers belonging to the chosen set would be enabled. Long ago, that was however removed and they now all get enabled by calling either of these three functions. NSS_SetDomesticPolicy enables all implemented ciphers, and the other calls just call NSS_SetDomesticPolicy, I guess that API was kept for backwards compatibility. The below bugzilla entry has a bit more information on this: https://bugzilla.mozilla.org/show_bug.cgi?id=848384 That being said, the comment in the code did not reflect that, so I've reworded it hoping it will be clearer now. >> + >> + PK11_SetPasswordFunc(PQssl_passwd_cb); > > Is it actually OK to do stuff like this when other users of NSS might be > present? That's obviously more likely in the libpq case, compared to the > backend case (where it's also possible, of course). What prevents us > from overriding another user's callback? The password callback pointer is stored in a static variable in NSS (in the file lib/pk11wrap/pk11auth.c). >> +ssize_t >> +pgtls_read(PGconn *conn, void *ptr, size_t len) >> +{ >> + PRInt32 nread; >> + PRErrorCode status; >> + int read_errno = 0; >> + >> + nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT); >> + >> + /* >> + * PR_Recv blocks until there is data to read or the timeout expires. Zero >> + * is returned for closed connections, while -1 indicates an error within >> + * the ongoing connection. >> + */ >> + if (nread == 0) >> + { >> + read_errno = ECONNRESET; >> + return -1; >> + } > > It's a bit confusing to talk about blocking when the socket presumably > is in non-blocking mode, and you're also asking to never wait? non-blocking while PR_Recv can block (which is what we ask it not to). I've reworded and moved the comment around to hopefully make it clearer. >> + if (nread == -1) >> + { >> + status = PR_GetError(); >> + >> + switch (status) >> + { >> + case PR_WOULD_BLOCK_ERROR: >> + read_errno = EINTR; >> + break; > > Uh, isn't this going to cause a busy-loop by the caller? EINTR isn't the > same as EAGAIN/EWOULDBLOCK? >> + case PR_IO_TIMEOUT_ERROR: >> + break; > > What does this mean? We'll return with a 0 errno here, right? When is > this case reachable? It should, AFAICT, only be reachable when PR_Recv is used with a timeout which we don't do. It mentioned somewhere that it had happened in no-wait calls due to a bug, but I fail to find that reference now. Either way, I've removed it to fall into the default error handling which now sets errno correctly as that was a paddle short here. > E.g. the comment in fe-misc.c: > /* pqsecure_read set the error message for us */ > for this case doesn't seem to be fulfilled by this. Fixed, I hope. >> +/* >> + * Verify that the server certificate matches the hostname we connected to. >> + * >> + * The certificate's Common Name and Subject Alternative Names are considered. >> + */ >> +int >> +pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn, >> + int *names_examined, >> + char **first_name) >> +{ >> + return 1; >> +} > > Uh, huh? Certainly doesn't verify anything... had missed moving it to the stub. Fixed and also expanded to closer match how it's done in the OpenSSL implementation. >> +/* ------------------------------------------------------------ */ >> +/* PostgreSQL specific TLS support functions */ >> +/* ------------------------------------------------------------ */ >> + >> +/* >> + * TODO: this a 99% copy of the same function in the backend, make these share >> + * a single implementation instead. >> + */ >> +static char * >> +pg_SSLerrmessage(PRErrorCode errcode) >> +{ >> + const char *error; >> + >> + error = PR_ErrorToName(errcode); >> + if (error) >> + return strdup(error); >> + >> + return strdup("unknown TLS error"); >> +} > > Btw, why does this need to duplicate strings, instead of returning a > const char*? The attached includes fixes for the above mentioned issues (and a few small other ones I stumbled across), hopefully without introducing too many new. As mentioned, I'll perform the split into multiple patches in a separate version which only performs a split to make it easier to diff the individual patchfile versions. cheers ./daniel |
On 27/10/2020 22:07, Daniel Gustafsson wrote:
> /* > * Track whether the NSS database has a password set or not. There is no API > * function for retrieving password status, so we simply flip this to true in > * case NSS invoked the password callback - as that will only happen in case > * there is a password. The reason for tracking this is that there are calls > * which require a password parameter, but doesn't use the callbacks provided, > * so we must call the callback on behalf of these. > */ > static bool has_password = false; This is set in PQssl_passwd_cb function, but never reset. That seems wrong. The NSS database used in one connection might have a password, while another one might not. Or have I completely misunderstood this? - Heikki |
In reply to this post by Daniel Gustafsson
Hi,
On 2020-10-27 21:07:01 +0100, Daniel Gustafsson wrote: > > On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote: > >> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001 > >> From: Daniel Gustafsson <[hidden email]> > >> Date: Thu, 8 Oct 2020 18:44:28 +0200 > >> Subject: [PATCH] Support for NSS as a TLS backend v12 > >> --- > >> configure | 223 +++- > >> configure.ac | 39 +- > >> contrib/Makefile | 2 +- > >> contrib/pgcrypto/Makefile | 5 + > >> contrib/pgcrypto/nss.c | 773 +++++++++++ > >> contrib/pgcrypto/openssl.c | 2 +- > >> contrib/pgcrypto/px.c | 1 + > >> contrib/pgcrypto/px.h | 1 + > > > > Personally I'd like to see this patch broken up a bit - it's quite > > large. Several of the changes could easily be committed separately, no? > > Not sure how much of this makes sense committed separately (unless separately > means in quick succession), but it could certainly be broken up for the sake of > making review easier. Committing e.g. the pgcrypto pieces separately from the backend code seems unproblematic. But yes, I would expect them to go in close to each other. I'm mainly concerned with smaller review-able units. Have you done testing to ensure that NSS PG cooperates correctly with openssl PG? Is there a way we can make that easier to do? E.g. allowing to build frontend with NSS and backend with openssl and vice versa? > >> if test "$with_openssl" = yes ; then > >> + if test x"$with_nss" = x"yes" ; then > >> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) > >> + fi > > > > Based on a quick look there's no similar error check for the msvc > > build. Should there be? > > Thats a good question. When embarking on this is seemed quite natural to me > that it should be, but now I'm not so sure. Maybe there should be a > --with-openssl-preferred like how we handle readline/libedit or just allow > multiple and let the last one win? Do you have any input on what would make > sense? > > The only thing I think makes no sense is to allow multiple ones at the same > time given the current autoconf switches, even if it would just be to pick say > pg_strong_random from one and libpq TLS from another. Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing to check for errors. Even better, of course, would be to allow switching of the SSL backend based on config options (PGC_POSTMASTER GUC for backend, connection string for frontend). Mainly because that would make testing of interoperability so much easier. Obviously still a few places like pgcrypto, randomness, etc, where only a compile time decision seems to make sense. > >> + CLEANLDFLAGS="$LDFLAGS" > >> + # TODO: document this set of LDFLAGS > >> + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS" > > > > Shouldn't this use nss-config or such? > > Indeed it should, where available. I've added rudimentary support for that > without a fallback as of now. When would we need a fallback? > > I think it'd also be better if we could include these files as nss/ssl.h > > etc - ssl.h is a name way too likely to conflict imo. > > I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path > will still need the direct path to the headers (from autoconf) since nss.h > includes NSPR headers as #include <nspr.h> and so on. Hm. Then it's probably not worth going there... > >> +static SECStatus > >> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) > >> +{ > >> + SECStatus status; > >> + Port *port = (Port *) arg; > >> + CERTCertificate *cert; > >> + char *peer_cn; > >> + int len; > >> + > >> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); > >> + if (status == SECSuccess) > >> + { > >> + cert = SSL_PeerCertificate(port->pr_fd); > >> + len = strlen(cert->subjectName); > >> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); > >> + if (strncmp(cert->subjectName, "CN=", 3) == 0) > >> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); > >> + else > >> + strlcpy(peer_cn, cert->subjectName, len + 1); > >> + CERT_DestroyCertificate(cert); > >> + > >> + port->peer_cn = peer_cn; > >> + port->peer_cert_valid = true; > > > > Hm. We either should have something similar to > > > > /* > > * Reject embedded NULLs in certificate common name to prevent > > * attacks like CVE-2009-4034. > > */ > > if (len != strlen(peer_cn)) > > { > > ereport(COMMERROR, > > (errcode(ERRCODE_PROTOCOL_VIOLATION), > > errmsg("SSL certificate's common name contains embedded null"))); > > pfree(peer_cn); > > return -1; > > } > > here, or a comment explaining why not. > > We should, but it's proving rather difficult as there is no equivalent API call > to get the string as well as the expected length of it. Hm. Should at least have a test to ensure that's not a problem then. I hope/assume NSS rejects this somewhere internally... > > Also, what's up with the CN= bit? Why is that needed here, but not for > > openssl? > > OpenSSL returns only the value portion, whereas NSS returns key=value so we > need to skip over the key= part. Why is it a conditional path though? > >> +/* > >> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the > >> + * only way to make NSS use an already set up POSIX file descriptor rather > >> + * than opening one itself. To quote the NSS documentation: > >> + * > >> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's > >> + * implementation changes. In practice, this is unlikely to happen because > >> + * NSPR's implementation has been stable for years and because of NSPR's > >> + * strong commitment to backward compatibility." > >> + * > >> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket > >> + * > >> + * The function is declared in <private/pprio.h>, but as it is a header marked > >> + * private we declare it here rather than including it. > >> + */ > >> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int); > > > > Ugh. This is really the way to do this? How do other applications deal > > with this problem? > > They either #include <private/pprio.h> or they do it like this (or vendor NSPR > which makes calling private APIs less problematic). It sure is ugly, but there > is no alternative to using this function. Hm - in debian unstable's NSS this function appears to be in nss/ssl.h, not pprio.h: /* ** Imports fd into SSL, returning a new socket. Copies SSL configuration ** from model. */ SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd); and ssl.h starts with: /* * This file contains prototypes for the public SSL functions. > >> + > >> + PK11_SetPasswordFunc(PQssl_passwd_cb); > > > > Is it actually OK to do stuff like this when other users of NSS might be > > present? That's obviously more likely in the libpq case, compared to the > > backend case (where it's also possible, of course). What prevents us > > from overriding another user's callback? > > The password callback pointer is stored in a static variable in NSS (in the > file lib/pk11wrap/pk11auth.c). But, uh, how is that not a problem? What happens if a backend imports libpq? What if plpython imports curl which then also uses nss? > + /* > + * Finally we must configure the socket for being a server by setting the > + * certificate and key. > + */ > + status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa); > + if (status != SECSuccess) > + ereport(ERROR, > + (errmsg("unable to configure secure server: %s", > + pg_SSLerrmessage(PR_GetError())))); > + status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0); > + if (status != SECSuccess) > + ereport(ERROR, > + (errmsg("unable to configure server for TLS server connections: %s", > + pg_SSLerrmessage(PR_GetError())))); Why do both of these need to get called? The NSS docs say: /* ** Deprecated variant of SSL_ConfigServerCert. ** ... SSL_IMPORT SECStatus SSL_ConfigSecureServer( PRFileDesc *fd, CERTCertificate *cert, SECKEYPrivateKey *key, SSLKEAType kea); Greetings, Andres Freund |
>>> Personally I'd like to see this patch broken up a bit - it's quite
>>> large. Several of the changes could easily be committed separately, no? >> >> Not sure how much of this makes sense committed separately (unless separately >> means in quick succession), but it could certainly be broken up for the sake of >> making review easier. > > Committing e.g. the pgcrypto pieces separately from the backend code > seems unproblematic. But yes, I would expect them to go in close to each > other. I'm mainly concerned with smaller review-able units. Attached is a v14 where the logical units are separated into individual commits. I hope this split makes it easier to read. The 0006 commit were things not really related to NSS at all that can be submitted to -hackers independently of this work, but they're still there since this version wasn't supposed to change anything. Most of the changes to sslinfo in 0005 are really only needed in case OpenSSL isn't the only TLS library, but I would argue that they should be considered regardless. There we are still accessing the ->ssl member directly and passing it to OpenSSL rather than using the be_tls_* API that we have. I can extract that portion as a separate patch submission unless there are objections. cheers ./daniel ![]() ![]() ![]() ![]() ![]() ![]() |
In reply to this post by Andres Freund
> On 28 Oct 2020, at 07:39, Andres Freund <[hidden email]> wrote:
> Have you done testing to ensure that NSS PG cooperates correctly with > openssl PG? Is there a way we can make that easier to do? E.g. allowing > to build frontend with NSS and backend with openssl and vice versa? When I wrote the Secure Transport patch I had a patch against PostgresNode which allowed for overriding the server binaries like so: SSLTEST_SERVER_BIN=/path/bin/ make -C src/test/ssl/ check I've used that coupled with manual testing so far to make sure that an openssl client can talk to an NSS backend and so on. Before any other backend is added we clearly need *a* way of doing this, one which no doubt will need to be improved upon to suit more workflows. This is sort of the same situation as pg_upgrade, where two trees is needed to really test it. I can clean that patch up and post as a starting point for discussions. >>>> if test "$with_openssl" = yes ; then >>>> + if test x"$with_nss" = x"yes" ; then >>>> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) >>>> + fi >>> >>> Based on a quick look there's no similar error check for the msvc >>> build. Should there be? >> >> Thats a good question. When embarking on this is seemed quite natural to me >> that it should be, but now I'm not so sure. Maybe there should be a >> --with-openssl-preferred like how we handle readline/libedit or just allow >> multiple and let the last one win? Do you have any input on what would make >> sense? >> >> The only thing I think makes no sense is to allow multiple ones at the same >> time given the current autoconf switches, even if it would just be to pick say >> pg_strong_random from one and libpq TLS from another. > > Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing > to check for errors. After another round of thinking I like this even better as it makes the build infra cleaner, so the attached patch has this implemented. > Even better, of course, would be to allow switching of the SSL backend > based on config options (PGC_POSTMASTER GUC for backend, connection > string for frontend). Mainly because that would make testing of > interoperability so much easier. Obviously still a few places like > pgcrypto, randomness, etc, where only a compile time decision seems to > make sense. It would make testing easier, but the expense seems potentially rather high. How would a GUC switch be allowed to operate, would we have mixed backends or would be require all openssl connectins to be dropped before serving nss ones? >>>> + CLEANLDFLAGS="$LDFLAGS" >>>> + # TODO: document this set of LDFLAGS >>>> + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS" >>> >>> Shouldn't this use nss-config or such? >> >> Indeed it should, where available. I've added rudimentary support for that >> without a fallback as of now. > > When would we need a fallback? an nss-config AFAICT. I wouldn't be surprised if there are other cases. >>> I think it'd also be better if we could include these files as nss/ssl.h >>> etc - ssl.h is a name way too likely to conflict imo. >> >> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path >> will still need the direct path to the headers (from autoconf) since nss.h >> includes NSPR headers as #include <nspr.h> and so on. > > Hm. Then it's probably not worth going there... It does however make visual parsing of the source files easer since it's clear which ssl.h is being referred to. I'm in favor of keeping it. >>>> +static SECStatus >>>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) >>>> +{ >>>> + SECStatus status; >>>> + Port *port = (Port *) arg; >>>> + CERTCertificate *cert; >>>> + char *peer_cn; >>>> + int len; >>>> + >>>> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); >>>> + if (status == SECSuccess) >>>> + { >>>> + cert = SSL_PeerCertificate(port->pr_fd); >>>> + len = strlen(cert->subjectName); >>>> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); >>>> + if (strncmp(cert->subjectName, "CN=", 3) == 0) >>>> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); >>>> + else >>>> + strlcpy(peer_cn, cert->subjectName, len + 1); >>>> + CERT_DestroyCertificate(cert); >>>> + >>>> + port->peer_cn = peer_cn; >>>> + port->peer_cert_valid = true; >>> >>> Hm. We either should have something similar to >>> >>> /* >>> * Reject embedded NULLs in certificate common name to prevent >>> * attacks like CVE-2009-4034. >>> */ >>> if (len != strlen(peer_cn)) >>> { >>> ereport(COMMERROR, >>> (errcode(ERRCODE_PROTOCOL_VIOLATION), >>> errmsg("SSL certificate's common name contains embedded null"))); >>> pfree(peer_cn); >>> return -1; >>> } >>> here, or a comment explaining why not. >> >> We should, but it's proving rather difficult as there is no equivalent API call >> to get the string as well as the expected length of it. > > Hm. Should at least have a test to ensure that's not a problem then. I > hope/assume NSS rejects this somewhere internally... >>> Also, what's up with the CN= bit? Why is that needed here, but not for >>> openssl? >> >> OpenSSL returns only the value portion, whereas NSS returns key=value so we >> need to skip over the key= part. > > Why is it a conditional path though? It was mostly just a belts-and-suspenders thing, I don't have any hard evidence that it's been a thing in any modern NSS version so it can be removed. >>>> +/* >>>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the >>>> + * only way to make NSS use an already set up POSIX file descriptor rather >>>> + * than opening one itself. To quote the NSS documentation: >>>> + * >>>> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's >>>> + * implementation changes. In practice, this is unlikely to happen because >>>> + * NSPR's implementation has been stable for years and because of NSPR's >>>> + * strong commitment to backward compatibility." >>>> + * >>>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket >>>> + * >>>> + * The function is declared in <private/pprio.h>, but as it is a header marked >>>> + * private we declare it here rather than including it. >>>> + */ >>>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int); >>> >>> Ugh. This is really the way to do this? How do other applications deal >>> with this problem? >> >> They either #include <private/pprio.h> or they do it like this (or vendor NSPR >> which makes calling private APIs less problematic). It sure is ugly, but there >> is no alternative to using this function. > > Hm - in debian unstable's NSS this function appears to be in nss/ssl.h, > not pprio.h: > > /* > ** Imports fd into SSL, returning a new socket. Copies SSL configuration > ** from model. > */ > SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd); > > and ssl.h starts with: > /* > * This file contains prototypes for the public SSL functions. since it's the API for importing an NSPR socket into NSS and enabling SSL/TLS on it. Thats been a public API for a long time. ImportTCPSocket is used to import an already opened socket into NSPR, else NSPR must open the socket itself. That part has been kept private for reasons unknown, as it's incredibly useful. >>>> + PK11_SetPasswordFunc(PQssl_passwd_cb); >>> >>> Is it actually OK to do stuff like this when other users of NSS might be >>> present? That's obviously more likely in the libpq case, compared to the >>> backend case (where it's also possible, of course). What prevents us >>> from overriding another user's callback? >> >> The password callback pointer is stored in a static variable in NSS (in the >> file lib/pk11wrap/pk11auth.c). > > But, uh, how is that not a problem? What happens if a backend imports > libpq? What if plpython imports curl which then also uses nss? don't really have good answers here. The available implementation is via the static var, and there are no alternative APIs. I've tried googling for insights but haven't come across any. The only datapoint I have is that I can't recall there ever being a complaint against libcurl doing this exact thing. That of course doesn't mean it cannot happen or cause problems. >> + /* >> + * Finally we must configure the socket for being a server by setting the >> + * certificate and key. >> + */ >> + status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa); >> + if (status != SECSuccess) >> + ereport(ERROR, >> + (errmsg("unable to configure secure server: %s", >> + pg_SSLerrmessage(PR_GetError())))); >> + status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0); >> + if (status != SECSuccess) >> + ereport(ERROR, >> + (errmsg("unable to configure server for TLS server connections: %s", >> + pg_SSLerrmessage(PR_GetError())))); > > Why do both of these need to get called? The NSS docs say: > > /* > ** Deprecated variant of SSL_ConfigServerCert. > ** > ... > SSL_IMPORT SECStatus SSL_ConfigSecureServer( > PRFileDesc *fd, CERTCertificate *ce rt, > SECKEYPrivateKey *key, SSLKEAType kea); in the online documentation: https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/SSL_functions/sslfnc.html (SSL_ConfigServerCert isn't at all mentioned there which dates it to before this went it obsoleting SSL_ConfigSecureServer.) Fixed by removing the superfluous call. Thanks again for reviewing! cheers ./daniel ![]() ![]() ![]() ![]() ![]() ![]() |
On 10/29/20 11:20 AM, Daniel Gustafsson wrote: >> On 28 Oct 2020, at 07:39, Andres Freund <[hidden email]> wrote: >> Have you done testing to ensure that NSS PG cooperates correctly with >> openssl PG? Is there a way we can make that easier to do? E.g. allowing >> to build frontend with NSS and backend with openssl and vice versa? > When I wrote the Secure Transport patch I had a patch against PostgresNode > which allowed for overriding the server binaries like so: > > SSLTEST_SERVER_BIN=/path/bin/ make -C src/test/ssl/ check > > I've used that coupled with manual testing so far to make sure that an openssl > client can talk to an NSS backend and so on. Before any other backend is added > we clearly need *a* way of doing this, one which no doubt will need to be > improved upon to suit more workflows. > > This is sort of the same situation as pg_upgrade, where two trees is needed to > really test it. > > I can clean that patch up and post as a starting point for discussions. > >>>>> if test "$with_openssl" = yes ; then >>>>> + if test x"$with_nss" = x"yes" ; then >>>>> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"]) >>>>> + fi >>>> Based on a quick look there's no similar error check for the msvc >>>> build. Should there be? >>> Thats a good question. When embarking on this is seemed quite natural to me >>> that it should be, but now I'm not so sure. Maybe there should be a >>> --with-openssl-preferred like how we handle readline/libedit or just allow >>> multiple and let the last one win? Do you have any input on what would make >>> sense? >>> >>> The only thing I think makes no sense is to allow multiple ones at the same >>> time given the current autoconf switches, even if it would just be to pick say >>> pg_strong_random from one and libpq TLS from another. >> Maybe we should just have --with-ssl={openssl,nss}? That'd avoid needing >> to check for errors. > Thats another option, with --with-openssl being an alias for --with-ssl=openssl. > > After another round of thinking I like this even better as it makes the build > infra cleaner, so the attached patch has this implemented. > >> Even better, of course, would be to allow switching of the SSL backend >> based on config options (PGC_POSTMASTER GUC for backend, connection >> string for frontend). Mainly because that would make testing of >> interoperability so much easier. Obviously still a few places like >> pgcrypto, randomness, etc, where only a compile time decision seems to >> make sense. > It would make testing easier, but the expense seems potentially rather high. > How would a GUC switch be allowed to operate, would we have mixed backends or > would be require all openssl connectins to be dropped before serving nss ones? > >>>>> + CLEANLDFLAGS="$LDFLAGS" >>>>> + # TODO: document this set of LDFLAGS >>>>> + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS" >>>> Shouldn't this use nss-config or such? >>> Indeed it should, where available. I've added rudimentary support for that >>> without a fallback as of now. >> When would we need a fallback? > One one of my boxes I have NSS/NSPR installed via homebrew and they don't ship > an nss-config AFAICT. I wouldn't be surprised if there are other cases. > >>>> I think it'd also be better if we could include these files as nss/ssl.h >>>> etc - ssl.h is a name way too likely to conflict imo. >>> I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path >>> will still need the direct path to the headers (from autoconf) since nss.h >>> includes NSPR headers as #include <nspr.h> and so on. >> Hm. Then it's probably not worth going there... > It does however make visual parsing of the source files easer since it's clear > which ssl.h is being referred to. I'm in favor of keeping it. > >>>>> +static SECStatus >>>>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer) >>>>> +{ >>>>> + SECStatus status; >>>>> + Port *port = (Port *) arg; >>>>> + CERTCertificate *cert; >>>>> + char *peer_cn; >>>>> + int len; >>>>> + >>>>> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE); >>>>> + if (status == SECSuccess) >>>>> + { >>>>> + cert = SSL_PeerCertificate(port->pr_fd); >>>>> + len = strlen(cert->subjectName); >>>>> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1); >>>>> + if (strncmp(cert->subjectName, "CN=", 3) == 0) >>>>> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1); >>>>> + else >>>>> + strlcpy(peer_cn, cert->subjectName, len + 1); >>>>> + CERT_DestroyCertificate(cert); >>>>> + >>>>> + port->peer_cn = peer_cn; >>>>> + port->peer_cert_valid = true; >>>> Hm. We either should have something similar to >>>> >>>> /* >>>> * Reject embedded NULLs in certificate common name to prevent >>>> * attacks like CVE-2009-4034. >>>> */ >>>> if (len != strlen(peer_cn)) >>>> { >>>> ereport(COMMERROR, >>>> (errcode(ERRCODE_PROTOCOL_VIOLATION), >>>> errmsg("SSL certificate's common name contains embedded null"))); >>>> pfree(peer_cn); >>>> return -1; >>>> } >>>> here, or a comment explaining why not. >>> We should, but it's proving rather difficult as there is no equivalent API call >>> to get the string as well as the expected length of it. >> Hm. Should at least have a test to ensure that's not a problem then. I >> hope/assume NSS rejects this somewhere internally... > Agreed, I'll try to hack up a testcase. > >>>> Also, what's up with the CN= bit? Why is that needed here, but not for >>>> openssl? >>> OpenSSL returns only the value portion, whereas NSS returns key=value so we >>> need to skip over the key= part. >> Why is it a conditional path though? > It was mostly just a belts-and-suspenders thing, I don't have any hard evidence > that it's been a thing in any modern NSS version so it can be removed. > >>>>> +/* >>>>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the >>>>> + * only way to make NSS use an already set up POSIX file descriptor rather >>>>> + * than opening one itself. To quote the NSS documentation: >>>>> + * >>>>> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's >>>>> + * implementation changes. In practice, this is unlikely to happen because >>>>> + * NSPR's implementation has been stable for years and because of NSPR's >>>>> + * strong commitment to backward compatibility." >>>>> + * >>>>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket >>>>> + * >>>>> + * The function is declared in <private/pprio.h>, but as it is a header marked >>>>> + * private we declare it here rather than including it. >>>>> + */ >>>>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int); >>>> Ugh. This is really the way to do this? How do other applications deal >>>> with this problem? >>> They either #include <private/pprio.h> or they do it like this (or vendor NSPR >>> which makes calling private APIs less problematic). It sure is ugly, but there >>> is no alternative to using this function. >> Hm - in debian unstable's NSS this function appears to be in nss/ssl.h, >> not pprio.h: >> >> /* >> ** Imports fd into SSL, returning a new socket. Copies SSL configuration >> ** from model. >> */ >> SSL_IMPORT PRFileDesc *SSL_ImportFD(PRFileDesc *model, PRFileDesc *fd); >> >> and ssl.h starts with: >> /* >> * This file contains prototypes for the public SSL functions. > Right, but that's Import*FD*, not Import*TCPSocket*. We use ImportFD as well > since it's the API for importing an NSPR socket into NSS and enabling SSL/TLS > on it. Thats been a public API for a long time. ImportTCPSocket is used to > import an already opened socket into NSPR, else NSPR must open the socket > itself. That part has been kept private for reasons unknown, as it's > incredibly useful. > >>>>> + PK11_SetPasswordFunc(PQssl_passwd_cb); >>>> Is it actually OK to do stuff like this when other users of NSS might be >>>> present? That's obviously more likely in the libpq case, compared to the >>>> backend case (where it's also possible, of course). What prevents us >>>> from overriding another user's callback? >>> The password callback pointer is stored in a static variable in NSS (in the >>> file lib/pk11wrap/pk11auth.c). >> But, uh, how is that not a problem? What happens if a backend imports >> libpq? What if plpython imports curl which then also uses nss? > Sorry, that sentence wasn't really finished. What I meant to write was that I > don't really have good answers here. The available implementation is via the > static var, and there are no alternative APIs. I've tried googling for > insights but haven't come across any. > > The only datapoint I have is that I can't recall there ever being a complaint > against libcurl doing this exact thing. That of course doesn't mean it cannot > happen or cause problems. > >>> + /* >>> + * Finally we must configure the socket for being a server by setting the >>> + * certificate and key. >>> + */ >>> + status = SSL_ConfigSecureServer(model, server_cert, private_key, kt_rsa); >>> + if (status != SECSuccess) >>> + ereport(ERROR, >>> + (errmsg("unable to configure secure server: %s", >>> + pg_SSLerrmessage(PR_GetError())))); >>> + status = SSL_ConfigServerCert(model, server_cert, private_key, NULL, 0); >>> + if (status != SECSuccess) >>> + ereport(ERROR, >>> + (errmsg("unable to configure server for TLS server connections: %s", >>> + pg_SSLerrmessage(PR_GetError())))); >> Why do both of these need to get called? The NSS docs say: >> >> /* >> ** Deprecated variant of SSL_ConfigServerCert. >> ** >> ... >> SSL_IMPORT SECStatus SSL_ConfigSecureServer( >> PRFileDesc *fd, CERTCertificate *ce rt, >> SECKEYPrivateKey *key, SSLKEAType kea); > They don't, I had missed the deprecation warning as it's not mentioned at all > in the online documentation: > > https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/SSL_functions/sslfnc.html > > (SSL_ConfigServerCert isn't at all mentioned there which dates it to before > this went it obsoleting SSL_ConfigSecureServer.) > > Fixed by removing the superfluous call. > I've been looking through the new patch set, in particular the testing setup. The way it seems to proceed is to use the existing openssl generated certificates and imports them into NSS certificate databases. That seems fine to bootstrap testing, but it seems to me it would be more sound not to rely on openssl at all. I'd rather see the Makefile containing commands to create these from scratch, which mirror the openssl variants. IOW you should be able to build and test this from scratch, including certificate generation, without having openssl installed at all. I also notice that the invocations to pk12util don't contain the "sql:" prefix to the -d option, even though the database was created with that prefix a few lines above. That seems like a mistake from my reading of the pk12util man page. cheers andrew |
> On 1 Nov 2020, at 14:13, Andrew Dunstan <[hidden email]> wrote:
> I've been looking through the new patch set, in particular the testing > setup. Thanks! > The way it seems to proceed is to use the existing openssl generated > certificates and imports them into NSS certificate databases. That seems > fine to bootstrap testing, That's pretty much why I opted for using the existing certs: to bootstrap the patch and ensure OpenSSL-backend compatibility. > but it seems to me it would be more sound not > to rely on openssl at all. I'd rather see the Makefile containing > commands to create these from scratch, which mirror the openssl > variants. IOW you should be able to build and test this from scratch, > including certificate generation, without having openssl installed at all. I don't disagree with this, but I do also believe there is value in testing all TLS backends with exactly the same certificates to act as a baseline. The nssfiles target should definitely be able to generate from scratch, but maybe a combination is the best option? Being well versed in the buildfarm code, do you have an off-the-cuff idea on how to do cross library testing such that OpenSSL/NSS compatibility can be ensured? Andres was floating the idea of making a single sourcetree be able to have both for testing but more discussion is needed to settle on a way forward. > I also notice that the invocations to pk12util don't contain the "sql:" > prefix to the -d option, even though the database was created with that > prefix a few lines above. That seems like a mistake from my reading of > the pk12util man page. Fixed in the attached v16, which also drops the parts of the patchset which have been submitted separately to -hackers (the sslinfo patch hunks are still there are they are required). cheers ./daniel ![]() ![]() ![]() ![]() ![]() |
On 11/1/20 5:04 PM, Daniel Gustafsson wrote: >> On 1 Nov 2020, at 14:13, Andrew Dunstan <[hidden email]> wrote: >> I've been looking through the new patch set, in particular the testing >> setup. > Thanks! > >> The way it seems to proceed is to use the existing openssl generated >> certificates and imports them into NSS certificate databases. That seems >> fine to bootstrap testing, > That's pretty much why I opted for using the existing certs: to bootstrap the > patch and ensure OpenSSL-backend compatibility. > >> but it seems to me it would be more sound not >> to rely on openssl at all. I'd rather see the Makefile containing >> commands to create these from scratch, which mirror the openssl >> variants. IOW you should be able to build and test this from scratch, >> including certificate generation, without having openssl installed at all. > I don't disagree with this, but I do also believe there is value in testing all > TLS backends with exactly the same certificates to act as a baseline. The > nssfiles target should definitely be able to generate from scratch, but maybe a > combination is the best option? Yeah. I certainly think we need something that should how we would generate them from scratch using nss. That said, the importation code is also useful. > > Being well versed in the buildfarm code, do you have an off-the-cuff idea onIU > how to do cross library testing such that OpenSSL/NSS compatibility can be > ensured? Andres was floating the idea of making a single sourcetree be able to > have both for testing but more discussion is needed to settle on a way forward. Well, I'd probably try to leverage the knowledge we have in doing cross-version upgrade testing. It works like this: After the install-check-C stage each branch saves its binaries and data files in a special location, adjusting things like library locations to match. then to test that version it uses that against all the older versions similarly saved. We could generalize that saving mechanism and do it if any module required it. But instead of testing against a different branch, we'd test against a different animal. So we'd have two animals, one building with openssl and one with nss, and they would test against each other (i.e. one as the client and one as the sever, and vice versa). This would involve a deal of work on my part, but it's very doable, I believe. We'd need a way to run tests where we could specify the client and server binary locations. Anyway, those are my thoughts. Comments welcome. cheers andrew -- Andrew Dunstan EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company |
In reply to this post by Heikki Linnakangas
> On 27 Oct 2020, at 21:18, Heikki Linnakangas <[hidden email]> wrote:
(sorry for slow response). You are absolutely right, the has_password flag
> > On 27/10/2020 22:07, Daniel Gustafsson wrote: >> /* >> * Track whether the NSS database has a password set or not. There is no API >> * function for retrieving password status, so we simply flip this to true in >> * case NSS invoked the password callback - as that will only happen in case >> * there is a password. The reason for tracking this is that there are calls >> * which require a password parameter, but doesn't use the callbacks provided, >> * so we must call the callback on behalf of these. >> */ >> static bool has_password = false; > > This is set in PQssl_passwd_cb function, but never reset. That seems wrong. The NSS database used in one connection might have a password, while another one might not. Or have I completely misunderstood this? must be tracked per connection in PGconn. The attached v17 implements this as well a frontend bugfix which caused dropped connections and some smaller fixups to make strings more translateable. I've also included a WIP version of SCRAM channel binding in the attached patch, it's currently failing to connect but someone here might spot the bug before I do so I figured it's better to include it. The 0005 patch is now, thanks to the sslinfo patch going in on master, only containing NSS specific code. cheers ./daniel ![]() ![]() ![]() ![]() ![]() |
In reply to this post by Andrew Dunstan
> On 2 Nov 2020, at 15:17, Andrew Dunstan <[hidden email]> wrote:
> We could generalize that saving mechanism and do it if any module > required it. But instead of testing against a different branch, we'd > test against a different animal. So we'd have two animals, one building > with openssl and one with nss, and they would test against each other > (i.e. one as the client and one as the sever, and vice versa). That seems like a very good plan. It would also allow us to test a backend compiled with OpenSSL 1.0.2 against a frontend with OpenSSL 1.1.1 which might come in handy when OpenSSL 3.0.0 lands. > This would involve a deal of work on my part, but it's very doable, I > believe. I have no experience with the buildfarm code, but I'm happy to help if theres anything I can do. cheers ./daniel |
In reply to this post by Daniel Gustafsson
On Nov 4, 2020, at 5:09 AM, Daniel Gustafsson <[hidden email]> wrote:
> (sorry for slow response). You are absolutely right, the has_password flag > must be tracked per connection in PGconn. The attached v17 implements this as > well a frontend bugfix which caused dropped connections and some smaller fixups > to make strings more translateable. Some initial notes from building and testing on macOS Mojave. I'm working with both a brew-packaged NSS/NSPR (which includes basic nss-/nspr-config) and a hand-built NSS/NSPR (which does not). 1. In configure.ac: > + LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS" > + CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS" > + > + AC_CHECK_LIB(nss3, SSL_VersionRangeSet, [], [AC_MSG_ERROR([library 'nss3' is required for NSS])]) Looks like SSL_VersionRangeSet is part of libssl3, not libnss3. So this fails with the hand-built stack, where there is no nss-config to populate LDFLAGS. I changed the function to NSS_InitContext and that seems to work nicely. 2. Among the things to eventually think about when it comes to configuring, it looks like some platforms [1] install the headers under <nspr4/...> and <nss3/...> instead of <nspr/...> and <nss/...>. It's unfortunate that the NSS maintainers never chose an official installation layout. 3. I need two more `#define NO_NSPR_10_SUPPORT` guards added in both src/include/common/pg_nss.h src/port/pg_strong_random.c before the tree will compile for me. Both of those files include NSS headers. 4. be_tls_init() refuses to run correctly for me; I end up getting an NSPR assertion that looks like sslMutex_Init not implemented for multi-process applications ! With assertions disabled, this ends up showing a somewhat unhelpful FATAL: unable to set up TLS connection cache: security library failure. (SEC_ERROR_LIBRARY_FAILURE) It looks like cross-process locking isn't actually enabled on macOS, which is a long-standing bug in NSPR [2, 3]. So calls to SSL_ConfigMPServerSIDCache() error out. --Jacob [1] https://github.com/erthink/ReOpenLDAP/issues/112 [2] https://bugzilla.mozilla.org/show_bug.cgi?id=538680 [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1192500 |
> On 6 Nov 2020, at 21:37, Jacob Champion <[hidden email]> wrote:
> Some initial notes from building and testing on macOS Mojave. I'm working with > both a brew-packaged NSS/NSPR (which includes basic nss-/nspr-config) and a > hand-built NSS/NSPR (which does not). Thanks for looking! > 1. In configure.ac: > >> + LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS" >> + CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS" >> + >> + AC_CHECK_LIB(nss3, SSL_VersionRangeSet, [], [AC_MSG_ERROR([library 'nss3' is required for NSS])]) > > Looks like SSL_VersionRangeSet is part of libssl3, not libnss3. So this fails > with the hand-built stack, where there is no nss-config to populate LDFLAGS. I > changed the function to NSS_InitContext and that seems to work nicely. > 2. Among the things to eventually think about when it comes to configuring, it > looks like some platforms [1] install the headers under <nspr4/...> and > <nss3/...> instead of <nspr/...> and <nss/...>. It's unfortunate that the NSS > maintainers never chose an official installation layout. Yeah, maybe we need to start with the most common path and have fallbacks in case not found? > 3. I need two more `#define NO_NSPR_10_SUPPORT` guards added in both > > src/include/common/pg_nss.h > src/port/pg_strong_random.c > > before the tree will compile for me. Both of those files include NSS headers. Odd that I was able to compile on Linux, but I've added these. > 4. be_tls_init() refuses to run correctly for me; I end up getting an NSPR > assertion that looks like > > sslMutex_Init not implemented for multi-process applications ! > > With assertions disabled, this ends up showing a somewhat unhelpful > > FATAL: unable to set up TLS connection cache: security library failure. (SEC_ERROR_LIBRARY_FAILURE) > > It looks like cross-process locking isn't actually enabled on macOS, which is a > long-standing bug in NSPR [2, 3]. So calls to SSL_ConfigMPServerSIDCache() > error out. backed by NSS. The attached switches to SSL_ConfigServerSessionIDCacheWithOpt with which one can explicitly make the cache non-shared, which in turn backs the mutexes with NSPR locks rather than the missing sem_init. Can you test this version and see if that makes it work? This version also contains a channel binding bug that Heikki pointed out off- list (sadly not The bug) and a few very minor cleanups as well as a rebase to handle the new pg_strong_random_init. Actually performing the context init there is yet a TODO, but I wanted a version out that at all compiled. cheers ./daniel ![]() ![]() ![]() ![]() ![]() |
On Nov 6, 2020, at 3:11 PM, Daniel Gustafsson <[hidden email]> wrote:
> > The attached switches to SSL_ConfigServerSessionIDCacheWithOpt > with which one can explicitly make the cache non-shared, which in turn backs > the mutexes with NSPR locks rather than the missing sem_init. Can you test > this version and see if that makes it work? Yep, I get much farther through the tests with that patch. I'm currently diving into another assertion failure during socket disconnection: Assertion failure: fd->secret == NULL, at prlayer.c:45 cURL has some ominously vague references to this [1], though I'm not sure that we should work around it in the same way without knowing what the cause is... --Jacob [1] https://github.com/curl/curl/blob/4d2f800/lib/vtls/nss.c#L1266 |
> On 10 Nov 2020, at 21:11, Jacob Champion <[hidden email]> wrote:
> On Nov 6, 2020, at 3:11 PM, Daniel Gustafsson <[hidden email]> wrote: >> The attached switches to SSL_ConfigServerSessionIDCacheWithOpt >> with which one can explicitly make the cache non-shared, which in turn backs >> the mutexes with NSPR locks rather than the missing sem_init. Can you test >> this version and see if that makes it work? > > Yep, I get much farther through the tests with that patch. Great, thanks for confirming. > I'm currently > diving into another assertion failure during socket disconnection: > > Assertion failure: fd->secret == NULL, at prlayer.c:45 > > cURL has some ominously vague references to this [1], though I'm not > sure that we should work around it in the same way without knowing what > the cause is... Digging through the archives from when this landed in curl, the assertion failure was never fully identified back then but happened spuriously. Which version of NSPR is this happening with? cheers ./daniel |
On Nov 10, 2020, at 2:28 PM, Daniel Gustafsson <[hidden email]> wrote:
> > Digging through the archives from when this landed in curl, the assertion > failure was never fully identified back then but happened spuriously. Which > version of NSPR is this happening with? This is NSPR 4.29, with debugging enabled. The fd that causes the assertion is the custom layer that's added during be_tls_open_server(), which connects a Port as the layer secret. It looks like NSPR is trying to help surface potential memory leaks by asserting if the secret is non-NULL at the time the stack is being closed. In this case, it doesn't matter since the Port lifetime is managed elsewhere, but it looks easy enough to add a custom close in the way that cURL and the NSPR test programs [1] do. Sample patch attached, which gets me to the end of the tests without any assertions. (Two failures left on my machine.) --Jacob [1] https://hg.mozilla.org/projects/nspr/file/bf6620c143/pr/tests/nblayer.c#l354 |
Free forum by Nabble | Edit this page |