Bug with memory leak on cert validation in libpq

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug with memory leak on cert validation in libpq

Roman Peshkurov
Hi,

The problem is incorrect usage of openssl stack "destructor". We noticed it on a graphs for the few months. It looked lika an increasing line of res memory usage.

Sorry, I've already made pull request (wasn't familiar with your flow). The all information is described there:
https://github.com/postgres/postgres/pull/52

Best regards, Roman.
Reply | Threaded
Open this post in threaded view
|

Re: Bug with memory leak on cert validation in libpq

Michael Paquier-2
On Mon, Apr 20, 2020 at 08:40:06PM +0300, Roman Peshkurov wrote:
> The problem is incorrect usage of openssl stack "destructor". We noticed it
> on a graphs for the few months. It looked lika an increasing line of res
> memory usage.
>
> Sorry, I've already made pull request (wasn't familiar with your flow). The
> all information is described there:
> https://github.com/postgres/postgres/pull/52

Thanks for taking the time to send an email here.  I suspect that few
people around here look at the cloned repo on github (I don't FWIW).

If this reference goes away, then then community archives would lose a
part of its history.  So, first, here is the patch for reference:
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -552,7 +552,7 @@ pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn,
            if (rc != 0)
                break;
            }
-           sk_GENERAL_NAME_free(peer_san);
+           sk_GENERAL_NAME_pop_free(peer_san, GENERAL_NAME_free);
        }

Looking at the samples of the OpenSSL wiki, it looks that your guess
is right:
https://wiki.openssl.org/index.php/Hostname_validation
After extracting the SAN data from the certificate using
X509_get_ext_d2i(), we get to use sk_GENERAL_NAME_pop_free() for a
correct in-depth free().

Now, we have in core a set of TAP tests in src/test/ssl with
certificates that include some SAN configuration for
sslmode=verify-full, able to trigger the code path you are mentioning
for pgtls_verify_peer_name_matches_certificate_guts() within
fe-secure-openssl.c.  In order to see the leak, (I have been lazy with
the method and did not use valgrind with a custom C program or just
psql, but the result is the same), I have put a custom hack in the
code path involved here by adding a tight loop repeating
X509_get_ext_d2i() + sk_GENERAL_NAME_free(), then ran the TAP tests of
src/test/ssl/.  After that, the leak becomes really clear after 5k
loops or such in psql, with sk_GENERAL_NAME_pop_free() preventing the
issue any rise of memory usage.

sk_GENERAL_NAME_pop_free() exists down to OpenSSL 0.9.6, which is what
I recall we officially support down to Postgres 9.5, so we are safe.

And finally, this needs to be backpatched all the way down to 9.5, so
I'll take care of it if there are no objections.

Thanks for the report, Roman!
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug with memory leak on cert validation in libpq

Hamid Akhtar
+1

Specifically the openssl documentation says:

void sk_free(STACK *);
    This function free()'s a stack structure.  The elements in the
stack will not be freed so one should 'pop' and free all elements from the
stack before calling this function or call sk_pop_free() instead.

void sk_pop_free(STACK *st; void (*func)());
    This function calls 'func' for each element on the stack, passing
the element as the argument.  sk_free() is then called to free the 'stack'
structure.



On Tue, Apr 21, 2020 at 9:42 AM Michael Paquier <[hidden email]> wrote:
On Mon, Apr 20, 2020 at 08:40:06PM +0300, Roman Peshkurov wrote:
> The problem is incorrect usage of openssl stack "destructor". We noticed it
> on a graphs for the few months. It looked lika an increasing line of res
> memory usage.
>
> Sorry, I've already made pull request (wasn't familiar with your flow). The
> all information is described there:
> https://github.com/postgres/postgres/pull/52

Thanks for taking the time to send an email here.  I suspect that few
people around here look at the cloned repo on github (I don't FWIW).

If this reference goes away, then then community archives would lose a
part of its history.  So, first, here is the patch for reference:
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -552,7 +552,7 @@ pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn,
            if (rc != 0)
                break;
            }
-           sk_GENERAL_NAME_free(peer_san);
+           sk_GENERAL_NAME_pop_free(peer_san, GENERAL_NAME_free);
        }

Looking at the samples of the OpenSSL wiki, it looks that your guess
is right:
https://wiki.openssl.org/index.php/Hostname_validation
After extracting the SAN data from the certificate using
X509_get_ext_d2i(), we get to use sk_GENERAL_NAME_pop_free() for a
correct in-depth free().

Now, we have in core a set of TAP tests in src/test/ssl with
certificates that include some SAN configuration for
sslmode=verify-full, able to trigger the code path you are mentioning
for pgtls_verify_peer_name_matches_certificate_guts() within
fe-secure-openssl.c.  In order to see the leak, (I have been lazy with
the method and did not use valgrind with a custom C program or just
psql, but the result is the same), I have put a custom hack in the
code path involved here by adding a tight loop repeating
X509_get_ext_d2i() + sk_GENERAL_NAME_free(), then ran the TAP tests of
src/test/ssl/.  After that, the leak becomes really clear after 5k
loops or such in psql, with sk_GENERAL_NAME_pop_free() preventing the
issue any rise of memory usage.

sk_GENERAL_NAME_pop_free() exists down to OpenSSL 0.9.6, which is what
I recall we officially support down to Postgres 9.5, so we are safe.

And finally, this needs to be backpatched all the way down to 9.5, so
I'll take care of it if there are no objections.

Thanks for the report, Roman!
--
Michael


--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:[hidden email]
SKYPE: engineeredvirus
Reply | Threaded
Open this post in threaded view
|

Re: Bug with memory leak on cert validation in libpq

David Steele
In reply to this post by Michael Paquier-2
On 4/21/20 12:42 AM, Michael Paquier wrote:
> On Mon, Apr 20, 2020 at 08:40:06PM +0300, Roman Peshkurov wrote:
>> The problem is incorrect usage of openssl stack "destructor". We noticed it
>> on a graphs for the few months. It looked lika an increasing line of res
>> memory usage.
>
> And finally, this needs to be backpatched all the way down to 9.5, so
> I'll take care of it if there are no objections.

+1. We have similar code in pgBackRest and I can see the leak there as
well when I adjust our Valgrind supressions a bit.

I also reviewed the OpenSSL code and agree this is the right fix.

Regards,
--
-David
[hidden email]


Reply | Threaded
Open this post in threaded view
|

Re: Bug with memory leak on cert validation in libpq

Michael Paquier-2
On Tue, Apr 21, 2020 at 10:46:34AM -0400, David Steele wrote:
> I also reviewed the OpenSSL code and agree this is the right fix.

Thanks for confirming.  The fix has been applied down to 9.5.
--
Michael

signature.asc (849 bytes) Download Attachment