WIP: Data at rest encryption

Ants Aasma-3
Hi all,

I have been working on data-at-rest encryption support for PostgreSQL.
In my experience this is a common request that customers make. The
short of the feature is that all PostgreSQL data files are encrypted
with a single master key and are decrypted when read from the OS. It
does not provide column level encryption which is an almost orthogonal
feature, arguably better done client side.

Similar things can be achieved with filesystem level encryption.
However this is not always optimal for various reasons. One of the
better reasons is the desire for HSM based encryption in a storage
area network based setup.

Attached to this mail is a work in progress patch that adds an
extensible encryption mechanism. There are some loose ends left to tie
up, but the general concept and architecture is at a point where it's
ready for some feedback, fresh ideas and bikeshedding.

Usage
=====

Set up database like so:

    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     initdb -k -K pgcrypto $PGDATA )

Start PostgreSQL:

    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     postgres $PGDATA )

Design
======

The patch adds a new GUC called encryption_library. When specified, the
named library is loaded before shared_preload_libraries and is
expected to register its encryption routines. For now the API is
pretty narrow, one parameterless function that lets the extension do
key setup on its own terms, and two functions for
encrypting/decrypting an arbitrary sized block of data with tweak. The
tweak should alter the encryption function so that identical block
contents are encrypted differently based on their location. The GUC
needs to be set at bootstrap time, so it gets set by a new option for
initdb. During bootstrap an encryption sample gets stored in the
control file, enabling useful error messages.
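
To make the tweak idea concrete, here is a minimal Python sketch. It is not code from the patch: the function names, the 16-byte tweak layout and the SHAKE-256 keystream are all assumptions made for illustration. The keystream XOR is only a toy stand-in for a real tweakable mode such as XTS, and it would not be secure for pages rewritten in place. The point is just that identical contents encrypt differently at different locations.

```python
import hashlib


def make_tweak(tablespace_oid: int, database_oid: int, relfilenode: int, block_no: int) -> bytes:
    """Hypothetical tweak layout: the block's location packed into 16 bytes."""
    fields = (tablespace_oid, database_oid, relfilenode, block_no)
    return b"".join(n.to_bytes(4, "big") for n in fields)


def encrypt_block(key: bytes, tweak: bytes, data: bytes) -> bytes:
    """Toy stand-in for the encrypt hook: XOR with a stream derived from key and tweak."""
    stream = hashlib.shake_256(key + tweak).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def decrypt_block(key: bytes, tweak: bytes, data: bytes) -> bytes:
    """XOR is its own inverse, so decryption repeats the same operation."""
    return encrypt_block(key, tweak, data)


key = bytes(range(32))   # placeholder key, for illustration only
page = b"\x00" * 8192    # the same 8 kB of content, stored at two locations
first = encrypt_block(key, make_tweak(1663, 16384, 16385, 0), page)
second = encrypt_block(key, make_tweak(1663, 16384, 16385, 1), page)
```

Here `first` and `second` differ even though the plaintext pages are identical, and each decrypts correctly only with the tweak of its own location.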

The library name is not stored in controldata. I'm not quite sure
about this decision. On one hand it would be very useful to tell the
user what he needs to get at his data if the configuration somehow
goes missing and it would get rid of the extra GUC. On the other hand
I don't really want to bloat control data, and the same encryption
algorithm could be provided by different implementations.

For now the encryption is done for everything that goes through md,
xlog and slru. Based on a review of read/write/fread/fwrite calls this
list is missing:

* BufFile - needs refactoring
* Logical reorder buffer serialization - probably needs a stream mode
cipher API addition.
* logical_heap_rewrite - can be encrypted as one big block
* 2PC state data - ditto
* pg_stat_statements - query texts get appended so a stream mode
cipher might be needed here too.

copydir needed some changes too because tablespace and database oid
are included in the tweak and so copying also needs to decrypt and
encrypt with the new tweak value.
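
The copy step described above can be sketched as follows. This is an illustrative Python sketch, not the patch's copydir code: `toy_xor_cipher`, `tweak_for` and `copy_blocks` are hypothetical names, and the cipher is again a toy stand-in for XTS. It shows why a plain byte copy is not enough once the location is part of the tweak.

```python
import hashlib
from typing import Callable, List, Tuple

Cipher = Callable[[bytes, bytes, bytes], bytes]  # (key, tweak, data) -> data


def toy_xor_cipher(key: bytes, tweak: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher (not XTS): XOR with a SHAKE-256 stream of key and tweak."""
    stream = hashlib.shake_256(key + tweak).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def tweak_for(tablespace_oid: int, database_oid: int, block_no: int) -> bytes:
    """Hypothetical tweak: tablespace OID, database OID and block number."""
    return b"".join(n.to_bytes(4, "big") for n in (tablespace_oid, database_oid, block_no))


def copy_blocks(
    blocks: List[bytes],
    key: bytes,
    src: Tuple[int, int],
    dst: Tuple[int, int],
    encrypt: Cipher = toy_xor_cipher,
    decrypt: Cipher = toy_xor_cipher,
) -> List[bytes]:
    """Re-encrypt every block for its new home; src and dst are (tablespace, database) OIDs."""
    copied = []
    for block_no, ciphertext in enumerate(blocks):
        # Decrypt with the tweak of the old location ...
        plaintext = decrypt(key, tweak_for(*src, block_no), ciphertext)
        # ... and encrypt again with the tweak of the new one.
        copied.append(encrypt(key, tweak_for(*dst, block_no), plaintext))
    return copied
```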

For demonstration purposes I imported Brian Gladman's AES-128-XTS mode
implementation into pgcrypto and used an environment variable for key
setup. This part is not really in any reviewable state: the XTS code
needs heavy cleanup to bring it up to PostgreSQL coding standards, and
key setup needs something secure, like PBKDF2 or scrypt.
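
As a rough idea of what passphrase-based key setup with PBKDF2 could look like, here is a small Python sketch using the standard library's `hashlib.pbkdf2_hmac`. The function name, the salt handling and the iteration count are assumptions and not part of the patch. It derives 32 bytes because AES-128-XTS takes two 128-bit keys.

```python
import hashlib
import os


def derive_xts_key(passphrase: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive 32 bytes from a passphrase; the iteration count is an illustrative choice."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32)


salt = os.urandom(16)  # would have to be stored with the cluster to re-derive the key
key = derive_xts_key("correct horse battery staple", salt)
data_key, tweak_key = key[:16], key[16:]  # the two 128-bit halves used by XTS
```

The salt has to be available again at startup, which is where the question of a per-cluster identifier comes in.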

Performance with the current AES implementation is not great, but not
horrible either: I'm seeing around a 2x slowdown for workloads larger
than shared_buffers but smaller than free memory. However, the plan
is to fix this. I have a prototype AES-NI implementation that does
3 GB/s per core on my Haswell based laptop (1.25 B/cycle).
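
For a feel of what those two numbers imply, here is a small back-of-the-envelope calculation in Python. It assumes GB means 10^9 bytes and uses the default 8 kB PostgreSQL page size; the variable names are just for illustration. The two figures together imply a clock of about 2.4 GHz, and a few microseconds of cipher work per page.

```python
THROUGHPUT_BYTES_PER_S = 3e9   # 3 GB/s per core, taking GB as 10**9 bytes
BYTES_PER_CYCLE = 1.25         # figure quoted for the AES-NI prototype
PAGE_SIZE = 8192               # default PostgreSQL block size in bytes

implied_clock_hz = THROUGHPUT_BYTES_PER_S / BYTES_PER_CYCLE
cycles_per_page = PAGE_SIZE / BYTES_PER_CYCLE
seconds_per_page = PAGE_SIZE / THROUGHPUT_BYTES_PER_S

print(f"implied clock: {implied_clock_hz / 1e9:.2f} GHz")
print(f"per 8 kB page: {cycles_per_page:.0f} cycles, {seconds_per_page * 1e6:.2f} us")
```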

Open questions
==============

The main question is what to do about BufFile? It currently provides
both unaligned random access and a block based interface. I wonder if
it would be a good idea to refactor it to be fully block based under
the covers.

I would also like to incorporate some database identifier as a salt in
key setup. However, the system identifier stored in the control file
doesn't fit this role well. It gets initialized somewhat too late in the
bootstrap process and, more importantly, gets changed on pg_upgrade.
This will make link mode upgrades impossible, which seems like a no
go. I'm torn on whether to add a new value for this purpose (perhaps
stored outside the control file) or allow setting the system identifier
via initdb. The first seems like a better idea; the file could double
as a place to store additional encryption parameters, like key length
or a different cipher primitive.

Regards,
Ants Aasma


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment: data-at-rest-encryption-wip-2016.06.07.patch (126K)

Re: WIP: Data at rest encryption

Haribabu Kommi-2
On Tue, Jun 7, 2016 at 11:56 PM, Ants Aasma <[hidden email]> wrote:

> Hi all,
>
> I have been working on data-at-rest encryption support for PostgreSQL.
> In my experience this is a common request that customers make. The
> short of the feature is that all PostgreSQL data files are encrypted
> with a single master key and are decrypted when read from the OS. It
> does not provide column level encryption which is an almost orthogonal
> feature, arguably better done client side.
>
> Similar things can be achieved with filesystem level encryption.
> However this is not always optimal for various reasons. One of the
> better reasons is the desire for HSM based encryption in a storage
> area network based setup.
>
> Attached to this mail is a work in progress patch that adds an
> extensible encryption mechanism. There are some loose ends left to tie
> up, but the general concept and architecture is at a point where it's
> ready for some feedback, fresh ideas and bikeshedding.

Yes, encryption is a really nice and wanted feature.
The following are my thoughts regarding the approach.

1. Instead of encrypting all database files, how about giving the
user an option to protect only the particular tables that need
encryption, at the table/tablespace level? This not only gives the
user a choice, it also reduces the performance impact on tables
that don't need any encryption. The problem with this approach
is that every xlog record needs to be checked to handle the
encryption/decryption, instead of doing it at page level.

2. Instead of depending on a contrib module for the encryption, how
about integrating the pgcrypto contrib module into core and making it
the default encryption method? Also provide an option for the user
to use a different encryption method if needed.

3. Currently entire xlog pages are encrypted and stored in the file.
Can pg_xlogdump work with those files?

4. For logical decoding, how about adding a decoding behavior based
on the module, to decide whether data should be encrypted/decrypted?

5. Instead of providing the passphrase through an environment variable,
it would be better to provide some options to pg_ctl etc.

6. I don't have any idea whether it is possible to integrate the checksum
and encryption in a single pass to avoid the performance penalty.


> I would also like to incorporate some database identifier as a salt in
> key setup. However, system identifier stored in control file doesn't
> fit this role well. It gets initialized somewhat too late in the
> bootstrap process, and more importantly, gets changed on pg_upgrade.
> This will make link mode upgrades impossible, which seems like a no
> go. I'm torn whether to add a new value for this purpose (perhaps
> stored outside the control file) or allow setting of system identifier
> via initdb. The first seems like a better idea, the file could double
> as a place to store additional encryption parameters, like key length
> or different cipher primitive.

I feel a separate file is a better place for the key data than the
pg_control file.


Regards,
Hari Babu
Fujitsu Australia



Re: WIP: Data at rest encryption

Ants Aasma-3
On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
<[hidden email]> wrote:
> 1. Instead of doing the entire database files encryption, how about
> providing user an option to protect only some particular tables that
> wants the encryption at table/tablespace level. This not only provides
> an option to the user, it reduces the performance impact on tables
> that doesn't need any encryption. The problem with this approach
> is that every xlog record needs to validate to handle the encryption
> /decryption, instead of at page level.

Is there a real need for this? The customers I have talked to want to
encrypt the whole database and my goal is to make the feature fast
enough to make that feasible for pretty much everyone. I guess
switching encryption off per table would be feasible, but the key
setup would still need to be done at server startup. Per record
encryption would result in some additional information leakage though.
Overall I thought it would not be worth it, but I'm willing to have my
mind changed on this.

> 2. Instead of depending on a contrib module for the encryption, how
> about integrating pgcrypto contrib in to the core and add that as a
> default encryption method. And also provide an option to the user
> to use a different encryption methods if needs.

Technically that would be simple enough, this is more of a policy
decision. I think having builtin encryption provided by pgcrypto is
completely fine. If a consensus emerges that it needs to be
integrated, it would need to be a separate patch anyway.

> 3. Currently entire xlog pages are encrypted and stored in the file.
> can pg_xlogdump works with those files?

Technically yes; with the patch as it stands, no. Added this to my todo list.

> 4. For logical decoding, how about the adding a decoding behavior
> based on the module to decide whether data to be encrypted/decrypted.

The data to be encrypted does not depend on the module used, so I
don't think it should be module controlled. The reorder buffer
contains pretty much the same stuff as the xlog, so not encrypting it
does not look like a valid choice. For logical heap rewrites it could
be argued that nothing useful is leaked in most cases, but encrypting
it is not hard. Just a small matter of programming.

> 5. Instead of providing passphrase through environmental variable,
> better to provide some options to pg_ctl etc.

That looks like it would be worse from a security perspective.
Integrating a passphrase prompt would be an option, but a way for
scripts to provide passphrases would still be needed.

> 6. I don't have any idea whether is it possible to integrate the checksum
> and encryption in a single shot to avoid performance penalty.

Currently no, the checksum gets stored in the page header and for any
decent cipher mode the encryption of the rest of the page will depend
on it. However, the performance difference should be negligible
because both algorithms are compute bound for cached data. The data is
very likely to be completely in L1 cache as the operations are done in
quick succession.

The non-cryptographic checksum algorithm could actually be an attack
vector for an adversary that can trigger repeated encryption by
tweaking a couple of bytes at the end of the page to see when the
checksum matches and try to infer the data from that, similar to the
CRIME attack. However, the LSN stored at the beginning of the page
header basically provides a nonce that makes this impossible.

This also means that encryption needs to imply wal_log_hints. Will
include this in the next version of the patch.

>> I would also like to incorporate some database identifier as a salt in
>> key setup. However, system identifier stored in control file doesn't
>> fit this role well. It gets initialized somewhat too late in the
>> bootstrap process, and more importantly, gets changed on pg_upgrade.
>> This will make link mode upgrades impossible, which seems like a no
>> go. I'm torn whether to add a new value for this purpose (perhaps
>> stored outside the control file) or allow setting of system identifier
>> via initdb. The first seems like a better idea, the file could double
>> as a place to store additional encryption parameters, like key length
>> or different cipher primitive.
>
> I feel separate file is better to include the key data instead of pg_control
> file.

I guess that would be more flexible. However I think at least the fact
that the database is encrypted should remain in the control file to
provide useful error messages for faulty backup procedures.
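
A rough Python sketch of how a stored sample can turn a wrong or missing key into a clear error message follows. This is not how the patch implements its control file sample: the constant, the function names and the use of HMAC-SHA256 in place of an encrypted sample are all assumptions for illustration.

```python
import hashlib
import hmac
from typing import Optional

SAMPLE_PLAINTEXT = b"postgresql encryption sample"  # hypothetical fixed constant


def make_key_sample(key: bytes) -> bytes:
    """Value written once at bootstrap; it reveals nothing useful without the key."""
    return hmac.new(key, SAMPLE_PLAINTEXT, hashlib.sha256).digest()


def check_key(key: Optional[bytes], stored_sample: Optional[bytes]) -> str:
    """Compare the supplied key against the stored sample and describe the outcome."""
    if stored_sample is None:
        return "ok: cluster is not encrypted"
    if key is None:
        return "error: cluster is encrypted but no key was provided"
    if not hmac.compare_digest(make_key_sample(key), stored_sample):
        return "error: encryption key does not match this cluster"
    return "ok: key matches"
```

With something like this, restoring a backup without the key (or with the wrong one) fails with a specific message instead of with what looks like random page corruption.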

Thanks for your input.

Regards,
Ants Aasma



Re: WIP: Data at rest encryption

Michael Paquier
On Sun, Jun 12, 2016 at 4:13 PM, Ants Aasma <[hidden email]> wrote:
>> I feel separate file is better to include the key data instead of pg_control
>> file.
>
> I guess that would be more flexible. However I think at least the fact
> that the database is encrypted should remain in the control file to
> provide useful error messages for faulty backup procedures.

Another possibility could be to always do some encryption at the
data-type level for text data. For example, I recalled the following
while going through this thread:
https://github.com/nec-postgres/tdeforpg
Though I don't quite understand the use of encrypt.enable in this
code... This has the advantage of not needing to patch upstream.
--
Michael



Re: WIP: Data at rest encryption

Ants Aasma
On Mon, Jun 13, 2016 at 5:17 AM, Michael Paquier
<[hidden email]> wrote:

> On Sun, Jun 12, 2016 at 4:13 PM, Ants Aasma <[hidden email]> wrote:
>>> I feel separate file is better to include the key data instead of pg_control
>>> file.
>>
>> I guess that would be more flexible. However I think at least the fact
>> that the database is encrypted should remain in the control file to
>> provide useful error messages for faulty backup procedures.
>
> Another possibility could be always to do some encryption at data-type
> level for text data. For example I recalled the following thing while
> going through this thread:
> https://github.com/nec-postgres/tdeforpg
> Though I don't quite understand the use for encrypt.enable in this
> code... This has the advantage to not patch upstream.

While certainly possible, this does not cover the requirements I want
to satisfy - user data never gets stored on disk unencrypted without
making changes to the application or schema. This seems to be mostly
about separating administrator roles, specifically that centralised
storage and backup administrators should not have access to database
contents. I see this as orthogonal to per column encryption, which in
my opinion is better done in the application.

Regards,
Ants Aasma



Re: WIP: Data at rest encryption

Peter Eisentraut-6
In reply to this post by Ants Aasma-3
On 6/7/16 9:56 AM, Ants Aasma wrote:
> Similar things can be achieved with filesystem level encryption.
> However this is not always optimal for various reasons. One of the
> better reasons is the desire for HSM based encryption in a storage
> area network based setup.

Could you explain this in more detail?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Data at rest encryption

Peter Eisentraut-6
In reply to this post by Ants Aasma-3
On 6/12/16 3:13 AM, Ants Aasma wrote:
>> 5. Instead of providing passphrase through environmental variable,
>> better to provide some options to pg_ctl etc.
> That looks like it would be worse from a security perspective.
> Integrating a passphrase prompt would be an option, but a way for
> scripts to provide passphrases would still be needed.

Environment variables and command-line options are visible to other
processes on the machine, so neither of these approaches is really going
to work.  We would need some kind of integration with secure
password-entry mechanisms, such as pinentry.

Also note that all tools that work directly on the data directory would
need password-entry and encryption/decryption support, including
pg_basebackup, pg_controldata, pg_ctl, pg_receivexlog, pg_resetxlog,
pg_rewind, pg_upgrade, pg_xlogdump.

It seems that your implementation doesn't encrypt pg_control, thus
avoiding some of that.  But that doesn't seem right.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Data at rest encryption

Haribabu Kommi-2
In reply to this post by Ants Aasma-3
On Sun, Jun 12, 2016 at 5:13 PM, Ants Aasma <[hidden email]> wrote:

> On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
> <[hidden email]> wrote:
>
>> 2. Instead of depending on a contrib module for the encryption, how
>> about integrating pgcrypto contrib in to the core and add that as a
>> default encryption method. And also provide an option to the user
>> to use a different encryption methods if needs.
>
> Technically that would be simple enough, this is more of a policy
> decision. I think having builtin encryption provided by pgcrypto is
> completely fine. If a consensus emerges that it needs to be
> integrated, it would need to be a separate patch anyway.

In our proprietary database, we are using the encryption methods
provided by OpenSSL [1]. Maybe we can have a look at the
methods provided by OpenSSL for encryption in builds
with USE_SSL. Ignore this if you have already evaluated them.


>> 5. Instead of providing passphrase through environmental variable,
>> better to provide some options to pg_ctl etc.
>
> That looks like it would be worse from a security perspective.
> Integrating a passphrase prompt would be an option, but a way for
> scripts to provide passphrases would still be needed.

What I felt was, if we store the passphrase in an environment variable,
a person who has access to the system can read it,
and using that it may be possible to decrypt the data files.


[1] - https://www.openssl.org/docs/manmaster/crypto/EVP_EncryptInit.html


Regards,
Hari Babu
Fujitsu Australia



Re: WIP: Data at rest encryption

Jim Nasby-5
In reply to this post by Ants Aasma-3
On 6/12/16 2:13 AM, Ants Aasma wrote:

> On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
> <[hidden email]> wrote:
>> 1. Instead of doing the entire database files encryption, how about
>> providing user an option to protect only some particular tables that
>> wants the encryption at table/tablespace level. This not only provides
>> an option to the user, it reduces the performance impact on tables
>> that doesn't need any encryption. The problem with this approach
>> is that every xlog record needs to validate to handle the encryption
>> /decryption, instead of at page level.
> Is there a real need for this? The customers I have talked to want to
> encrypt the whole database and my goal is to make the feature fast
> enough to make that feasible for pretty much everyone. I guess
> switching encryption off per table would be feasible, but the key
> setup would still need to be done at server startup. Per record
> encryption would result in some additional information leakage though.
> Overall I thought it would not be worth it, but I'm willing to have my
> mind changed on this.

I actually design with this in mind. Tables that contain sensitive info
go into designated schemas, partly so that you can blanket move all of
those to an encrypted tablespace (or safer would be to move things not
in those schemas to an unencrypted tablespace). Since that can be done
with an encrypted filesystem maybe that's good enough. (It's not really
clear to me what this buys us over an encrypted FS, other than a feature
comparison checkmark...)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: WIP: Data at rest encryption

PostgreSQL - Hans-Jürgen Schönig


On 06/14/2016 09:59 PM, Jim Nasby wrote:

> On 6/12/16 2:13 AM, Ants Aasma wrote:
>> On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
>> <[hidden email]> wrote:
>>> 1. Instead of doing the entire database files encryption, how about
>>> providing user an option to protect only some particular tables that
>>> wants the encryption at table/tablespace level. This not only provides
>>> an option to the user, it reduces the performance impact on tables
>>> that doesn't need any encryption. The problem with this approach
>>> is that every xlog record needs to validate to handle the encryption
>>> /decryption, instead of at page level.
>> Is there a real need for this? The customers I have talked to want to
>> encrypt the whole database and my goal is to make the feature fast
>> enough to make that feasible for pretty much everyone. I guess
>> switching encryption off per table would be feasible, but the key
>> setup would still need to be done at server startup. Per record
>> encryption would result in some additional information leakage though.
>> Overall I thought it would not be worth it, but I'm willing to have my
>> mind changed on this.
>
> I actually design with this in mind. Tables that contain sensitive
> info go into designated schemas, partly so that you can blanket move
> all of those to an encrypted tablespace (or safer would be to move
> things not in those schemas to an unencrypted tablespace). Since that
> can be done with an encrypted filesystem maybe that's good enough.
> (It's not really clear to me what this buys us over an encrypted FS,
> other than a feature comparison checkmark...)

The reason why this is needed is actually very simple: security
guidelines and legal requirements.
We have dealt with a couple of companies recently who explicitly
demanded PostgreSQL-level encryption in a transparent way to fulfill
some internal or legal requirements. This is especially true for
financial stuff. And yes, sure, you can do a lot of stuff with
filesystem encryption.
The core idea of this entire thing, however, is to have a counterpart at
the database level. If you don't have the key you cannot start the
instance, and if you happen to get access to the filesystem you are still
not able to fire up the DB.
As I said: requirements from ever bigger companies.

As far as benchmarking is concerned: I did a quick test yesterday (not
with the final AES implementation yet) and I got pretty good results.
With a reasonably well cached database in a typical application I expect
to lose around 10-20%. If everything fits in memory there is 0 loss, of
course. In the worst case I saw with the standard AES (no hardware support
used yet) I lost around 45% or so, but this requires a value as low as
32 MB of shared buffers or so.

     many thanks,

         hans


--
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: WIP: Data at rest encryption

Robert Haas
In reply to this post by Peter Eisentraut-6
On Mon, Jun 13, 2016 at 11:07 AM, Peter Eisentraut
<[hidden email]> wrote:
> On 6/7/16 9:56 AM, Ants Aasma wrote:
>>
>> Similar things can be achieved with filesystem level encryption.
>> However this is not always optimal for various reasons. One of the
>> better reasons is the desire for HSM based encryption in a storage
>> area network based setup.
>
> Could you explain this in more detail?

I don't think Ants ever responded to this point.

I'm curious whether this is something that is likely to be pursued for
PostgreSQL 11.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: WIP: Data at rest encryption

Ants Aasma
On Mon, Jun 12, 2017 at 10:38 PM, Robert Haas <[hidden email]> wrote:

> On Mon, Jun 13, 2016 at 11:07 AM, Peter Eisentraut
> <[hidden email]> wrote:
>> On 6/7/16 9:56 AM, Ants Aasma wrote:
>>>
>>> Similar things can be achieved with filesystem level encryption.
>>> However this is not always optimal for various reasons. One of the
>>> better reasons is the desire for HSM based encryption in a storage
>>> area network based setup.
>>
>> Could you explain this in more detail?
>
> I don't think Ants ever responded to this point.
>
> I'm curious whether this is something that is likely to be pursued for
> PostgreSQL 11.

Yes, the plan is to pick it up again, Real Soon Now(tm). There are a
couple of loose ends for stuff that should be encrypted but, in the
current state of the patch, isn't yet (off the top of my head,
logical decoding and pg_stat_statements write some files). The code
handling keys could really take better precautions, as Peter pointed
out in another e-mail. And I expect there to be a bunch of polishing
work to make the APIs as good as they can be.

To answer Peter's question about HSMs, many enterprise deployments are
on top of shared storage systems. For regulatory reasons or to limit
security clearance of storage administrators, the data on shared
storage should be encrypted. Now for there to be any point to this
endeavor, the key needs to be stored somewhere else. This is where
hardware security modules come in. They are basically hardware key
storage appliances that can either output the key when requested, or
for higher security hold onto the key and perform
encryption/decryption on behalf of the user. The patch enables the
user to use a custom shell command to go and fetch the key from the
HSM, for example using the KMIP protocol. Or a motivated person could
write an extension that implements the encryption hooks to delegate
encryption/decryption of blocks to an HSM.
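
As an illustration of the "shell command fetches the key" idea, here is a minimal Python sketch. It does not reflect the patch's actual option or its output format: `fetch_key`, the hex encoding of the key on stdout and the accepted key lengths are assumptions, and the command itself (for example some KMIP client) is whatever the operator configures.

```python
import subprocess


def fetch_key(command: str, timeout_s: float = 10.0) -> bytes:
    """Run an operator-supplied command and parse its stdout as a hex-encoded key."""
    result = subprocess.run(
        command,
        shell=True,           # the command is trusted configuration, not user input
        capture_output=True,
        check=True,           # a non-zero exit status raises CalledProcessError
        timeout=timeout_s,
    )
    key = bytes.fromhex(result.stdout.decode("ascii").strip())
    if len(key) not in (16, 32):
        raise ValueError("unexpected key length")
    return key
```

The key then never has to be stored next to the data; only the command that knows how to ask for it does.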

Fundamentally there doesn't seem to be a big benefit of implementing
the encryption at PostgreSQL level instead of the filesystem. The
patch doesn't take any real advantage from the higher level knowledge
of the system, nor do I see much possibility for it to do that. The
main benefit for us is that it's much easier to get a PostgreSQL based
solution deployed.

I'm curious if the community thinks this is a feature worth having?
Even considering that security experts would classify this kind of
encryption as a checkbox feature.

Regards,
Ants Aasma



Re: WIP: Data at rest encryption

Peter Eisentraut-6
On 6/12/17 17:11, Ants Aasma wrote:
> I'm curious if the community thinks this is a feature worth having?
> Even considering that security experts would classify this kind of
> encryption as a checkbox feature.

File system encryption already exists and is well-tested.  I don't see
any big advantages in re-implementing all of this one level up.  You
would have to touch every single place in PostgreSQL backend and tool
code where a file is being read or written.  Yikes.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Data at rest encryption

Stephen Frost
In reply to this post by Ants Aasma
Ants, all,

* Ants Aasma ([hidden email]) wrote:
> Yes, the plan is to pick it up again, Real Soon Now(tm). There are a
> couple of loose ends for stuff that should be encrypted, but in the
> current state of the patch aren't yet (from the top of my head,
> logical decoding and pg_stat_statements write some files). The code
> handling keys could really take better precautions as Peter pointed
> out in another e-mail. And I expect there to be a bunch of polishing
> work to make the APIs as good as they can be.

Very glad to hear that you're going to be continuing to work on this
effort.

> To answer Peter's question about HSMs, many enterprise deployments are
> on top of shared storage systems. For regulatory reasons or to limit
> security clearance of storage administrators, the data on shared
> storage should be encrypted. Now for there to be any point to this
> endeavor, the key needs to be stored somewhere else. This is where
> hardware security modules come in. They are basically hardware key
> storage appliances that can either output the key when requested, or
> for higher security hold onto the key and perform
> encryption/decryption on behalf of the user. The patch enables the
> user to use a custom shell command to go and fetch the key from the
> HSM, for example using the KMIP protocol. Or a motivated person could
> write an extension that implements the encryption hooks to delegate
> encryption/decryption of blocks to an HSM.

An extension, or perhaps even something built-in, would certainly be
good here but I don't think it's necessary in an initial implementation
as long as it's something we can do later.

> Fundamentally there doesn't seem to be a big benefit of implementing
> the encryption at PostgreSQL level instead of the filesystem. The
> patch doesn't take any real advantage from the higher level knowledge
> of the system, nor do I see much possibility for it to do that. The
> main benefit for us is that it's much easier to get a PostgreSQL based
> solution deployed.

Making it easier to get a PostgreSQL solution deployed is certainly a
very worthwhile goal.

> I'm curious if the community thinks this is a feature worth having?
> Even considering that security experts would classify this kind of
> encryption as a checkbox feature.

I would say that some security experts would consider it a 'checkbox'
feature, while others would say that it's actually quite a useful
capability for a database to have and isn't just for being able to check
a given box.  I have tended to lean towards the 'checkbox' camp and
encouraged people to use filesystem encryption also, but there are
use-cases where it'd be really nice to have PG doing the encryption
instead of the filesystem, because then you can do things like back up
the database, copy it somewhere else directly, and then restore it using
the regular PG mechanisms, as long as you have access to the key.

That's not something you can directly do with filesystem-level
encryption (unless you happen to be lucky enough to be able to use ZFS,
which can do exporting, or you can do a block-level exact copy to an
identically sized partition on the remote side, or similar).  While you
could encrypt the PG files during the backup, that requires that you
make sure both sides agree on how that encryption is done and have the
same tools for performing the encryption/decryption.  Possible,
certainly, but not nearly as convenient.

+1 for having this capability.

Thanks!

Stephen


Re: WIP: Data at rest encryption

Peter Eisentraut-6
On 6/13/17 09:24, Stephen Frost wrote:
> but there are
> use-cases where it'd be really nice to be able to have PG doing the
> encryption instead of the filesystem because then you can do things like
> backup the database, copy it somewhere else directly, and then restore
> it using the regular PG mechanisms, as long as you have access to the
> key.  That's not something you can directly do with filesystem-level
> encryption

Interesting point.

I wonder what the proper extent of "encryption at rest" should be.  If
you encrypt just on a file or block level, then someone looking at the
data directory or a backup can still learn a number of things about the
number of tables, transaction rates, various configuration settings, and
so on.  In the scenario of a sensitive application hosted on a shared
SAN, I don't think that is good enough.

Also, in the use case you describe, if you use pg_basebackup to make a
direct encrypted copy of a data directory, I think that would mean you'd
have to keep using the same key for all copies.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: WIP: Data at rest encryption

Stephen Frost
Peter,

* Peter Eisentraut ([hidden email]) wrote:
> I wonder what the proper extent of "encryption at rest" should be.  If
> you encrypt just on a file or block level, then someone looking at the
> data directory or a backup can still learn a number of things about the
> number of tables, transaction rates, various configuration settings, and
> so on.  In the scenario of a sensitive application hosted on a shared
> SAN, I don't think that is good enough.

If someone has access to the SAN, it'd be very difficult to avoid
revealing some information about transaction rates or I/O throughput.
Being able to have the configuration files encrypted would be good
(thinking particularly about pg_hba.conf/pg_ident.conf) but I don't
know that it's strictly necessary or that it would have to be done in an
initial version.

Certainly, there is a trade-off here when it comes to the information
which someone can learn about the system by looking at the number and
sizes of files when using PG-based encryption vs. what information
someone can learn from being able to look at only an encrypted
filesystem, but that's a trade-off which security experts are good at
evaluating, and it will be decided case-by-case, based on how easy
setting up filesystem encryption is in a particular environment and what
the use-cases are for the system.

> Also, in the use case you describe, if you use pg_basebackup to make a
> direct encrypted copy of a data directory, I think that would mean you'd
> have to keep using the same key for all copies.

That's true, but that might be acceptable and possibly even desirable in
certain cases.  On the other hand, it would certainly be a useful
feature to have a way to migrate from one key to another.  Perhaps that
would start out as an off-line tool, but maybe we'd be able to work out
a way to support having it done on-line in the future (certainly
non-trivial, but if we supported multiple keys concurrently with a
preference for which key is used to write data back out, and required
that checksums be in place to allow us to test if decrypting with a
specific key worked ... lots more hand-waving here... ).
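
To put a little substance behind the hand-waving, the following Python sketch shows the "try each key, let the checksum decide" idea on a single page. Everything in it is an assumption for illustration: the cipher is a toy keystream built from SHA-256 (not secure, and not what the patch uses), the checksum is a plain CRC32 stored in front of the payload, and the names `encrypt_page`, `decrypt_page` and `rewrite_page` are hypothetical.

```python
import hashlib
import zlib


def keystream(key: bytes, page_no: int, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, page number and a counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + page_no.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_page(key: bytes, page_no: int, payload: bytes) -> bytes:
    # Prepend a CRC32 of the payload, then encrypt checksum and payload.
    plain = zlib.crc32(payload).to_bytes(4, "big") + payload
    return xor(plain, keystream(key, page_no, len(plain)))


def decrypt_page(keys, page_no: int, blob: bytes):
    """Try each key; return (key_index, payload) for the first valid checksum."""
    for index, key in enumerate(keys):
        plain = xor(blob, keystream(key, page_no, len(blob)))
        checksum, payload = plain[:4], plain[4:]
        if zlib.crc32(payload).to_bytes(4, "big") == checksum:
            return index, payload
    raise ValueError("no configured key decrypts this page")


def rewrite_page(keys, preferred: int, page_no: int, blob: bytes) -> bytes:
    """Decrypt with whichever key works, re-encrypt with the preferred key."""
    _, payload = decrypt_page(keys, page_no, blob)
    return encrypt_page(keys[preferred], page_no, payload)


if __name__ == "__main__":
    keys = [b"old-key", b"new-key"]
    blob = encrypt_page(keys[0], 42, b"tuple data")
    migrated = rewrite_page(keys, 1, 42, blob)
    print(decrypt_page(keys, 42, migrated)[0])
```

A 32-bit checksum will occasionally validate under the wrong key (roughly once in 2^32 attempts), so a real design would want a stronger check than this sketch uses.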

Thanks!

Stephen


Re: WIP: Data at rest encryption

Robert Haas
In reply to this post by Ants Aasma
On Mon, Jun 12, 2017 at 5:11 PM, Ants Aasma <[hidden email]> wrote:
> Fundamentally there doesn't seem to be a big benefit of implementing
> the encryption at PostgreSQL level instead of the filesystem. The
> patch doesn't take any real advantage from the higher level knowledge
> of the system, nor do I see much possibility for it to do that. The
> main benefit for us is that it's much easier to get a PostgreSQL based
> solution deployed.

I agree with all of that, but ease of deployment has some value unto
itself.  I think pretty much every modern operating system has some
way of encrypting a filesystem, but it's different on Linux vs.
Windows vs. macOS vs. BSD, and you probably need to be the system
administrator on any of those systems in order to set it up.
Something built into PostgreSQL could run without administrator
privileges and work the same way on every platform we support.  That
would be useful.

Of course, what would be even more useful is fine-grained encryption -
encrypt these tables (and the corresponding indexes, toast tables, and
WAL records related to any of that) with this key, encrypt these other
tables (and the same list of associated stuff) with this other key,
and leave the rest unencrypted.  The problem with that is that you
probably can't run recovery without all of the keys, and even on a
clean startup there would be a good deal of engineering work involved
in refusing access to tables whose key hadn't been provided yet.  I
don't think we should wait to have this feature until all of those
problems are solved.  In my opinion, something coarse-grained that
just encrypts the whole cluster would be a pretty useful place to
start and would meet the needs of enough people to be worthwhile all
on its own.  Performance is likely to be poor on large databases,
because every time a page transits between shared_buffers and the OS
buffer cache we've got to en/decrypt, but as long as it's only poor
for the people who opt into the feature I don't see a big problem with
that.

I anticipate that one of the trickier problems here will be handling
encryption of the write-ahead log.  Suppose you encrypt WAL a block at
a time.  In the current system, once you've written and flushed a
block, you can consider it durably committed, but if that block is
encrypted, this is no longer true.  A crash might tear the block,
making it impossible to decrypt.  Replay will therefore stop at the
end of the previous block, not at the last record actually flushed as
would happen today.  So, your synchronous_commit suddenly isn't.  A
similar problem will occur on any other page where we choose not to
protect against torn pages using full page writes.  For instance,
unless checksums are enabled or wal_log_hints=on, we'll write a data
page where a single bit has been flipped and assume that the bit will
either make it to disk or not; the page can't really be torn in any
way that hurts us.  But with encryption that's no longer true, because
the hint bit will turn into much more than a single bit flip, and
rereading that page with half old and half new contents will be the
end of the world (TM).  I don't know off-hand whether we're
protecting, say, CLOG page writes with FPWs: because setting a couple
of bits is idempotent and doesn't depend on the existing page
contents, we might not need it currently, but with encryption, every
bit in the page depends on every other bit in the page, so we
certainly would.  I don't know how many places we've got assumptions
like this baked into the system, but I'm guessing there are a bunch.
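
The hint-bit case can be illustrated with a small Python sketch. It rests on several assumptions: a toy Feistel cipher built from SHA-256 stands in for a real block cipher (the Python standard library has no AES), CBC is used only as one example of a mode where ciphertext blocks depend on earlier ones, the page is shrunk to 64 bytes with 32-byte "sectors", and the same IV is reused for both page images. None of this reflects the patch itself.

```python
import hashlib

BLOCK = 16          # bytes per cipher block
HALF = BLOCK // 2
ROUNDS = 4


def _round(key: bytes, i: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def block_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy Feistel network; an invertible stand-in for a real block cipher."""
    left, right = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def block_decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = bytearray(), iv
    for off in range(0, len(data), BLOCK):
        prev = block_encrypt(key, _xor(data[off:off + BLOCK], prev))
        out += prev
    return bytes(out)


def cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = bytearray(), iv
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK]
        out += _xor(block_decrypt(key, block), prev)
        prev = block
    return bytes(out)


def torn_page_demo():
    key, iv = b"demo key", bytes(BLOCK)
    old = bytes(64)                    # 4 cipher blocks, two 32-byte sectors
    new = bytes([0x01]) + old[1:]      # one "hint bit" set in the first byte
    old_ct = cbc_encrypt(key, iv, old)
    new_ct = cbc_encrypt(key, iv, new)
    # Crash after only the first sector of the new image reached disk.
    torn_plain = new[:32] + old[32:]   # unencrypted: identical to `new`
    torn_ct = new_ct[:32] + old_ct[32:]
    return old, new, torn_plain, cbc_decrypt(key, iv, torn_ct)


if __name__ == "__main__":
    old, new, torn_plain, recovered = torn_page_demo()
    print(torn_plain == new, recovered in (old, new))
```

Without encryption the torn page is simply the new page, which is why no full page write is needed for a hint bit. With the chained cipher, the block just after the tear decrypts to garbage, so the recovered page matches neither the old nor the new image.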

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: WIP: Data at rest encryption

Bruce Momjian
On Tue, Jun 13, 2017 at 11:35:03AM -0400, Robert Haas wrote:

> I anticipate that one of the trickier problems here will be handling
> encryption of the write-ahead log.  Suppose you encrypt WAL a block at
> a time.  In the current system, once you've written and flushed a
> block, you can consider it durably committed, but if that block is
> encrypted, this is no longer true.  A crash might tear the block,
> making it impossible to decrypt.  Replay will therefore stop at the
> end of the previous block, not at the last record actually flushed as
> would happen today.  So, your synchronous_commit suddenly isn't.  A
> similar problem will occur on any other page where we choose not to
> protect against torn pages using full page writes.  For instance,
> unless checksums are enabled or wal_log_hints=on, we'll write a data
> page where a single bit has been flipped and assume that the bit will
> either make it to disk or not; the page can't really be torn in any
> way that hurts us.  But with encryption that's no longer true, because
> the hint bit will turn into much more than a single bit flip, and
> rereading that page with half old and half new contents will be the
> end of the world (TM).  I don't know off-hand whether we're
> protecting, say, CLOG page writes with FPWs: because setting a couple
> of bits is idempotent and doesn't depend on the existing page
> contents, we might not need it currently, but with encryption, every
> bit in the page depends on every other bit in the page, so we
> certainly would.  I don't know how many places we've got assumptions
> like this baked into the system, but I'm guessing there are a bunch.

That is not necessarily true.  You are describing a cipher mode where the
user data goes through the cipher, e.g. AES in CBC mode.  However, if
you are using a stream cipher based on a block cipher, e.g. CTR, GCM,
you XOR the user data with a random bit stream, and in that case one bit
change in user data would be one bit change in the cipher output.
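
A quick way to see this is the Python sketch below. It is only an illustration under stated assumptions: the keystream comes from SHA-256 in a counter construction rather than from AES-CTR, and the page is rewritten under the same key and nonce, which is what the one-bit property relies on. The helper names are hypothetical.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream (SHA-256 standing in for a block cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def ctr_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice gives back the input.
    stream = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, stream))


def flip_bit(data: bytes, bit_index: int) -> bytes:
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def bit_difference(a: bytes, b: bytes) -> int:
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    key, nonce = b"demo key", b"page-7"
    page = bytes(256)
    hinted = flip_bit(page, 1000)          # set one "hint bit"
    c_old = ctr_encrypt(key, nonce, page)
    c_new = ctr_encrypt(key, nonce, hinted)
    print(bit_difference(c_old, c_new))
```

Because only that one ciphertext bit differs, a torn write of the new image still decrypts to either the old or the new page, just as in the unencrypted case.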

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: WIP: Data at rest encryption

Bruce Momjian
In reply to this post by Stephen Frost
On Tue, Jun 13, 2017 at 11:04:21AM -0400, Stephen Frost wrote:

> > Also, in the use case you describe, if you use pg_basebackup to make a
> > direct encrypted copy of a data directory, I think that would mean you'd
> > have to keep using the same key for all copies.
>
> That's true, but that might be acceptable and possibly even desirable in
> certain cases.  On the other hand, it would certainly be a useful
> feature to have a way to migrate from one key to another.  Perhaps that
> would start out as an off-line tool, but maybe we'd be able to work out
> a way to support having it done on-line in the future (certainly
> non-trivial, but if we supported multiple keys concurrently with a
> preference for which key is used to write data back out, and required
> that checksums be in place to allow us to test if decrypting with a
> specific key worked ... lots more hand-waving here... ).

As I understand it, having encryption in the database means the key is
stored in the database, while having encryption in the file system means
the key is stored in the operating system somewhere.  Of course, if the
key stored in the database is visible to someone using the operating
system, we really haven't added much/any security --- I guess my point
is that the OS easily can hide the key from the database, but the
database can't easily hide the key from the operating system.

Of course, if the storage is split from the database server then having
the key on the database server seems like a win.  However, I think a db
server could easily encrypt blocks before sending them to the SAN
server.  This would not work for NAS, of course, since it is file-based.

I have to admit we tend to avoid heavy-API solutions that are designed
just to work around deployment challenges.  Commercial databases are
fine in doing that, but it leads to very complex products.

I think the larger issue is where to store the key.  I would love for us
to come up with a unified solution to that and then build encryption on
that, including all-cluster encryption.

One cool idea I have is using public-key encryption, e.g. RSA, so that
the encryption key can be stored and used by users who don't know the
decryption key.  It would be a write-only encryption option.  Not sure
how useful that is, but it is easily possible, and doesn't require us to
keep the _encryption_ key secret, just the decryption one.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: WIP: Data at rest encryption

Stephen Frost
Bruce,

* Bruce Momjian ([hidden email]) wrote:

> On Tue, Jun 13, 2017 at 11:04:21AM -0400, Stephen Frost wrote:
> > > Also, in the use case you describe, if you use pg_basebackup to make a
> > > direct encrypted copy of a data directory, I think that would mean you'd
> > > have to keep using the same key for all copies.
> >
> > That's true, but that might be acceptable and possibly even desirable in
> > certain cases.  On the other hand, it would certainly be a useful
> > feature to have a way to migrate from one key to another.  Perhaps that
> > would start out as an off-line tool, but maybe we'd be able to work out
> > a way to support having it done on-line in the future (certainly
> > non-trivial, but if we supported multiple keys concurrently with a
> > preference for which key is used to write data back out, and required
> > that checksums be in place to allow us to test if decrypting with a
> > specific key worked ... lots more hand-waving here... ).
>
> As I understand it, having encryption in the database means the key is
> stored in the database, while having encryption in the file system means
> the key is stored in the operating system somewhere.

Key management is an entirely independent discussion from this and the
proposal from Ants, as I understand it, is that the key would *not* be
in the database but could be anywhere that a shell command could get it
from, including possibly an HSM (hardware device).

Having the data encrypted by PostgreSQL does not mean the key is stored
in the database.

> Of course, if the
> key stored in the database is visible to someone using the operating
> system, we really haven't added much/any security --- I guess my point
> is that the OS easily can hide the key from the database, but the
> database can't easily hide the key from the operating system.

This is correct: the key must be available to the PostgreSQL process
and therefore someone with privileged access to the OS would be able to
retrieve the key, but that's also true of filesystem encryption.

Basically, if the server is doing the encryption and you have the
ability to read all memory on the server then you can get the key.  Of
course, if you can read all memory then you can just look at shared
buffers and you don't really need to bother yourself with the key or
the encryption, and it doesn't make any difference if you're encrypting
in the database or in the filesystem.  That attack vector is not one
which this is intending to address.

> Of course, if the storage is split from the database server then having
> the key on the database server seems like a win.  However, I think a db
> server could easily encrypt blocks before sending them to the SAN
> server.  This would not work for NAS, of course, since it is file-based.

The key doesn't necessarily have to be stored anywhere on the server; it
just needs to be kept in memory while the database process is running
and made available to the database at startup, unless an external system
is used to perform the encryption, which might be possible with an
extension, as discussed.  In some environments, it might be acceptable
to have the key stored on the database server, of course, but there's no
requirement for the key to be stored on the database server or in the
database at all.

> I have to admit we tend to avoid heavy-API solutions that are designed
> just to work around deployment challenges.  Commercial databases are
> fine in doing that, but it leads to very complex products.

I'm not following what you mean here.

> I think the larger issue is where to store the key.  I would love for us
> to come up with a unified solution to that and then build encryption on
> that, including all-cluster encryption.

Honestly, key management is something that I'd rather we *not* worry
about in an initial implementation, which is one reason that I liked the
approach discussed here of having a command which runs to provide the
key.  We could certainly look into improving that in the future, but key
management really is a largely independent issue from encryption and
it's much more difficult and complicated and whatever we come up with
would still almost certainly be usable with the approach proposed here.

> One cool idea I have is using public encryption to store the encryption
> key by users who don't know the decryption key, e.g. RSA.  It would be a
> write-only encryption option.  Not sure how useful that is, but it is
> easily possible, and doesn't require us to keep the _encryption_ key
> secret, just the decryption one.

The downside here is that asymmetric encryption is much more expensive
than symmetric encryption and that probably makes it a non-starter.  I
do think we'll want to support multiple encryption methods and perhaps
we can have an option where asymmetric encryption is used, but that's
not what I expect will be typically used.

Thanks!

Stephen
