[Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)


Moon, Insung
Hello Hackers,

This is a proposal to develop "table-level" Transparent Data Encryption (TDE) and Key Management Service (KMS) support in
PostgreSQL.


Issues with data encryption in PostgreSQL
==========
Currently, data in PostgreSQL can be encrypted using the pgcrypto module.
However, using pgcrypto to encrypt data is inconvenient in some cases.

There are two significant inconveniences.

First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto functions everywhere we encrypt/decrypt.
Second, we must modify a lot of application code if we want to migrate to PostgreSQL from another database that uses TDE.

To resolve these inconveniences, many users want PostgreSQL to support TDE.
There have also been a few proposals, comments, and questions about supporting TDE in the PostgreSQL community.

However, PostgreSQL currently does not support TDE, so the development community is still discussing whether it's necessary to
support TDE or not.

In these discussions, the following requirements for supporting TDE in PostgreSQL were raised.

1) The performance overhead of encrypting and decrypting database data must be minimized.
2) WAL encryption must be supported.
3) A Key Management Service must be supported.

Therefore, I'd like to propose a new TDE design that addresses all of the above requirements.
Since this feature will be very large, I'd like to hear opinions from the community before I start writing the patch.

First, my proposal is table-level TDE, in which users can specify the tables to be encrypted.
Indexes, TOAST tables and WAL associated with a TDE-enabled table are also encrypted.

Moreover, I want to support encryption of large objects as well.
But I haven't found a good way to do it so far, so I'd like to leave it as a future TODO.

My proposal for "table-level TDE" has five characteristic features.

1) Buffer-level data encryption and decryption
2) Per-table encryption
3) 2-tier encryption key management
4) Working with external key management services(KMS)
5) WAL encryption

Here are more details on each item.


1. Buffer-level data encryption and decryption
==================
Transparent data encryption and decryption accompany storage operations.
With the ordinary approach, such as using pgcrypto, the biggest problem with encrypted data is the performance overhead of decrypting
the data each time a query runs.

My proposal is to encrypt and decrypt data only when performing disk I/O, to minimize performance overhead.
Therefore, the data in shared memory is unencrypted, so queries that hit the buffer cache pay no decryption cost.

With this design, data encryption/decryption can be implemented by modifying the code of the storage and buffer manager modules,
which are responsible for performing disk I/O.
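
A minimal Python sketch of this idea, not PostgreSQL code: pages stay in plaintext in a dictionary standing in for shared buffers and are encrypted only when flushed to a dictionary standing in for disk. The class and method names are hypothetical, and `toy_cipher` is an insecure stand-in for a real block cipher such as AES.

```python
import hashlib


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR keystream cipher (NOT secure). Applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyBufferManager:
    """Pages are plaintext in 'shared buffers' and ciphertext on 'disk'."""

    def __init__(self, table_key: bytes) -> None:
        self.table_key = table_key
        self.disk = {}            # block number -> encrypted page
        self.shared_buffers = {}  # block number -> plaintext page

    def _crypt(self, blkno: int, page: bytes) -> bytes:
        # The block number serves as the nonce. Reusing it on every rewrite
        # is a simplification that a real design must not copy.
        return toy_cipher(self.table_key, blkno.to_bytes(4, "big"), page)

    def read_page(self, blkno: int) -> bytes:
        if blkno not in self.shared_buffers:
            # Cache miss: read from disk and decrypt once.
            self.shared_buffers[blkno] = self._crypt(blkno, self.disk[blkno])
        # Cache hit: no cryptographic work at all.
        return self.shared_buffers[blkno]

    def write_page(self, blkno: int, page: bytes) -> None:
        # Modifications happen in memory, in plaintext.
        self.shared_buffers[blkno] = page

    def flush_page(self, blkno: int) -> None:
        # Encryption happens only at the disk I/O boundary.
        self.disk[blkno] = self._crypt(blkno, self.shared_buffers[blkno])

    def evict_page(self, blkno: int) -> None:
        self.flush_page(blkno)
        del self.shared_buffers[blkno]
```

Repeated reads of a cached page never touch the cipher, which is where the expected saving over per-query pgcrypto calls would come from.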


2. Per-table encryption
==================
Users can enable TDE per table as they want.
I introduce a new storage parameter "encryption_enabled" which enables TDE at the table level.

    -- Create an encrypted table (the column list is only a placeholder)
       CREATE TABLE foo (id integer) WITH ( ENCRYPTION_ENABLED = ON );

    -- Change it to a non-encrypted table
       ALTER TABLE foo SET ( ENCRYPTION_ENABLED = OFF );

This approach minimizes the overhead for tables that do not require encryption.
For tables that enable TDE, the corresponding table key is generated from random values and stored in a new system
catalog after being encrypted with the master key.

BTW, I want to support CBC mode encryption[3]. However, I'm not sure how to use the IV in CBC mode for this proposal.
I'd like to hear opinions from security engineers.


3. 2-tier encryption key management
==================
When it comes time to change cryptographic keys, decrypting and re-encrypting all data incurs a performance overhead.

To solve this problem, we employ 2-tier encryption.
In 2-tier encryption, all table keys are stored in the database cluster after being encrypted with the master key, and master keys
must be stored outside of PostgreSQL.

Therefore, without the master key, it is impossible to decrypt the table key, and thus impossible to decrypt the database data.

When changing the master key, it's not necessary to re-encrypt all data.
We only decrypt the table keys with the old master key and re-encrypt them with the new one, which minimizes the performance overhead.

As for table keys, every TDE-enabled table has a different table key.
As for master keys, every database has a different master key. Table keys are encrypted with the master key of their own database.
For WAL encryption, we have another cryptographic key. The WAL key is also encrypted with a master key, but it is shared across the
database cluster.
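
The 2-tier scheme can be sketched in Python as follows. This is an illustration under assumptions, not the proposed implementation: `wrap_key`, `unwrap_key` and `rotate_master_key` are hypothetical names, the catalog is a plain dictionary, and `toy_cipher` is an insecure stand-in for a real key-wrapping algorithm.

```python
import hashlib
import os


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR keystream cipher (NOT secure). Applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def wrap_key(master_key: bytes, table_key: bytes) -> bytes:
    """Encrypt a table key with the master key; the nonce is stored in front."""
    nonce = os.urandom(16)
    return nonce + toy_cipher(master_key, nonce, table_key)


def unwrap_key(master_key: bytes, wrapped: bytes) -> bytes:
    """Recover a table key from its wrapped form."""
    return toy_cipher(master_key, wrapped[:16], wrapped[16:])


def rotate_master_key(old_master: bytes, new_master: bytes, catalog: dict) -> dict:
    """Re-wrap every table key; the table data itself is never touched."""
    return {
        table: wrap_key(new_master, unwrap_key(old_master, wrapped))
        for table, wrapped in catalog.items()
    }
```

Rotation cost in this sketch depends on the number of table keys, not on the amount of table data, because the table keys themselves do not change.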


4. Working with external key management services(KMS)
==================
A key management service is an integrated approach for generating, fetching and managing encryption keys.
It may cover all aspects of key security, from secure generation, storage and fetching of keys up to
encryption key handling.
Also, various types of KMS are provided by many companies, and users can choose among them.

Therefore, I would like to manage the master key using a KMS.
My proposal is also to create callback APIs (generate_key, fetch_key, store_key) in the form of a plug-in so that users can use
whichever type of KMS they want.

The KMIP protocol and most KMSs manage keys by string IDs. We can get a key from the KMS by its key ID.
So in my proposal, each master key is identified by its ID, called the "master key ID".
The master key ID is made, for example, from the database OID and a sequence number, like <OID>_<SeqNo>. These IDs are managed in
PostgreSQL.
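
As a rough illustration of the plug-in idea, the three callbacks could be modelled in Python as an abstract interface. Only the callback names and the <OID>_<SeqNo> format come from the proposal; the signatures, the `KmsPlugin` and `InMemoryKms` classes, and the `master_key_id` helper are assumptions made for this sketch.

```python
import os
from abc import ABC, abstractmethod


class KmsPlugin(ABC):
    """Hypothetical plug-in interface built around the three proposed callbacks."""

    @abstractmethod
    def generate_key(self, key_id: str) -> bytes:
        """Create a new key in the KMS under key_id and return it."""

    @abstractmethod
    def fetch_key(self, key_id: str) -> bytes:
        """Return the key stored under key_id."""

    @abstractmethod
    def store_key(self, key_id: str, key: bytes) -> None:
        """Store an existing key under key_id."""


class InMemoryKms(KmsPlugin):
    """Stand-in for an external KMS; a real plug-in would talk to a server."""

    def __init__(self) -> None:
        self._keys = {}

    def generate_key(self, key_id: str) -> bytes:
        key = os.urandom(32)
        self.store_key(key_id, key)
        return key

    def fetch_key(self, key_id: str) -> bytes:
        return self._keys[key_id]

    def store_key(self, key_id: str, key: bytes) -> None:
        self._keys[key_id] = key


def master_key_id(db_oid: int, seq_no: int) -> str:
    """Build a master key ID in the <OID>_<SeqNo> form."""
    return f"{db_oid}_{seq_no}"
```

With this shape, PostgreSQL would only need to remember the current ID per database (for example "16384_1") and ask the plug-in for the key material when needed.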

When the database starts up, all master key IDs are loaded into shared memory, where they are protected by an LWLock.

When it comes time to rotate the master keys, run this command.

        ALTER SYSTEM ROTATION MASTER KEY;

This command rotates the master key with the following steps.
1. Generate a new master key.
2. Change the master key ID and emit the corresponding WAL.
3. Re-encrypt all table keys in the database.

Also, during a checkpoint, the master key IDs in shared memory are made persistent.


5. WAL encryption
==================
If we encrypt all WAL records, the performance overhead can be significant.
Therefore, I propose encrypting only the WAL record data, excluding the WAL header, when writing WAL to the WAL buffer, instead of
encrypting the whole WAL record.
The WAL encryption key is generated separately when the first TDE-enabled table is created. We use 2-tier encryption for WAL
encryption as well.
So, when it comes time to rotate the WAL encryption key, run this command.

        ALTER SYSTEM ROTATION WAL KEY;

Next, I will explain how to encrypt WAL.

To do this, I add a flag to the WAL header which indicates whether the subsequent WAL data is encrypted or not.

Then, when we write WAL for an encrypted table, we write "encrypted" WAL at the WAL buffer layer.

During recovery, we read the WAL header, check the encryption flag, and judge whether the WAL must be decrypted.
In the case of PITR, we use the WAL key ID in the backup file.

With this approach, the performance overhead of writing and reading WAL for unencrypted tables would be almost the same as
before.
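
The header-flag mechanism can be sketched in Python like this. The record layout (flags, LSN, payload length), the `XLR_ENCRYPTED` bit and the function names are invented for illustration and do not match PostgreSQL's real WAL record format; `toy_cipher` is an insecure stand-in for a real cipher.

```python
import hashlib
import struct

XLR_ENCRYPTED = 0x01            # hypothetical flag bit in the record header
HEADER = struct.Struct(">BQI")  # flags, LSN, payload length (never encrypted)


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR keystream cipher (NOT secure). Applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def write_wal_record(lsn: int, payload: bytes, wal_key: bytes = None) -> bytes:
    """Build a record; the payload is encrypted only when a WAL key is given."""
    flags = 0
    if wal_key is not None:
        payload = toy_cipher(wal_key, lsn.to_bytes(8, "big"), payload)
        flags |= XLR_ENCRYPTED
    return HEADER.pack(flags, lsn, len(payload)) + payload


def read_wal_record(record: bytes, wal_key: bytes):
    """Recovery side: read the plaintext header, decrypt only if flagged."""
    flags, lsn, length = HEADER.unpack_from(record)
    payload = record[HEADER.size:HEADER.size + length]
    if flags & XLR_ENCRYPTED:
        payload = toy_cipher(wal_key, lsn.to_bytes(8, "big"), payload)
    return lsn, payload
```

Records for unencrypted tables take the branch-free path on both sides, which matches the claim that their overhead stays almost unchanged.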


==================
I'd like to discuss the design before starting to change any code.
After more discussion, I want to make a PoC.
Feedback and suggestions are very welcome.

Finally, thank you to Masahiko Sawada for the initial design input.

Thank you.

[1] What does TDE mean?
    https://en.wikipedia.org/wiki/Transparent_Data_Encryption

[2] What does KMS mean?
    https://en.wikipedia.org/wiki/Key_management#Key_Management_System

[3] What does CBC-Mode mean?
    https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation

[4] Recently discussed mail
    https://www.postgresql.org/message-id/CA%2BCSw_tb3bk5i7if6inZFc3yyf%2B9HEVNTy51QFBoeUk7UE_V%3Dw%40mail.gmail.com


Regards.
Moon.
----------------------------------------
Moon, Insung
NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
----------------------------------------





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Antonin Houska-2
Moon, Insung <[hidden email]> wrote:

This patch seems to implement some of the features you propose, especially
encryption of buffers and WAL. I recommend checking it so that no effort is
duplicated:

> [4] Recently discussed mail
>     https://www.postgresql.org/message-id/CA%2BCSw_tb3bk5i7if6inZFc3yyf%2B9HEVNTy51QFBoeUk7UE_V%3Dw%40mail.gmail.com



--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Aleksander Alekseev
In reply to this post by Moon, Insung
Hello Moon,

I promised to email links to the articles I mentioned during your talk
at the PGCon Unconference to this thread. Here they are:

* http://cryptowiki.net/index.php?title=Order-preserving_encryption
* https://en.wikipedia.org/wiki/Homomorphic_encryption

Also I realized that I was wrong regarding encryption of the indexes
since they will be encrypted on the block level the same way the heap
will be.

--
Best regards,
Aleksander Alekseev


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Masahiko Sawada
In reply to this post by Moon, Insung
On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
<[hidden email]> wrote:

> Hello Hackers,
>
> ...

As per the discussion at the PGCon unconference, I think that first we
need to discuss what threats we want to defend database data against. If
a user wants to defend against a malicious user who has logged in to the
OS or the database and steals important data from the database, this TDE
design would not help, because such a user can steal the data by
getting a memory dump or by SQL. That of course differs depending
on system requirements or security compliance, but what threats do you
want to defend database data against, and why?

Also, if I understand correctly, at the unconference session there were
also two suggestions about the design other than the suggestion by
Aleksander: implementing TDE at column level using POLICY, and
implementing TDE at tablespace level. The former was suggested by Joe,
but I'm not sure about the details of that suggestion. I'd love to hear
them. The latter was suggested by Tsunakawa-san.
Have you considered that?

You mentioned that encryption of temporary data for query processing
and large objects is still under consideration. But besides
those, you should also consider the temporary data generated by other
subsystems, such as reorderbuffer and transition tables.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Tomas Vondra-4
On 06/11/2018 11:22 AM, Masahiko Sawada wrote:

> On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
> <[hidden email]> wrote:
>> Hello Hackers,
>>
>> This propose a way to develop "Table-level" Transparent Data
>> Encryption (TDE) and Key Management Service (KMS) support in
>> PostgreSQL.
>>
>> ...
>
> As per discussion at PGCon unconference, I think that firstly we
> need to discuss what threats we want to defend database data against.
> If user wants to defend against a threat that is malicious user who
> logged in OS or database steals an important data on database this
> design TDE would not help. Because such user can steal the data by
> getting a memory dump or by SQL. That is of course differs depending
> on system requirements or security compliance but what threats do
> you want to defend database data against? and why?
>

I do agree with this - a description of the threat model needs to be
part of the design discussion, otherwise it's not possible to compare it
to alternative solutions (e.g. full-disk encryption using LUKS or using
existing privilege controls and/or RLS).

TDE was proposed/discussed repeatedly in the past, and every time it
died exactly because it was not very clear which issue it was attempting
to solve.

Let me share some of the issues mentioned as possibly addressed by TDE
(I'm not entirely sure TDE actually solves them, I'm just saying those
were mentioned in previous discussions):

1) enterprise requirement - Companies want in-database encryption, for
various reasons (because "enterprise solution" or something).

2) like FDE, but OS/filesystem independent - Same config on any OS and
filesystem, which may make maintenance easier.

3) does not require special OS/filesystem setup - Does not require help
from system administrators, setup of LUKS devices or whatever.

4) all filesystem access (basebackups/rsync) is encrypted anyway

5) solves key management (the main challenge with pgcrypto)

6) allows encrypting only some of the data (tables, columns) to minimize
performance impact

IMHO it makes sense to have TDE even if it provides the same "security"
as disk-level encryption, assuming it's more convenient to set up/use
from the database.

> Also, if I understand correctly, at unconference session there also
> were two suggestions about the design other than the suggestion by
> Alexander: implementing TDE at column level using POLICY, and
> implementing TDE at table-space level. The former was suggested by
> Joe but I'm not sure the detail of that suggestion. I'd love to hear
> the details of that suggestion. The latter was suggested by
> Tsunakawa-san. Have you considered that?
>
> You mentioned that encryption of temporary data for query processing
> and large objects are still under the consideration. But other than
> them you should consider the temporary data generated by other
> subsystems such as reorderbuffer and transition table as well.
>

The severity of those limitations is likely related to the threat model.
I don't think encrypting temporary data would be a big problem, assuming
you know which key to use.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Tomas Vondra-4
In reply to this post by Moon, Insung
Hi,

On 05/25/2018 01:41 PM, Moon, Insung wrote:
> Hello Hackers,
>
> ...
>
> BTW, I want to support CBC mode encryption[3]. However, I'm not sure
> how to use the IV in CBC mode for this proposal. I'd like to hear
> opinions by security engineer.
>

I'm not a cryptographer either, but this is exactly where you need a
prior discussion about the threat models - there are a couple of
chaining modes, each with different weaknesses.

FWIW it may also matter if data_checksums are enabled, because that may
prevent malleability attacks affecting some of the modes. Assuming an active
attacker (with the ability to modify the data files) is part of the
threat model, of course.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Joe Conway
In reply to this post by Masahiko Sawada
On 06/11/2018 05:22 AM, Masahiko Sawada wrote:
> As per discussion at PGCon unconference, I think that firstly we need
> to discuss what threats we want to defend database data against.

Exactly. While certainly there is demand for encryption for the sake of
"checking a box", different designs will defend against different
threats, and we should be clear on which ones we are trying to protect
against for any particular design.

> Also, if I understand correctly, at unconference session there also
> were two suggestions about the design other than the suggestion by
> Alexander: implementing TDE at column level using POLICY, and
> implementing TDE at table-space level. The former was suggested by Joe
> but I'm not sure the detail of that suggestion. I'd love to hear the
> details of that suggestion.

The idea has not been extensively fleshed out yet, but the thought was
that we create column level POLICY, which would transparently apply some
kind of transform on input and/or output. The transforms would
presumably be expressions, which in turn could use functions (extension
or builtin) to do their work. That would allow encryption/decryption,
DLP (data loss prevention) schemes (masking, redacting), etc. to be
applied based on the policies.
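
Since the idea is not fleshed out, the following Python sketch is only one possible reading of it: a per-column policy maps a column name to a transform function that is applied transparently on output. The function names and the masking rule are hypothetical.

```python
from typing import Callable, Dict

Transform = Callable[[str], str]


def mask_all_but_last4(value: str) -> str:
    """Example DLP-style transform: hide everything except the last 4 characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def apply_output_policies(row: Dict[str, str],
                          policies: Dict[str, Transform]) -> Dict[str, str]:
    """Apply each column's transform on output; columns without a policy pass through."""
    return {
        column: policies.get(column, lambda v: v)(value)
        for column, value in row.items()
    }
```

An encryption or decryption function could be plugged in the same way as the masking function, which is what would make such a policy usable for column-level TDE.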

This, in and of itself, would not address key management. There is
probably a separate need for some kind of built-in key management --
perhaps a flexible way to integrate with external systems such as Vault
for example, or maybe something self-contained, or perhaps both. Or
maybe key management is really tied into the separately discussed effort
to create SQL VARIABLEs somehow.

In any case certainly a lot of room for discussion.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Tsunakawa, Takayuki
In reply to this post by Tomas Vondra-4
From: Tomas Vondra [mailto:[hidden email]]
> Let me share some of the issues mentioned as possibly addressed by TDE
> (I'm not entirely sure TDE actually solves them, I'm just saying those
> were mentioned in previous discussions):

FYI, our product provides TDE like Oracle and SQL Server, which enables encryption per tablespace.  Relations, WAL records and temporary files related to encrypted tablespace are encrypted.

http://www.fujitsu.com/global/products/software/middleware/opensource/postgres/

(I wonder why the web site doesn't offer the online manual... I've recognized we need to fix this situation.  Anyway, I guess the downloadable trial version includes the manual.)



> 1) enterprise requirement - Companies want in-database encryption, for
> various reasons (because "enterprise solution" or something).

To assist compliance with PCI DSS, HIPAA, etc.

> 2) like FDE, but OS/filesystem independent - Same config on any OS and
> filesystem, which may make maintenance easier.
>
> 3) does not require special OS/filesystem setup - Does not require help
> from system administrators, setup of LUKS devices or whatever.
>
> 4) all filesystem access (basebackups/rsync) is encrypted anyway
>
> 5) solves key management (the main challenge with pgcrypto)
>
> 6) allows encrypting only some of the data (tables, columns) to minimize
> performance impact

All yes.


> IMHO it makes sense to have TDE even if it provides the same "security"
> as disk-level encryption, assuming it's more convenient to setup/use
> from the database.

Agreed.


Regards
Takayuki Tsunakawa




RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Tsunakawa, Takayuki
In reply to this post by Tomas Vondra-4
> From: Tomas Vondra [mailto:[hidden email]]
> On 05/25/2018 01:41 PM, Moon, Insung wrote:
> > BTW, I want to support CBC mode encryption[3]. However, I'm not sure
> > how to use the IV in CBC mode for this proposal. I'd like to hear
> > opinions by security engineer.
> >
>
> I'm not a cryptographer either, but this is exactly where you need a
> prior discussion about the threat models - there are a couple of
> chaining modes, each with different weaknesses.

Our product uses XTS, which recent FDE software such as BitLocker and TrueCrypt use instead of CBC.

https://en.wikipedia.org/wiki/Disk_encryption_theory#XTS

"According to SP 800-38E, "In the absence of authentication or access control, XTS-AES provides more protection than the other approved confidentiality-only modes against unauthorized manipulation of the encrypted data.""



> FWIW it may also matter if data_checksums are enabled, because that may
> prevent malleability attacks affecting some of the modes. Assuming active
> attacker (with the ability to modify the data files) is part of the
> threat model, of course.

Encrypt the page after embedding its checksum value.  If a malicious attacker modifies a page on disk, then the decrypted page would be corrupt anyway, which can be detected by checksum.
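
A small Python sketch of this checksum-then-encrypt order. It rests on stand-ins: a toy 4-round Feistel construction plays the role of XTS-AES (it is not secure and, unlike XTS, garbles the whole page rather than one cipher block), and CRC32 plays the role of the page checksum. All function names are hypothetical.

```python
import hashlib
import zlib


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Feistel round function: SHA-256 output expanded to len(half) bytes."""
    out, counter = b"", 0
    while len(out) < len(half):
        out += hashlib.sha256(
            key + bytes([i]) + counter.to_bytes(4, "big") + half).digest()
        counter += 1
    return out[:len(half)]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_page(key: bytes, page: bytes) -> bytes:
    """Toy wide-block cipher over the whole page (page length must be even)."""
    n = len(page) // 2
    left, right = page[:n], page[n:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_page(key: bytes, blob: bytes) -> bytes:
    """Run the Feistel rounds in reverse order."""
    n = len(blob) // 2
    left, right = blob[:n], blob[n:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def add_checksum(body: bytes) -> bytes:
    """Embed a CRC32 of the body in the first 4 bytes of the page."""
    return zlib.crc32(body).to_bytes(4, "big") + body


def verify_checksum(page: bytes) -> bool:
    return zlib.crc32(page[4:]) == int.from_bytes(page[:4], "big")
```

Flipping even one ciphertext bit makes the decrypted page, including its embedded checksum, look random, so verification fails except with negligible probability.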


Regards
Takayuki Tsunakawa



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Masahiko Sawada
In reply to this post by Tomas Vondra-4
On Wed, Jun 13, 2018 at 10:03 PM, Tomas Vondra
<[hidden email]> wrote:

> On 06/11/2018 11:22 AM, Masahiko Sawada wrote:
>>
>> On Fri, May 25, 2018 at 8:41 PM, Moon, Insung
>> <[hidden email]> wrote:
>>>
>>> Hello Hackers,
>>>
>>> This propose a way to develop "Table-level" Transparent Data Encryption
>>> (TDE) and Key Management Service (KMS) support in PostgreSQL.
>>>
>>> ...
>>
>>
>> As per discussion at PGCon unconference, I think that firstly we
>> need to discuss what threats we want to defend database data against.
>> If user wants to defend against a threat that is malicious user who logged
>> in OS or database steals an important data on database this design TDE would
>> not help. Because such user can steal the data by getting a memory dump or
>> by SQL. That is of course differs depending on system requirements or
>> security compliance but what threats do
>> you want to defend database data against? and why?
>>
>
> I do agree with this - a description of the threat model needs to be part of
> the design discussion, otherwise it's not possible to compare it to
> alternative solutions (e.g. full-disk encryption using LUKS or using
> existing privilege controls and/or RLS).
>
> TDE was proposed/discussed repeatedly in the past, and every time it died
> exactly because it was not very clear which issue it was attempting to
> solve.
>
> Let me share some of the issues mentioned as possibly addressed by TDE (I'm
> not entirely sure TDE actually solves them, I'm just saying those were
> mentioned in previous discussions):

Thank you for sharing!

>
> 1) enterprise requirement - Companies want in-database encryption, for
> various reasons (because "enterprise solution" or something).

Yes, our customers often ask for it, especially when migrating from a
DBMS that supports TDE, in order to reduce the cost of
migration.

>
> 2) like FDE, but OS/filesystem independent - Same config on any OS and
> filesystem, which may make maintenance easier.
>
> 3) does not require special OS/filesystem setup - Does not require help from
> system administrators, setup of LUKS devices or whatever.
>
> 4) all filesystem access (basebackups/rsync) is encrypted anyway
>
> 5) solves key management (the main challenge with pgcrypto)
>
> 6) allows encrypting only some of the data (tables, columns) to minimize
> performance impact
>
> IMHO it makes sense to have TDE even if it provides the same "security" as
> disk-level encryption, assuming it's more convenient to setup/use from the
> database.

Agreed.

>
>> Also, if I understand correctly, at the unconference session there were also
>> two suggestions about the design other than the suggestion by Alexander:
>> implementing TDE at column level using POLICY, and implementing TDE at
>> table-space level. The former was suggested by Joe, but I'm not sure about
>> the details of that suggestion and would love to hear them. The latter was
>> suggested by Tsunakawa-san. Have you considered that?
>>
>> You mentioned that encryption of temporary data for query processing and
>> of large objects is still under consideration. Besides those, you should
>> also consider the temporary data generated by other subsystems, such as
>> the reorderbuffer and transition tables.
>>
>
> The severity of those limitations is likely related to the threat model. I
> don't think encrypting temporary data would be a big problem, assuming you
> know which key to use.

Agreed. I was worried about the possibility of non-encrypted temporary
data ending up in backups, but since we don't include temporary data in
backups it would not be a big problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Masahiko Sawada
In reply to this post by Joe Conway
On Wed, Jun 13, 2018 at 10:20 PM, Joe Conway <[hidden email]> wrote:

> On 06/11/2018 05:22 AM, Masahiko Sawada wrote:
>> As per discussion at PGCon unconference, I think that firstly we need
>> to discuss what threats we want to defend database data against.
>
> Exactly. While certainly there is demand for encryption for the sake of
> "checking a box", different designs will defend against different
> threats, and we should be clear on which ones we are trying to protect
> against for any particular design.
>
>> Also, if I understand correctly, at the unconference session there were
>> also two suggestions about the design other than the suggestion by
>> Alexander: implementing TDE at column level using POLICY, and
>> implementing TDE at table-space level. The former was suggested by Joe,
>> but I'm not sure about the details of that suggestion. I'd love to hear
>> them.
>
> The idea has not been extensively fleshed out yet, but the thought was
> that we create column level POLICY, which would transparently apply some
> kind of transform on input and/or output. The transforms would
> presumably be expressions, which in turn could use functions (extension
> or builtin) to do their work. That would allow encryption/decryption,
> DLP (data loss prevention) schemes (masking, redacting), etc. to be
> applied based on the policies.

That seems like a good idea. Where does this design encrypt data: only
in the buffer, or both in the buffer and on disk? And does this design
(per-column encryption) aim to satisfy some specific security
compliance requirement?

> This, in and of itself, would not address key management. There is
> probably a separate need for some kind of built in key management --
> perhaps a flexible way to integrate with external systems such as Vault
> for example, or maybe something self contained, or perhaps both.

I agree that we need a flexible way to address different
requirements. I initially thought that a GUC parameter holding a shell
command that returns the encryption key would be enough, but to
integrate seamlessly with various key management systems I think we
need APIs for key management (fetching a key, storing a key,
generating a key, etc.).
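
To make the shape of such an API a little more concrete, here is a minimal
sketch in Python (chosen only for brevity). The class and method names are
hypothetical and do not correspond to any existing PostgreSQL interface; the
in-memory backend is a stand-in for a real external key management system.

```python
import secrets
from abc import ABC, abstractmethod
from typing import Dict


class KeyManager(ABC):
    """Hypothetical key management interface (not an existing PostgreSQL API)."""

    @abstractmethod
    def generate_key(self, key_id: str) -> bytes:
        """Create a new key, store it under key_id, and return it."""

    @abstractmethod
    def store_key(self, key_id: str, key: bytes) -> None:
        """Persist an existing key under key_id."""

    @abstractmethod
    def fetch_key(self, key_id: str) -> bytes:
        """Return the key stored under key_id, or raise KeyError."""


class InMemoryKeyManager(KeyManager):
    """Toy backend for testing; a real one would talk to an external KMS."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def generate_key(self, key_id: str) -> bytes:
        key = secrets.token_bytes(32)  # 256-bit random key
        self.store_key(key_id, key)
        return key

    def store_key(self, key_id: str, key: bytes) -> None:
        self._keys[key_id] = key

    def fetch_key(self, key_id: str) -> bytes:
        return self._keys[key_id]
```

With an interface like this, each key management system would only need its
own subclass, and the encryption code would depend on the three calls alone.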

> Or
> maybe key management is really tied into the separately discussed effort
> to create SQL VARIABLEs somehow.
>

Could you elaborate on how key management is tied into SQL VARIABLEs?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Joe Conway
On 06/14/2018 12:19 PM, Masahiko Sawada wrote:

> On Wed, Jun 13, 2018 at 10:20 PM, Joe Conway <[hidden email]> wrote:
>> The idea has not been extensively fleshed out yet, but the thought was
>> that we create column level POLICY, which would transparently apply some
>> kind of transform on input and/or output. The transforms would
>> presumably be expressions, which in turn could use functions (extension
>> or builtin) to do their work. That would allow encryption/decryption,
>> DLP (data loss prevention) schemes (masking, redacting), etc. to be
>> applied based on the policies.
>
> Where does this design encrypt data: only in the buffer, or both in
> the buffer and on disk?


The point of the design is simply to provide a mechanism for input and
output transformation, not to provide the transform function itself.

How you use that transformation would be entirely up to you, but if you
were providing an encryption transform on input, the data would be
encrypted both in the buffer and on disk.
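
As a rough illustration of the mechanism only, the following Python sketch
models a column-level policy as a pair of transforms, one applied on input and
one on output. Everything here is hypothetical (the `ColumnPolicy` class, the
`card_number` column, the masking rule); it shows a DLP-style masking
transform rather than encryption, and it is not proposed syntax or an
existing feature.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ColumnPolicy:
    """Hypothetical column-level policy: one transform per direction."""
    on_input: Callable[[str], str]   # applied before the value is stored
    on_output: Callable[[str], str]  # applied before the value is returned


def identity(value: str) -> str:
    return value


def mask_all_but_last4(value: str) -> str:
    """DLP-style masking: keep only the last four characters visible."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


# Policies keyed by column name; the column name is made up for this example.
policies: Dict[str, ColumnPolicy] = {
    "card_number": ColumnPolicy(on_input=identity, on_output=mask_all_but_last4),
}


def store(row: Dict[str, str]) -> Dict[str, str]:
    """Apply each column's input transform; other columns pass through."""
    return {c: policies[c].on_input(v) if c in policies else v for c, v in row.items()}


def fetch(row: Dict[str, str]) -> Dict[str, str]:
    """Apply each column's output transform; other columns pass through."""
    return {c: policies[c].on_output(v) if c in policies else v for c, v in row.items()}
```

Swapping the two transforms for an encrypt/decrypt pair would give the
encryption case described above without changing the mechanism.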

> And does this design (per-column encryption) aim to satisfy some
> specific security compliance requirement?


Again, entirely up to you and dependent on what type of transformation
you provide. If, for example, you provided input encryption and output
decryption based on some in-memory session variable key, that would be
essentially TDE and would satisfy several common sets of compliance
requirements.


>> This, in and of itself, would not address key management. There is
>> probably a separate need for some kind of built in key management --
>> perhaps a flexible way to integrate with external systems such as Vault
>> for example, or maybe something self contained, or perhaps both.
>
> I agree that we need a flexible way to address different
> requirements. I initially thought that a GUC parameter holding a shell
> command that returns the encryption key would be enough, but to
> integrate seamlessly with various key management systems I think we
> need APIs for key management (fetching a key, storing a key,
> generating a key, etc.).


I don't like the idea of yet another path for arbitrary shell code
execution. An API for extension code would be preferable.


>> Or
>> maybe key management is really tied into the separately discussed effort
>> to create SQL VARIABLEs somehow.
>
> Could you elaborate on how key management is tied into SQL VARIABLEs?

Well, the key management probably is not, but the SQL VARIABLE might be
where the key is stored for use.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Robert Haas
In reply to this post by Joe Conway
On Wed, Jun 13, 2018 at 9:20 AM, Joe Conway <[hidden email]> wrote:

>> Also, if I understand correctly, at the unconference session there were
>> also two suggestions about the design other than the suggestion by
>> Alexander: implementing TDE at column level using POLICY, and
>> implementing TDE at table-space level. The former was suggested by Joe,
>> but I'm not sure about the details of that suggestion. I'd love to hear
>> them.
>
> The idea has not been extensively fleshed out yet, but the thought was
> that we create column level POLICY, which would transparently apply some
> kind of transform on input and/or output. The transforms would
> presumably be expressions, which in turn could use functions (extension
> or builtin) to do their work. That would allow encryption/decryption,
> DLP (data loss prevention) schemes (masking, redacting), etc. to be
> applied based on the policies.

It seems to me that column-level encryption is a lot less secure than
block-level encryption.  I am supposing here that the attack vector is
stealing the disk.  If all you've got is a bunch of 8192-byte blocks,
it's unlikely you can infer much about the contents.  You know the
size of the relations and that's probably about it.  If you've got
individual values being encrypted, then there's more latitude to
figure stuff out.  You can infer something about the length of
particular values.  Perhaps you can find cases where the same
encrypted value appears multiple times.  If there's a btree index, you
know the ordering of the values under whatever ordering semantics
apply to that index.  It's unclear to me how useful such information
would be in practice or to what extent it might allow you to attack
the underlying cryptography, but it seems like there might be cases
where the information leakage is significant.  For example, suppose
you're trying to determine which partially-encrypted record is that of
Aaron Aardvark... or this guy:
https://en.wikipedia.org/wiki/Hubert_Blaine_Wolfeschlegelsteinhausenbergerdorff,_Sr.

Recently, it was suggested to me that a use case for column-level
encryption might be to prevent casual DBA snooping.  So, you'd want
the data to appear in pg_dump output encrypted, because the DBA might
otherwise look at it, but you wouldn't really be concerned about the
threat of the DBA loading a hostile C module that would steal user
keys and use them to decrypt all the data, because they don't care
that much and would be fired if they were caught doing it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Joe Conway
On 06/18/2018 09:49 AM, Robert Haas wrote:

> On Wed, Jun 13, 2018 at 9:20 AM, Joe Conway <[hidden email]> wrote:
>>> Also, if I understand correctly, at the unconference session there were
>>> also two suggestions about the design other than the suggestion by
>>> Alexander: implementing TDE at column level using POLICY, and
>>> implementing TDE at table-space level. The former was suggested by Joe,
>>> but I'm not sure about the details of that suggestion. I'd love to hear
>>> them.
>>
>> The idea has not been extensively fleshed out yet, but the thought was
>> that we create column level POLICY, which would transparently apply some
>> kind of transform on input and/or output. The transforms would
>> presumably be expressions, which in turn could use functions (extension
>> or builtin) to do their work. That would allow encryption/decryption,
>> DLP (data loss prevention) schemes (masking, redacting), etc. to be
>> applied based on the policies.
>
> It seems to me that column-level encryption is a lot less secure than
> block-level encryption.  I am supposing here that the attack vector is
> stealing the disk.  If all you've got is a bunch of 8192-byte blocks,
> it's unlikely you can infer much about the contents.  You know the
> size of the relations and that's probably about it.

Not necessarily. Our pages probably have enough predictable bytes to aid
cryptanalysis, compared to user data in a column which might not be very
predictable.


> If you've got individual values being encrypted, then there's more
> latitude to figure stuff out.  You can infer something about the
> length of particular values.  Perhaps you can find cases where the
> same encrypted value appears multiple times.

This completely depends on the encryption scheme you are using, and the
column level POLICY leaves that entirely up to you.

But in any case most encryption schemes use a random nonce (salt) to
ensure two identical strings do not encrypt to the same result. And
often the encrypted length is padded, so while you might be able to
infer short versus long, you would not usually be able to infer the
exact plaintext length.
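
A small Python sketch of both effects, for illustration only. The cipher here
is a toy keystream built from HMAC-SHA256 so that the example needs nothing
beyond the standard library; it is an assumption made for the demo and not a
vetted scheme, and a real extension would use an established cipher. The
16-byte bucket size is likewise arbitrary.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16
BUCKET = 16  # plaintexts are padded up to a multiple of this many bytes


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce || counter (demo only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def pad(data: bytes) -> bytes:
    """PKCS#7-style padding: always appends between 1 and BUCKET bytes."""
    n = BUCKET - len(data) % BUCKET
    return data + bytes([n]) * n


def unpad(data: bytes) -> bytes:
    return data[:-data[-1]]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)  # fresh random nonce per value
    padded = pad(plaintext)
    stream = _keystream(key, nonce, len(padded))
    return nonce + bytes(a ^ b for a, b in zip(padded, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return unpad(bytes(a ^ b for a, b in zip(body, stream)))
```

Encrypting the same string twice yields two different blobs because the nonce
differs, and any two plaintexts that fall in the same 16-byte bucket produce
blobs of the same length.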


> If there's a btree index, you know the ordering of the values under
> whatever ordering semantics apply to that index.  It's unclear to me
> how useful such information would be in practice or to what extent it
> might allow you to attack the underlying cryptography, but it seems
> like there might be cases where the information leakage is
> significant.  For example, suppose you're trying to determine which
> partially-encrypted record is that of Aaron Aardvark... or this guy:
> https://en.wikipedia.org/wiki/Hubert_Blaine_Wolfeschlegelsteinhausenbergerdorff,_Sr.

Again, this only applies if your POLICY uses this type of encryption,
i.e. homomorphic encryption. If you use strong encryption you will not
be indexing those columns at all, which is pretty commonly the case.

> Recently, it was suggested to me that a use case for column-level
> encryption might be to prevent casual DBA snooping.  So, you'd want
> the data to appear in pg_dump output encrypted, because the DBA might
> otherwise look at it, but you wouldn't really be concerned about the
> threat of the DBA loading a hostile C module that would steal user
> keys and use them to decrypt all the data, because they don't care
> that much and would be fired if they were caught doing it.

Again completely dependent on the extension you use to do the encryption
for the input policy. The keys don't need to be stored with the data,
and the decryption can be transparent only for certain users or if
certain session variables exist which the DBA does not have access to.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Robert Haas
On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <[hidden email]> wrote:
> Not necessarily. Our pages probably have enough predictable bytes to aid
> cryptanalysis, compared to user data in a column which might not be very
> predictable.

Really?  I would guess that the amount of entropy in a page is WAY
higher than in an individual column value.

> But in any case most encryption schemes use a random nonce (salt) to
> ensure two identical strings do not encrypt to the same result. And
> often the encrypted length is padded, so while you might be able to
> infer short versus long, you would not usually be able to infer the
> exact plaintext length.

Sure, that could be done, although it means that equality comparisons
must be done unencrypted.

> Again completely dependent on the extension you use to do the encryption
> for the input policy. The keys don't need to be stored with the data,
> and the decryption can be transparent only for certain users or if
> certain session variables exist which the DBA does not have access to.

Not arguing with that.  And to be clear, I'm not trying to attack your
proposal.  I'm just trying to have a discussion about advantages and
disadvantages of different approaches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Joe Conway
On 06/18/2018 10:26 AM, Robert Haas wrote:
> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <[hidden email]> wrote:
>> Not necessarily. Our pages probably have enough predictable bytes to aid
>> cryptanalysis, compared to user data in a column which might not be very
>> predictable.
>
> Really?  I would guess that the amount of entropy in a page is WAY
> higher than in an individual column value.

It isn't about the entropy of the page overall, it is about the
predictability of specific bytes at specific locations on the pages. At
least as far as I understand it.

>> But in any case most encryption schemes use a random nonce (salt) to
>> ensure two identical strings do not encrypt to the same result. And
>> often the encrypted length is padded, so while you might be able to
>> infer short versus long, you would not usually be able to infer the
>> exact plaintext length.
>
> Sure, that could be done, although it means that equality comparisons
> must be done unencrypted.

Sure. Typically equality comparisons are done on other unencrypted
attributes. Or if you need to do equality on encrypted columns, you can
store non-reversible cryptographic hashes in a separate column.
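
A minimal Python sketch of that second option. It assumes a keyed hash
(HMAC-SHA256 with a separate index key) rather than a bare hash, which is a
choice made for this example and not something specified above; the column
names and the opaque ciphertext placeholders are also made up.

```python
import hashlib
import hmac
from typing import Dict, List


def lookup_hash(index_key: bytes, value: str) -> str:
    """Deterministic, non-reversible digest used only for equality lookups."""
    return hmac.new(index_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def find_rows(rows: List[Dict[str, object]], index_key: bytes, value: str) -> List[Dict[str, object]]:
    """Equality search on the hash column; the encrypted column is never decrypted."""
    wanted = lookup_hash(index_key, value)
    return [r for r in rows if hmac.compare_digest(str(r["ssn_hash"]), wanted)]


index_key = b"demo-index-key-not-for-real-use!"
rows = [
    # ssn_enc stands in for a randomized ciphertext; ssn_hash is the lookup column.
    {"id": 1, "ssn_enc": b"<ciphertext 1>", "ssn_hash": lookup_hash(index_key, "123-45-6789")},
    {"id": 2, "ssn_enc": b"<ciphertext 2>", "ssn_hash": lookup_hash(index_key, "987-65-4321")},
]
```

Note that the hash column is deterministic by design, so it reveals which rows
share a value even though it does not reveal the value itself.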

>> Again completely dependent on the extension you use to do the encryption
>> for the input policy. The keys don't need to be stored with the data,
>> and the decryption can be transparent only for certain users or if
>> certain session variables exist which the DBA does not have access to.
>
> Not arguing with that.  And to be clear, I'm not trying to attack your
> proposal.  I'm just trying to have a discussion about advantages and
> disadvantages of different approaches.

Understood. Ultimately we might want both page-level encryption and
column level POLICY, as they are each useful for different use-cases.
Personally I believe the former is more generally useful than the
latter, but YMMV.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Tom Lane-2
In reply to this post by Robert Haas
Robert Haas <[hidden email]> writes:
> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <[hidden email]> wrote:
>> Not necessarily. Our pages probably have enough predictable bytes to aid
>> cryptanalysis, compared to user data in a column which might not be very
>> predictable.

> Really?  I would guess that the amount of entropy in a page is WAY
> higher than in an individual column value.

Depending on the specifics of the encryption scheme, having some amount
of known (or guessable) plaintext may allow breaking the cipher, even
if much of the plaintext is not known.  This is cryptology 101, really.
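
A deliberately extreme Python illustration of the point, assuming a weak
repeating-key XOR "cipher" and a made-up eight-byte page header. Neither
reflects any real encryption scheme or the actual PostgreSQL page layout; the
example only shows how predictable bytes at a known offset can give the key
away under a sufficiently weak scheme.

```python
KEY_LEN = 4
# Hypothetical predictable bytes at the start of every page (not the real layout).
KNOWN_HEADER = b"PAGEHDR\x00"


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Repeating-key XOR; encryption and decryption are the same operation."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def recover_key(ciphertext: bytes, known_prefix: bytes, key_len: int) -> bytes:
    """Known-plaintext attack: ciphertext XOR plaintext yields the key bytes."""
    return bytes(c ^ p for c, p in zip(ciphertext[:key_len], known_prefix[:key_len]))


secret_key = bytes([0x13, 0x37, 0xC0, 0xDE])
page = KNOWN_HEADER + b"salary=123456;name=Aaron Aardvark"
ciphertext = xor_cipher(secret_key, page)

# The attacker knows only the ciphertext, the header bytes, and the key length.
guessed_key = recover_key(ciphertext, KNOWN_HEADER, KEY_LEN)
recovered_page = xor_cipher(guessed_key, ciphertext)
```

Modern ciphers are designed to resist this kind of attack, which is why the
outcome depends so heavily on the specifics of the scheme.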

At the same time, having to have a bunch of independently-decipherable
short field values is not real secure either, especially if they're known
to all be encrypted with the same key.  But what you know or can guess
about the plaintext in such cases would be target-specific, rather than
an attack that could be built once and used against any PG database.

                        regards, tom lane


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Joe Conway
On 06/18/2018 10:52 AM, Tom Lane wrote:

> Robert Haas <[hidden email]> writes:
>> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <[hidden email]> wrote:
>>> Not necessarily. Our pages probably have enough predictable bytes to aid
>>> cryptanalysis, compared to user data in a column which might not be very
>>> predictable.
>
>> Really?  I would guess that the amount of entropy in a page is WAY
>> higher than in an individual column value.
>
> Depending on the specifics of the encryption scheme, having some amount
> of known (or guessable) plaintext may allow breaking the cipher, even
> if much of the plaintext is not known.  This is cryptology 101, really.

Exactly

> At the same time, having to have a bunch of independently-decipherable
> short field values is not real secure either, especially if they're known
> to all be encrypted with the same key.  But what you know or can guess
> about the plaintext in such cases would be target-specific, rather than
> an attack that could be built once and used against any PG database.

Again, this depends on the specific solution for encryption. In some
cases you might do something like generate a single use random key,
encrypt the payload with that, encrypt the single use key with the
"global" key, append the two results and store.
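
Sketched in Python, the steps might look like this. The cipher is a toy
HMAC-SHA256 keystream used only so the example runs with the standard library;
it is an assumption for the demo, not a recommendation, and the function names
are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16
KEY_LEN = 32
WRAPPED_LEN = NONCE_LEN + KEY_LEN  # size of the encrypted single-use key


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce || counter (demo only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _encrypt(key: bytes, data: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def _decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def seal(global_key: bytes, payload: bytes) -> bytes:
    data_key = secrets.token_bytes(KEY_LEN)   # 1. single-use random key
    body = _encrypt(data_key, payload)        # 2. encrypt the payload with it
    wrapped = _encrypt(global_key, data_key)  # 3. encrypt that key with the global key
    return wrapped + body                     # 4. append the two results and store


def unseal(global_key: bytes, blob: bytes) -> bytes:
    data_key = _decrypt(global_key, blob[:WRAPPED_LEN])
    return _decrypt(data_key, blob[WRAPPED_LEN:])
```

With this layout the global key only ever encrypts short random keys, so
no two stored values share an encryption key.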

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Tomas Vondra-4


On 06/18/2018 05:06 PM, Joe Conway wrote:

> On 06/18/2018 10:52 AM, Tom Lane wrote:
>> Robert Haas <[hidden email]> writes:
>>> On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <[hidden email]> wrote:
>>>> Not necessarily. Our pages probably have enough predictable bytes to aid
>>>> cryptanalysis, compared to user data in a column which might not be very
>>>> predictable.
>>
>>> Really?  I would guess that the amount of entropy in a page is WAY
>>> higher than in an individual column value.
>>
>> Depending on the specifics of the encryption scheme, having some
>> amount of known (or guessable) plaintext may allow breaking the
>> cipher, even if much of the plaintext is not known. This is
>> cryptology 101, really.
>
> Exactly
>
>> At the same time, having to have a bunch of
>> independently-decipherable short field values is not real secure
>> either, especially if they're known to all be encrypted with the
>> same key. But what you know or can guess about the plaintext in
>> such cases would be target-specific, rather than an attack that
>> could be built once and used against any PG database.
>
> Again, this depends on the specific solution for encryption. In some
> cases you might do something like generate a single use random key,
> encrypt the payload with that, encrypt the single use key with the
> "global" key, append the two results and store.
>

Yeah, I suppose we could even have per-page keys, for example.

One topic I haven't seen mentioned in this thread yet is indexes. That's
a pretty significant side-channel, when built on encrypted columns. Even
if the indexes are encrypted too, you can often deduce a lot of
information from them.
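
A toy Python model of the ordering leak, under the assumption that the index
keeps its entries in plaintext order while the attacker can see only opaque
per-row tokens. The names and the token format are made up, and this is not
how a btree is actually stored.

```python
import secrets
from typing import Dict, List


def build_index(rows: Dict[str, str]) -> List[str]:
    """Return opaque row tokens in the order an index on the plaintext would keep them."""
    return sorted(rows, key=lambda token: rows[token])


# Server side: token -> plaintext name. The attacker never sees the values.
names = ["Mallory Jones", "Aaron Aardvark", "Zed Zimmer", "Hubert Wolfe"]
rows = {secrets.token_hex(8): name for name in names}

# Attacker side: only the ordered list of tokens is visible.
index = build_index(rows)
first_token = index[0]  # very likely the alphabetically smallest name
last_token = index[-1]  # very likely the alphabetically largest name
```

Even without decrypting anything, the attacker can guess that `first_token`
belongs to a name near the start of the alphabet.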

So what's the plan here? Disallow indexes on encrypted columns? Index
encrypted values directly? Something else?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Nico Williams
In reply to this post by Masahiko Sawada
On Mon, Jun 11, 2018 at 06:22:22PM +0900, Masahiko Sawada wrote:
> As per discussion at PGCon unconference, I think that firstly we need
> to discuss what threats we want to defend database data against. If

We call that a threat model.  There can be many threat models, of
course.

> a user wants to defend against the threat of a malicious user who has
> logged in to the OS or the database stealing important data, this TDE
> design would not help, because such a user can steal the data by
> taking a memory dump or by running SQL. The answer of course depends
> on system requirements and security compliance, but what threats do you
> want to defend database data against, and why?

This design guards (somewhat) against the threat of storage theft
(e.g., because the storage is remote).  It's a fine threat model to
address, but it's also a lot easier to address in the filesystem or
device drivers -- there's no need to do this in PostgreSQL itself except
so as to support it on all platforms regardless of OS capabilities.

Note that unless pg_catalog is protected against manipulation by the
remote storage, it might be possible to compromise TDE for user
tables.  Like so: the attacker manipulates pg_catalog to
escalate privilege in order to obtain the TDE keys.  This argues for
full database encryption, not just specific tables or columns.  But
again, this is for the threat model where the storage is the threat.

Another similar threat model is dump management, where dumps are sent
off-site where untrusted users might read them, or even edit them in the
hopes that they will be used for restores and thus compromise the
database.  This is most easily addressed by just encrypting the backups
externally to PG.

Threat models where client users are the threat are easily handled by
PG's permissions system.

I think any threat model where DBAs are not the threat is just not that
interesting to address with crypto within postgres itself...

Encryption to public keys for which postgres does not have private keys
would be one way to address DBAs-as-the-threat, but this is easily done
with an extension...  A small amount of syntactic sugar might help:

  CREATE ROLE "bar" WITH (PUBLIC KEY "...");

  CREATE TABLE foo (
    name TEXT PRIMARY KEY,
    payload TEXT ENCRYPTED TO ROLE "bar" BOUND TO name
  );

but this is just syntactic sugar, so not that valuable.  On the other
hand, just a bit of syntactic sugar can help tick a feature checkbox,
which might be very valuable for marketing reasons even if it's not
valuable for any other reason.

Note that encrypting the payload without a binding to the PK (or similar
name) is very dangerous!  So the encryption option would have to support
some way to indicate what other plaintext to bind in (here the "name"
column).
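
To illustrate why the binding matters, here is a Python sketch in which an
authentication tag covers both the row's name and the ciphertext. It assumes
a symmetric HMAC key purely for brevity, which differs from the public-key
setup above; the function names and sample rows are hypothetical. Without
such a binding, someone with write access to the storage could copy one
row's encrypted payload into another row unnoticed.

```python
import hashlib
import hmac


def bind_tag(mac_key: bytes, name: str, ciphertext: bytes) -> bytes:
    """Tag over (name, ciphertext); the name is length-prefixed to avoid ambiguity."""
    name_bytes = name.encode("utf-8")
    message = len(name_bytes).to_bytes(4, "big") + name_bytes + ciphertext
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify(mac_key: bytes, name: str, ciphertext: bytes, tag: bytes) -> bool:
    """True only if the ciphertext was tagged for this exact name."""
    return hmac.compare_digest(bind_tag(mac_key, name, ciphertext), tag)


mac_key = b"demo-mac-key-not-for-real-use!!!"
# Opaque placeholders standing in for each row's encrypted payload.
payloads = {"alice": b"<ciphertext A>", "bob": b"<ciphertext B>"}
tags = {name: bind_tag(mac_key, name, ct) for name, ct in payloads.items()}

# An attacker copies bob's payload and tag into alice's row.
swapped_ok = verify(mac_key, "alice", payloads["bob"], tags["bob"])
```

Here `swapped_ok` is false, so the substitution is detected when the row is
read back.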

Note also that for key management reasons it would be necessary to be
able to write the payload as ciphertext rather than as to-be-encrypted
TEXT.

Lastly, for a symmetric encryption option one would need a remote oracle
to do the encryption, which seems rather complicated, but in some cases
may well perform faster.

Nico
--
