[FEATURE REQUEST] Encrypted indexes over encrypted data

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[FEATURE REQUEST] Encrypted indexes over encrypted data

abcz2.uprola
Hi! Haven't found discussions on possible ways to encrypt indexes. Let's start!

The problem
==========

I'd like to encrypt some columns (first name, last name, SSN, etc...) in a nondeterministic way. This can be done using `pg_crypto`, but then I loose full-text-search (and any other indexing) capabilities on these fields.

Blind indexing also isn't a good enough option.

Obviously we don't want create expression-based indexes, that perform decryption during index build. This will store plaintexts inside index buffers, and decryption key will be included in database dump.

We don't trust full-disk-encryption or any other transparent encryption, because of possible SQL injections.

Solution 1 (possibly can be used even now)
========
- perform full-disk encryption
- perform encryption of column
- add decrypting expression-based index with decryption key
- limit ways on disclosing index internals. Ideally if no one except admin can do that
- limit ways to read index definitions - so it's not possible for application to uncover decryption key from database itself, it should know it on it's own.

Solution 2 (feature request)
========
- full-disk encryption is optional
- data column is encrypted
- index is decrypted by construction, but each it's block is encrypted, even in memory.
- lookups over index do lazy index buffer decrypt and close buffers ASAP
- make every query that has to touch encrypted column or encrypted index require decryption key. This means, SELECT, DELETE, UPDATE, INSERT, VACUUM, CLUSTER, CREATE INDEX, pg_dump, pg_restore all should have decryption key supplied in order to be executed. This also means, that autovacuum daemon can't work.


What do you think about both solutions? Is it hard to implement soluition 2?
Reply | Threaded
Open this post in threaded view
|

Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

Andres Freund


On August 9, 2018 5:30:26 PM GMT+05:30, Danylo Hlynskyi <[hidden email]> wrote:
> ?Is it hard to implement soluition 2?

Yes.

To the point that I'm fairly certain that an implementation would be considered to costly to maintain (vs benefit) of proposed.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Reply | Threaded
Open this post in threaded view
|

Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

Bear Giles
There are alternatives. If you know what you want to find, e.g., a search by username or email address, you can store a strong hash of the value as an indexed column. By "strong hash" I mean don't just use md5 or sha1, or even one round with a salt. I can give you more details about how and why offline.

So you might have a record with:

   id serial primary key,
   email_hash text not null indexed,
   first_name_hash text indexed,
   last_name_hash text indexed,
   phone_number_hash text indexed ,
   'wallet'_containing_all_encrypted_values text

and that allows you to search on email, first name, last name, or phone number, or some combination on them. But no expressions. The hashing would be done in your app, not the database. You also probably want to convert everything to lowercase, maybe remove spaces, etc., before computing the hash.

You should be prepared to handle multiple matches. It's unlikely that an email or phone number hash won't be unique but it's safest to always be prepared for more than one match, decrypt the 'wallet', and then do a final comparison. That also gives you a bit of protection from an attacker creating an account and then changing the hash values to match someone else. You can use that to support very limited expressions, e.g., also keep a hash on the first three letters of their last name, but that will compromise your security a bit since it allows an attacker to perform some statistical analysis on the data.

Finally there's the general advice that hashes (and encrypted values) should always have a version number of some sort. It could be something as simple as 3$hash, or it could be a composite column or even a user-defined type. The # indicates is a lookup into a table, perhaps in your app, that tells you which hashing algorithm and salt to use. It makes life a lot easier if the security audit tells you that you need to change your cipher/salt/key/whatever but you can't do it immediately since you don't know everything you need in order to do it, e.g., the password that you need in order to recompute the hash value. With that version number it's easy to continue to accept the existing password so they can log in, and in the background you quietly recompute the hash using the new salt/algorithm/whatever and update their record. I've worked for some pretty knowledgeable companies that have overlooked this.

On Thu, Aug 9, 2018 at 6:05 AM, Andres Freund <[hidden email]> wrote:


On August 9, 2018 5:30:26 PM GMT+05:30, Danylo Hlynskyi <[hidden email]> wrote:
> ?Is it hard to implement soluition 2?

Yes.

To the point that I'm fairly certain that an implementation would be considered to costly to maintain (vs benefit) of proposed.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Reply | Threaded
Open this post in threaded view
|

Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

Nico Williams
In reply to this post by abcz2.uprola
On Thu, Aug 09, 2018 at 03:00:26PM +0300, Danylo Hlynskyi wrote:
> The problem
> ==========
>
> [...]
>
> We don't trust full-disk-encryption or any other transparent encryption,
> because of possible SQL injections.

Can you elaborate on this?

> Solution 1 (possibly can be used even now)
> ========
> - perform full-disk encryption
> - perform encryption of column
> - add decrypting expression-based index with decryption key
> - limit ways on disclosing index internals. Ideally if no one except admin
> can do that
> - limit ways to read index definitions - so it's not possible for
> application to uncover decryption key from database itself, it should know
> it on it's own.

But... you have to have the decryption key(s) in memory at all times to
enable any write operations.  And plaintext as well at various times.

What does this gain you that FDE doesn't?

> Solution 2 (feature request)
> ========
> - full-disk encryption is optional
> - data column is encrypted
> - index is decrypted by construction, but each it's block is encrypted,
> even in memory.
> - lookups over index do lazy index buffer decrypt and close buffers ASAP
> - make every query that has to touch encrypted column or encrypted index
> require decryption key. This means, SELECT, DELETE, UPDATE, INSERT, VACUUM,
> CLUSTER, CREATE INDEX, pg_dump, pg_restore all should have decryption key
> supplied in order to be executed. This also means, that autovacuum daemon
> can't work.

Same response.

> What do you think about both solutions? Is it hard to implement soluition 2?

They gain little or nothing over doing filesystem encryption in the OS
or even just plain FDE (see below).  They are not worthwhile.

You need to define your threat model.  What bad actors are you
protecting against?  What threats are you protecting against?

https://www.postgresql.org/message-id/20180622042337.GL4200%40localhost

Here are some threats you might choose to protect against:

1) passive attackers on the wire
2) active  attackers on the wire
3a) theft / compromise of storage devices
3b) compromise of decommissioned storage devices
3c) theft of running server
4) compromised backup storage
5) bad / compromised clients
6) bad / compromised DBAs or sysadmins
7) side channel exploits
8) ??

(1) and (2) are taken care of by TLS.

(3a) is taken care of by FDE in controllers, say, or by physical
security.

(3b) is taken care of by proper decommissioning, but FDE helps.

(3c) you can't protect against if you have keys in memory.  You could
use client-side crypto, but you'll have more clients to worry about than
servers.  Physical security is your best option.  (And really, you don't
get any way to protect against law enforcement taking the devices.)

(4) is taken care of by encrypting backups, which requires no changes to
PG to get.

(5) is taken care of (to some degree) by server-side logic (triggers,
...).

(6)...  You can't protect against sysadmins, really, nor DBAs, but you
can use crypto on the *client*-side to get some protection.  Since the
PG client is very thin and dumb, the PG client can't easily do this.
The idea is to encrypt values and MAC/sign rows to prevent DBAs/
sysadmins seeing sensitive data or tampering with your data.

(7) one deals with by using crypto implementations built with side
channel protection, though, really, this is a very difficult subject in
general, especially since Spectre.

Nico
--

Reply | Threaded
Open this post in threaded view
|

Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

Bear Giles
Some regulatory standards require all UII, even all PII, information be encrypted within the database, not just on encrypted media. That's to reduce exposure even if someone gets access to a live server, e.g., via SQL Injection. (The perennial #1 risk for software vulnerabilities.)

UII is uniquely identifiable information, e.g., SSN.

PII is personally identifiable information, e.g, email address, phone number, address. It doesn't have to be enough to uniquely identify the person, just enough to cut the number of possible individuals down to a handful. That's a surprising large number of fields, e.g., knowing where someone was born and their birthdate will get you close to their SSN for anyone born after the IRS started requiring SSNs for all claimed dependents. Knowing someone's birth date and city of residence will get you down to a handful of individuals, often a single individual depending upon their age and the size of their city. It's remarkably easy to uniquely identify something like 75% of the population if you have the data from a couple different sites and some way to correlate the records. (That's why it's not good enough to just use the sha1 of an email address, etc.)

I know the government required UII encryption in its databases when I last worked on a government contract, and I think they've required PII encryption as well for years. I would be verify surprised if HIPAA doesn't require that as well for PII in addition to the medical info. I definitely know PCI-DSS requires encryption of all information on the credit card itself - you can keep the last few digits (I think 6 are allowed but for is recommended) to facilitate searches. Of course companies could still have the same information unencrypted in other columns or tables (except for the CC number itself - and you *never* keep the CVN in any form on threat of losing your ability to accept credit cards if you're caught) but they were encouraged to encrypt it as well.

Anyway legal requirements is "#0" on that list. Everything else *might* happen but depending upon the nature of the data you *will* be audited for compliance with regulations, either preemptively (e.g., VISA requires periodic audits of anyone making more than $X in transactions per year) or after a breach. One of my other past employers did the type of auditing VISA requires and their promotional material was full of interviews with former small business owners who lost their business after a breach. It wasn't due to the loss itself, it's because any breach automatically requires the strictest auditing for the next (4?) years and that cost far more than the average independent restaurant, auto repair shop, etc., can afford. Obviously their business model is (in part) to scare people but there are plenty of situations where you have to encrypt data within the database and not just rely on encrypted media.


 
Here are some threats you might choose to protect against:

1) passive attackers on the wire
2) active  attackers on the wire
3a) theft / compromise of storage devices
3b) compromise of decommissioned storage devices
3c) theft of running server
4) compromised backup storage
5) bad / compromised clients
6) bad / compromised DBAs or sysadmins
7) side channel exploits
8) ??

(1) and (2) are taken care of by TLS.

(3a) is taken care of by FDE in controllers, say, or by physical
security.

(3b) is taken care of by proper decommissioning, but FDE helps.

(3c) you can't protect against if you have keys in memory.  You could
use client-side crypto, but you'll have more clients to worry about than
servers.  Physical security is your best option.  (And really, you don't
get any way to protect against law enforcement taking the devices.)

(4) is taken care of by encrypting backups, which requires no changes to
PG to get.

(5) is taken care of (to some degree) by server-side logic (triggers,
...).

(6)...  You can't protect against sysadmins, really, nor DBAs, but you
can use crypto on the *client*-side to get some protection.  Since the
PG client is very thin and dumb, the PG client can't easily do this.
The idea is to encrypt values and MAC/sign rows to prevent DBAs/
sysadmins seeing sensitive data or tampering with your data.

(7) one deals with by using crypto implementations built with side
channel protection, though, really, this is a very difficult subject in
general, especially since Spectre.

Nico
--


Reply | Threaded
Open this post in threaded view
|

Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

Nico Williams
On Thu, Aug 09, 2018 at 02:34:07PM -0600, Bear Giles wrote:
> Some regulatory standards require all UII, even all PII, information be
> encrypted within the database, not just on encrypted media. That's to
> reduce exposure even if someone gets access to a live server, e.g., via SQL
> Injection. (The perennial #1 risk for software vulnerabilities.)

My preference for dealing with SQL Injection is to not provide direct
SQL access, but to use PostgREST exporting a schema that has only PG SQL
functions encapsulating all supported queries.  You just can't have
injection with such an access layer because you don't get to send SQL to
the server (because you don't get to send SQL to PostgREST).  It really
helps that PostgREST is written in Haskell.

That said, sure, if you have SQL Injection issues, then encrypting in
the database will do provided that there's no transparent way to access
the data (otherwise you've gained nothing).  That basically means you're
doing all the crypto on the client.

If you're doing all the crypto on the client, then your options for
indexing are very limited indeed.  To avoid offline dictionary attacks
you have to index MAC'ed values, effectively.  You can still do free
text indexing, provided you MAC each word.  MAC == message
authentication code, really, it's a keyed hash function, typically HMAC,
UMAC, or some such.  You could also index ciphertext, provided it has an
authentication tag, but you don't gain anything versus just indexing the
authentication tag.

> I know the government required UII encryption in its databases when I last
> [...]

Usually regulations are not quite as prescriptive as that, though
there's always a discussion to be had with the regulators/auditors when
you deviate from the norm.  You're generally not precluded from having
better solutions than is the norm.

Nico
--

Previous Thread Next Thread