Table partitioning for cloud service?

Table partitioning for cloud service?

Israel Brewster-6
I’m working on my first cloud service, which will be backed by a PostgreSQL database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally” get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data, but would also increase maintenance effort, since if I needed to add a field I would have to add it to every table. On the other hand, keeping everything in the same set of tables would mean being VERY careful with my SQL to make sure no customer could access another’s data.

How is this typically done?

---
Israel Brewster
BrewsterSoft Development
Home of EZPunch and Lyrics Presenter




Re: Table partitioning for cloud service?

Adrian Klaver-4
On 5/21/20 8:29 AM, Israel Brewster wrote:
> I’m working on my first cloud service, which will be backed by a
> postgresql database. Currently I only have a single customer, but of
> course I want to design with the possibility of multiple customers in
> mind. In that vein, I’m wondering what is “typical” in terms of
> designing the DB structure to make sure that one customer doesn’t
> “accidentally" get data for another customer? At the moment I am leaning
> towards giving each customer their own set of tables, with a unique
> prefix for each. This would provide a “hard” separation for the data,
^^^^^^^^^^^^ I think that is called a schema:)
Or set up a separate database for each in the cluster.

> but would also increase maintenance efforts, as if I needed to add a
> field I would have to add it to every table. On the other hand, keeping
> everything in the same set of tables would mean having to be VERY
> careful with my SQL to make sure no customer could access another’s data.
>
> How is this typically done?
>
> ---
> Israel Brewster
> BrewsterSoft Development
> http://www.brewstersoft.com <http://www.brewstersoft.com/>
> Home of EZPunch and Lyrics Presenter
>
>
>


--
Adrian Klaver
[hidden email]



Re: Table partitioning for cloud service?

Israel Brewster-6
On May 21, 2020, at 7:36 AM, Adrian Klaver <[hidden email]> wrote:

On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a postgresql database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally" get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data,
^^^^^^^^^^^^ I think that is called a schema:)

Ok. That’s probably an option. Although it looks, from a cursory perusal, like for that to work I would have to have separate DB users with different permissions. Which would be fine, except that I don’t have permission to create users.

Or set up a separate database for each in the cluster.

Same as above - no permissions.

At the moment, I am running on Heroku, which gives me a postgresql database, but not full control over the cluster. I may need to move to something more robust, if having completely separate databases is the best option. I was hoping to avoid SysAdmin stuff as much as possible, and focus on the software side, but obviously some sys admin is required.

---
Israel Brewster
BrewsterSoft Development
Home of EZPunch and Lyrics Presenter


but would also increase maintenance efforts, as if I needed to add a field I would have to add it to every table. On the other hand, keeping everything in the same set of tables would mean having to be VERY careful with my SQL to make sure no customer could access another’s data.
How is this typically done?
---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com <http://www.brewstersoft.com/>
Home of EZPunch and Lyrics Presenter


--
Adrian Klaver
[hidden email]



Re: Table partitioning for cloud service?

Adrian Klaver-4
On 5/21/20 8:53 AM, Israel Brewster wrote:

>> On May 21, 2020, at 7:36 AM, Adrian Klaver <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> On 5/21/20 8:29 AM, Israel Brewster wrote:
>>> I’m working on my first cloud service, which will be backed by a
>>> postgresql database. Currently I only have a single customer, but of
>>> course I want to design with the possibility of multiple customers in
>>> mind. In that vein, I’m wondering what is “typical” in terms of
>>> designing the DB structure to make sure that one customer doesn’t
>>> “accidentally" get data for another customer? At the moment I am
>>> leaning towards giving each customer their own set of tables, with a
>>> unique prefix for each. This would provide a “hard” separation for
>>> the data,
>> ^^^^^^^^^^^^ I think that is called a schema:)
>
> Ok. That’s probably an option. Although it looks from a cursory perusal
> that for that to work, I would have to have separate DB users with
> different permissions. Which would be fine, except that I don’t have
> permissions to create users.
>
>> Or set up a separate database for each in the cluster.
>
> Same as above - no permissions.
>
> At the moment, I am running on Heroku, which gives me a postgresql
> database, but not full control over the cluster. I may need to move to
> something more robust, if having completely separate databases is the
> best option. I was hoping to avoid SysAdmin stuff as much as possible,
> and focus on the software side, but obviously some sys admin is required.

You can't use this?:

https://devcenter.heroku.com/articles/heroku-postgresql-credentials

>
> ---
> Israel Brewster
> BrewsterSoft Development
> http://www.brewstersoft.com
> Home of EZPunch and Lyrics Presenter
>
>>
>>> but would also increase maintenance efforts, as if I needed to add a
>>> field I would have to add it to every table. On the other hand,
>>> keeping everything in the same set of tables would mean having to be
>>> VERY careful with my SQL to make sure no customer could access
>>> another’s data.
>>> How is this typically done?
>>> ---
>>> Israel Brewster
>>> BrewsterSoft Development
>>> http://www.brewstersoft.com <http://www.brewstersoft.com/>
>>> Home of EZPunch and Lyrics Presenter
>>
>>
>> --
>> Adrian Klaver
>> [hidden email] <mailto:[hidden email]>
>>
>


--
Adrian Klaver
[hidden email]



Re: Table partitioning for cloud service?

Christopher Browne-3
In reply to this post by Israel Brewster-6
On Thu, 21 May 2020 at 11:53, Israel Brewster <[hidden email]> wrote:
On May 21, 2020, at 7:36 AM, Adrian Klaver <[hidden email]> wrote:

On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a postgresql database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally" get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data,
^^^^^^^^^^^^ I think that is called a schema:)

Ok. That’s probably an option. Although it looks from a cursory perusal that for that to work, I would have to have separate DB users with different permissions. Which would be fine, except that I don’t have permissions to create users.

Or set up a separate database for each in the cluster.

Same as above - no permissions.

At the moment, I am running on Heroku, which gives me a postgresql database, but not full control over the cluster. I may need to move to something more robust, if having completely separate databases is the best option. I was hoping to avoid SysAdmin stuff as much as possible, and focus on the software side, but obviously some sys admin is required.

There's a whole lot of "that depends" to this.

If there is not much data shared across customers, then it's a pretty good answer to create a database for each one.  This is especially good if they are only occasionally connected.

If there is a LOT of shared data, then "life gets more complicated." 

It's a decently well documented problem out there; I just searched for "multitenant database design", which turned up a number of decent (not-Postgres-specific) answers.


Some common patterns include:
- A database per tenant (on Postgres, that means that PGDATABASE and/or connection URIs change for each tenant)
- A schema per tenant (on Postgres, that means each time a tenant is added, you need "CREATE SCHEMA" to establish the tenancy and "CREATE TABLE" for each table in that tenancy, and connections use "SET search_path = tenantname;" to select data from the right tenant)
- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need to specify the tenant)
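For instance, the schema-per-tenant pattern might be sketched like this (names here are made up for illustration: a tenant called "acme" and a single "invoices" table):

```sql
-- Run once per new tenant (hypothetical tenant "acme"):
CREATE SCHEMA acme;
CREATE TABLE acme.invoices (
    id         bigserial PRIMARY KEY,
    amount     numeric(12,2) NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- At connection time, point the session at the right tenant:
SET search_path = acme;
SELECT * FROM invoices;  -- resolves to acme.invoices
```

The application code then stays identical for every tenant; only the search_path differs per connection.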

An interesting answer, if there needs to be shared data, is for the shared data to go in its own database, and use a Foreign Data Wrapper to have each tenants' database access it <https://www.postgresql.org/docs/12/postgres-fdw.html>
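A minimal postgres_fdw setup along those lines might look like the following (a sketch only; the server name, host, credentials, and the local "shared" schema are all placeholders):

```sql
-- In each tenant's database:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point at the database holding the shared data:
CREATE SERVER shared_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shared-db.example.com', dbname 'shared');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER shared_srv
    OPTIONS (user 'app', password 'secret');

-- Expose the shared tables locally under a "shared" schema:
CREATE SCHEMA shared;
IMPORT FOREIGN SCHEMA public
    FROM SERVER shared_srv INTO shared;
```

Queries in the tenant database can then join against shared.whatever as if it were local, with the usual FDW performance caveats.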

There are lots of tradeoffs involved in each case; each of the above patterns has merits and demerits particularly as the number of tenants scales, as well as when you discover there are both tiny and large tenants with differing requirements.

You need to look at it from various perspectives:
- How do application schema changes get handled as the application evolves?
- What are the security concerns about data sharing across tenants?
- What issues come up when managing storage across tenants?  (Some approaches are easier to cope with than others)

If you don't have a fair bit of technical expertise locally, then sophisticated choices will cause you problems that you won't be able to solve.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Table partitioning for cloud service?

Israel Brewster-6
In reply to this post by Adrian Klaver-4


On May 21, 2020, at 7:57 AM, Adrian Klaver <[hidden email]> wrote:

On 5/21/20 8:53 AM, Israel Brewster wrote:
On May 21, 2020, at 7:36 AM, Adrian Klaver <[hidden email] <[hidden email]>> wrote:

On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a postgresql database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally" get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data,
^^^^^^^^^^^^ I think that is called a schema:)
Ok. That’s probably an option. Although it looks from a cursory perusal that for that to work, I would have to have separate DB users with different permissions. Which would be fine, except that I don’t have permissions to create users.
Or set up a separate database for each in the cluster.
Same as above - no permissions.
At the moment, I am running on Heroku, which gives me a postgresql database, but not full control over the cluster. I may need to move to something more robust, if having completely separate databases is the best option. I was hoping to avoid SysAdmin stuff as much as possible, and focus on the software side, but obviously some sys admin is required.

You can't use this?:

https://devcenter.heroku.com/articles/heroku-postgresql-credentials

Wasn’t aware of that. I *did* mention this is my first cloud project. Done plenty of DB/web/application development, but not cloud/multi-customer. Thanks for the pointer.


---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com
Home of EZPunch and Lyrics Presenter

but would also increase maintenance efforts, as if I needed to add a field I would have to add it to every table. On the other hand, keeping everything in the same set of tables would mean having to be VERY careful with my SQL to make sure no customer could access another’s data.
How is this typically done?
---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com <http://www.brewstersoft.com/>
Home of EZPunch and Lyrics Presenter


-- 
Adrian Klaver
[hidden email] <[hidden email]>



-- 
Adrian Klaver
[hidden email]


Re: Table partitioning for cloud service?

Adam Brusselback
In reply to this post by Christopher Browne-3
>  An interesting answer, if there needs to be shared data, is for the shared data to go in its own database, and use a Foreign Data Wrapper to have each tenants' database access it <https://www.postgresql.org/docs/12/postgres-fdw.html>  

For my application I went the schema-per-tenant route, but I have a need for a single login which will work for all tenants you've been given access to. Not all tenants are required to be on the same database host, so I broke that piece out into its own database and used postgres_fdw to make it seem local to each tenant.

So I've had first-hand experience with this for the past ~5 years, and this approach has serious tradeoffs. Queries that need to access the remote table can just fall on their face sometimes. You will also need to deal with practically every connection spawning one or more new connections, which stay open and consume resources from the first time a query is issued that accesses foreign data.

As an optimization I just worked on for my database earlier this week, I decided to logically replicate that table from my main authentication database into each cluster, and I replaced all references to the FDW for read-only queries with the logically replicated table. All write queries still hit the FDW as before.

This was acceptable for my use case, and drastically improved performance for some queries where I had previously had to use a CTE to force materialization to get acceptable performance due to the nature of going over the FDW for that data.
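The replication side of that setup can be sketched roughly as follows (table and connection names are hypothetical; the publisher needs wal_level = logical, and the table definition must already exist on each subscriber):

```sql
-- On the central authentication database (publisher):
CREATE PUBLICATION auth_pub FOR TABLE app_user;

-- On each tenant cluster (subscriber):
CREATE SUBSCRIPTION auth_sub
    CONNECTION 'host=auth-db.example.com dbname=auth user=repl'
    PUBLICATION auth_pub;
```

Reads then go against the local, logically replicated copy; writes continue through the FDW to the single authoritative table.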

It's a very cool tool, just be careful about how it can impact performance if you don't measure for your specific use case.


Re: Table partitioning for cloud service?

Michael Lewis
On Thu, May 21, 2020 at 11:41 AM Adam Brusselback <[hidden email]> wrote:
As an optimization I just worked on for my database earlier this week, I decided to logically replicate that table from my main authentication database into a each cluster, and I replaced all references to the FDW for read-only queries to use the logically replicated table. All write queries still hit the FDW as before.

Perhaps you considered this, but if you had not wanted to deal with the administration side of replication, and the centralized data is changed infrequently from only one application/source, then updating the source and then refreshing a materialized view on each local DB that pulls in the foreign data could be a good option. There is some chance of stale data, since the refresh must be triggered, but for some use cases it may be the simplest setup.
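A sketch of that approach (the foreign table shared.plans is a made-up example; note that REFRESH ... CONCURRENTLY requires a unique index on the materialized view):

```sql
-- Pull the foreign data into a local snapshot:
CREATE MATERIALIZED VIEW local_plans AS
    SELECT * FROM shared.plans;
CREATE UNIQUE INDEX ON local_plans (id);

-- Re-run whenever the source changes; reads stay
-- available during the refresh:
REFRESH MATERIALIZED VIEW CONCURRENTLY local_plans;
```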

Re: Table partitioning for cloud service?

Sándor Daku
In reply to this post by Christopher Browne-3
Hi,

On Thu, 21 May 2020 at 18:14, Christopher Browne <[hidden email]> wrote:
On Thu, 21 May 2020 at 11:53, Israel Brewster <[hidden email]> wrote:

- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need to specify the tenant)

The database/schema-per-tenant solution can be tedious when you want to modify something in the structure and you have numerous tenants.
Therefore I used the "tables with tenant_id" version in a similar situation, but with a slight twist. One of the biggest issues with this solution is that if you forget to add the tenant_id to the WHERE clause, you are going to reveal one tenant's data to another.
I came up with the solution that the database user has no privileges for accessing the base tables. Instead, I generate views for each tenant, and they can access their own data in the underlying table through these views. Now if I forget to address the right tenant in my client code (it still happens sometimes) and try to directly access the base tables, I get a strongly worded reminder from the server.
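A sketch of that pattern, with made-up names (app_user stands in for the application's login role, tenant_id 42 for one tenant):

```sql
-- The application role may not touch the base table:
REVOKE ALL ON public.orders FROM app_user;

-- Per-tenant view that bakes in the tenant filter:
CREATE SCHEMA tenant_acme;
CREATE VIEW tenant_acme.orders AS
    SELECT * FROM public.orders WHERE tenant_id = 42;

GRANT SELECT ON tenant_acme.orders TO app_user;
-- For writes, add WITH CHECK OPTION and a tenant_id
-- default so rows can't land in another tenant.
```

Any query that bypasses the view and hits public.orders directly fails with a permission error, which is the "strongly worded reminder" described above.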


Regards,
Sándor

Re: Table partitioning for cloud service?

Israel Brewster-6
On May 21, 2020, at 12:12 PM, Sándor Daku <[hidden email]> wrote:
Hi,

On Thu, 21 May 2020 at 18:14, Christopher Browne <[hidden email]> wrote:
On Thu, 21 May 2020 at 11:53, Israel Brewster <[hidden email]> wrote:

- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need to specify the tenant)

The database/schema per tenant solution can be tedious when you want to modify something on the structure and you have numerous tenants.
Therefore I used the "tables with tenant_id" version in a similar situation but with a slight twist. One of the biggest issue of this solution is that if you forget to add the tenant_id to the where clause you are going to reveal one tenant's data to another.
I came up with the solution that the database user have no privileges for accessing the base tables. Instead of that I generate views for each tenant and they can access their own data in the underlying table through these views. Now if forget to address the right tenant in my client code(it still happens sometimes) and try to directly access the base tables I get a strongly worded reminder from the server.

Nice solution! I think I may go to something like that once I upgrade to a cloud solution that lets me add multiple users to the DB (the free tier of Heroku does not). In the meantime, while I just have the single customer, I can fake it easily enough.

Are there any shortcuts for referencing the proper views, or do you just append/prepend something to every table reference in your SQL? One nice thing about the database/schema approach is that I can just specify the search_path (or database) in the connection command, and then all the table references remain the same for all tenants. It also helps avoid the situation you mentioned where you forget to address the right tenant, since you only have to do it in one place. Of course, as you said, it can be tedious when you want to modify the structure. I’ll have to think about that a bit more.
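For reference, the per-connection switch can either be issued each session or pinned to a role so the application never has to set it (role and schema names below are hypothetical):

```sql
-- Option 1: per session, right after connecting:
SET search_path = tenant_acme;

-- Option 2: pin it to the tenant's login role once;
-- every later connection as that role inherits it:
ALTER ROLE acme_user SET search_path = tenant_acme;
```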

---
Israel Brewster
BrewsterSoft Development
Home of EZPunch and Lyrics Presenter


Regards,
Sándor


Re: Table partitioning for cloud service?

Ravi Krishna-15
In reply to this post by Sándor Daku
>
> The database/schema per tenant solution can be tedious when you want to modify something on the structure and you have numerous tenants.
> Therefore I used the "tables with tenant_id" version in a similar situation but with a slight twist. One of the biggest issue of this solution is that if you forget to add the tenant_id to the where clause you are going to reveal one tenant's data to another.
> I came up with the solution that the database user have no privileges for accessing the base tables. Instead of that I generate views for each tenant and they can access their own data in the underlying table through these views. Now if forget to address the right tenant in my client code(it still happens sometimes) and try to directly access the base tables I get a strongly worded reminder from the server.

1. If you have 50 tables and say 100 tenants, we are talking about 5,000 views. I am not sure
    whether that is any more elegant than having 100 schemas.
2. In your approach you can't do a phased DDL upgrade. It is typical to do rolling upgrades
    in a multi-tenant database, starting with the least risky tenant.




Re: Table partitioning for cloud service?

Adrian Klaver-4
In reply to this post by Israel Brewster-6
On 5/21/20 1:23 PM, Israel Brewster wrote:

> On May 21, 2020, at 12:12 PM, Sándor Daku <[hidden email]
> <mailto:[hidden email]>> wrote:
>> Hi,
>>
>> On Thu, 21 May 2020 at 18:14, Christopher Browne <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>     On Thu, 21 May 2020 at 11:53, Israel Brewster
>>     <[hidden email] <mailto:[hidden email]>> wrote:
>>
>>>
>>     - Table-based tenancy (e.g. - each table has a "tenant_id" and
>>     queries need to specify the tenant)
>>
>>
>> The database/schema per tenant solution can be tedious when you want
>> to modify something on the structure and you have numerous tenants.
>> Therefore I used the "tables with tenant_id" version in a similar
>> situation but with a slight twist. One of the biggest issue of this
>> solution is that if you forget to add the tenant_id to the where
>> clause you are going to reveal one tenant's data to another.
>> I came up with the solution that the database user have no privileges
>> for accessing the base tables. Instead of that I generate views for
>> each tenant and they can access their own data in the underlying table
>> through these views. Now if forget to address the right tenant in my
>> client code(it still happens sometimes) and try to directly access the
>> base tables I get a strongly worded reminder from the server.
>
> Nice solution! I think I may go to something like that once I upgrade to
> a cloud solution that lets me add multiple users to the DB (the free
> tier of Heroku does not). In the meantime, while I just have the single
> customer, I can fake it easily enough.
>
> Is there any shortcuts for referencing the proper views, or do you just
> append/prepend something to every table reference in your SQL? One nice
> thing about the database/schema approach is that I can just specify the
> search_path (or database) in the connection command, and then all the
> table references remain the same for all tenants. Also helps avoid the
> situation you mentioned where you forget to address the right tenant,
> since you only have to do it in one place. Of course, as you said, it
> can be tedious when you want to modify the structure. I’ll have to think
> about that a bit more.

If you want to take the tedium out of it, take a look at Sqitch:

https://sqitch.org/

Then all you have to do is create the change once and deploy to the
targets.

>
> ---
> Israel Brewster
> BrewsterSoft Development
> http://www.brewstersoft.com <http://www.brewstersoft.com/>
> Home of EZPunch and Lyrics Presenter
>>
>>
>> Regards,
>> Sándor
>>
>>
>


--
Adrian Klaver
[hidden email]



Re: Table partitioning for cloud service?

Edson Richter
In reply to this post by Israel Brewster-6


------ Original message ------
From: Ravi Krishna
Date: Thu, 21 May 2020 18:50
To:
Cc: PostgreSQL Mailing Lists
Subject: Re: Table partitioning for cloud service?

> 
> The database/schema per tenant solution can be tedious when you want to modify something on the structure and you have numerous tenants.
> Therefore I used the "tables with tenant_id" version in a similar situation but with a slight twist. One of the biggest issue of this solution is that if you forget to add the tenant_id to the where clause you are going to reveal one tenant's data to another.
> I came up with the solution that the database user have no privileges for accessing the base tables. Instead of that I generate views for each tenant and they can access their own data in the underlying table through these views. Now if forget to address the right tenant in my client code(it still happens sometimes) and try to directly access the base tables I get a strongly worded reminder from the server.

1. If you have 50 tables and say 100 tenants, we are talking about 5,000 views. I am not sure 
    whether it is any more elegant than having 100 schemas.
2. In your approach you can do any phased DDL upgrade.  It is typical to do rolling upgrades
    in a multi tenant databases, starting with least risky tenant.


Be careful to plan your backup/recovery strategy. How do you plan to recover one customer from backup without interfering with the others?
What will be your disaster recovery strategy? Have you considered replication?
Large databases take longer to back up.
Keeping schemas in sync can be easily solved.

Regards,

Edson