High CPU Usage of "SET ROLE"

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

High CPU Usage of "SET ROLE"

Ulf Lohbrügge
Hi there,

I'm running PostgreSQL 9.6.10 on Debian and reported some performance issues with "SET ROLE" a while ago:

Those long running queries, especially "SET ROLE", still persist until today.

I just recently increased the number of PostgreSQL instances and added an AWS Aurora Cluster (9.6.10 with db.r4.2xlarge). AWS offers a tool named "Performance Insights" and shows some really high CPU usage for "SET ROLE" queries. See attached image or just click here:

The setup is the same as reported in the above mentioned post: I use more than a thousand roles per PostgreSQL instance and set the role for every connection before executing actual statements. My pg_class consists of 1,557,824 rows as every role has its own schema with more than 300 tables.

I'm currently building a simple docker test setup with pg_bench to reproduce the decreased performance when executing "SET ROLE".

I'm aware that AWS Aurora is a proprietary version of PostgreSQL. But I somehow have the feeling that my experienced abnormalities with "SET ROLE" in vanilla PostgreSQL occur in AWS Aurora as well. And the high CPU usage reported by "Performance Insights" may be a hint of a performance issue in PostgreSQL.

I'm quite lost when reading the PostgreSQL source code (due to my inability to read it :)) but maybe one of you guys has an idea about that?

Regards,
Ulf



set-role.png (210K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: High CPU Usage of "SET ROLE"

Tom Lane-2
=?UTF-8?Q?Ulf_Lohbr=C3=BCgge?= <[hidden email]> writes:
> I'm running PostgreSQL 9.6.10 on Debian and reported some performance
> issues with "SET ROLE" a while ago:
> https://www.postgresql.org/message-id/CABZYQR%2BKu%2BiLFhqwY89QrrnKG9wKxckmssDG2rYKESojiohRgQ%40mail.gmail.com
> ...
> The setup is the same as reported in the above mentioned post: I use more
> than a thousand roles per PostgreSQL instance and set the role for every
> connection before executing actual statements. My pg_class consists
> of 1,557,824 rows as every role has its own schema with more than 300
> tables.

It seems plausible to guess that you've hit some behavior that's O(N^2)
in the number of objects (for some object type or other).  Perhaps "perf"
or a similar tool would give some insight into where the bottleneck is.

https://wiki.postgresql.org/wiki/Profiling_with_perf

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: High CPU Usage of "SET ROLE"

Ulf Lohbrügge
It seems plausible to guess that you've hit some behavior that's O(N^2)
in the number of objects (for some object type or other).  Perhaps "perf"
or a similar tool would give some insight into where the bottleneck is.

https://wiki.postgresql.org/wiki/Profiling_with_perf

Thanks for your quick reply!

I haven't used "perf" yet and decided to investigate a bit further with the tools I am more familiar with:

As I mentioned in my other post, I use the combination of "SET ROLE ...;" and "SET search_path = ...;" to switch to the requested tenant before executing actual statements. A typical request to my webapp executes the following sql statements:

1. SET ROLE tenant1;
2. SET search_path = tenant1;
3. -- execute various tenant1 related sql statements here
4. SET search_path = DEAULT;
5. RESET ROLE;

I activated logging of all statements for around 6 minutes in production and analyzed the duration of parse, bind and execute for the statements 1, 2, 4 and 5 above. I just summed parse, bind and execute and calculated the average of them.

"SET ROLE ...;" -> 7.109 ms (!)
"SET search_path = ...;" -> 0.026 ms
"SET search_path = DEAULT;" -> 0.059 ms
"RESET ROLE;" -> 0.026 ms

So "SET ROLE ...;" is more than 260 times slower than "SET search_path = ...;"! 7.109 vs. 0.026 ms.

I was curious to see what happens when I change the order of statements as follows ("SET ROLE ...;" happens after executing "SET search_path = ...;"):

1. SET search_path = tenant1;
2. SET ROLE tenant1;
3. -- execute various tenant1 related sql statements here
4. SET search_path = DEAULT;
5. RESET ROLE;

Logging of all statements was again enabled in production for around 6 minutes. And these were the results:

"SET search_path = ...;" -> 7.444 ms (!)
"SET ROLE ...;" -> 0.141 ms
"SET search_path = DEAULT;" -> 0.036 ms
"RESET ROLE;" -> 0.025 ms

And guess what? Now "SET search_path = ...;" takes more than 7 ms on average is more than 50 times slower than "SET ROLE ...;"! 7.444 vs. 0.141 ms.

I think I have found something here. It looks like that the order of statements is affecting their duration. I somehow have the feeling that the first statement after "RESET ROLE;" experiences a performance degradation.

When I use the psql cli on the same database I can see via "\timing" that the first statement after "RESET ROLE;" is significantly slower. I was even able to strip it down to two statements ("SET ROLE ...;" and "RESET ROLE;"):

mydb=> set role tenant1;
SET
Time: 0.516 ms
mydb=> reset role;
RESET
Time: 0.483 ms
mydb=> set role tenant1; <-- first statement after "reset role;"
SET
Time: 10.177 ms <-- significantly slower
mydb=> reset role;
RESET
Time: 0.523 ms
mydb=> set role tenant1; <-- first statement after "reset role;"
SET
Time: 12.119 ms <-- significantly slower
mydb=> reset role;
RESET
Time: 0.462 ms
mydb=> set role tenant1; <-- first statement after "reset role;"
SET
Time: 19.533 ms <-- significantly slower
mydb=>

Maybe my observations here are already sufficient to find out what happens here? I guess that my setup with 1k rows in pg_roles and 1.5m rows in pg_class is probably the cause.

Does it help when I create a test setup with a docker image that contains a database with that many entries in pg_roles and pg_class and share it here?

Regards,
Ulf
Reply | Threaded
Open this post in threaded view
|

Re: High CPU Usage of "SET ROLE"

Tom Lane-2
=?UTF-8?Q?Ulf_Lohbr=C3=BCgge?= <[hidden email]> writes:
> I think I have found something here. It looks like that the order of
> statements is affecting their duration. I somehow have the feeling that the
> first statement after "RESET ROLE;" experiences a performance degradation.

Hm.  It's well known that the first query executed in a *session* takes
a pretty big performance hit, because of the need to populate the
backend's catalog caches.  I'm not very sure however why "RESET ROLE"
would result in a mass cache flush, if indeed that's what's happening.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: High CPU Usage of "SET ROLE"

Jeff Janes
In reply to this post by Ulf Lohbrügge
On Tue, Oct 30, 2018 at 3:50 PM Ulf Lohbrügge <[hidden email]> wrote:
 
When I use the psql cli on the same database I can see via "\timing" that the first statement after "RESET ROLE;" is significantly slower. I was even able to strip it down to two statements ("SET ROLE ...;" and "RESET ROLE;"):

... 
Maybe my observations here are already sufficient to find out what happens here? I guess that my setup with 1k rows in pg_roles and 1.5m rows in pg_class is probably the cause.

It would probably be enough if it were reproducible, but I can't reproduce it.

-- set up
perl -le 'print "create user foo$_;" foreach 1..1000'|psql
perl -le 'foreach $r (1..1000) {print "create schema foo$r authorization foo$r;"}'|psql
perl -le 'foreach $r (reverse 1..1000) {print "set role foo$r;"; print "create table foo$r.foo$_ (x serial primary key);" foreach 1..1000;}'|psql > out

-- test
perl -le 'print "set role foo$_;\nreset role;" foreach 1..1000'|psql

Does it help when I create a test setup with a docker image that contains a database with that many entries in pg_roles and pg_class and share it here?

If you have a script to create the database, I'd be more likely to play around with that than with a docker image.  (Which I have to guess would be quite large anyway, with 1.5 rows in pg_class)

Cheers,

Jeff