neither CPU nor IO bound, but throttled performance

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

neither CPU nor IO bound, but throttled performance

Gunter

Hi, I have an Amazon Linux based Postgresql 11 server here on a t2.medium EC2 instance.

It is serving 24 worker processes that read jobs from a queue (thanks to SELECT ... FOR UPDATE SKIP LOCKED!) and do jobs some of which are reading and writing business data to the database, others are only reading, and some don't hit the business data at all, only the queue.

Everything flows quite nicely. Except, I don't understand why I can't max out the CPU or the IO, instead, IO is almost negligible yet the CPU is at 30% hardly hitting 50%.

Here I give you a view of top:

top - 23:17:09 up 45 days,  2:07,  4 users,  load average: 20.32, 18.92, 13.80
Tasks: 338 total,  24 running, 111 sleeping,   0 stopped,   0 zombie
%Cpu(s): 28.7 us,  2.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si, 68.7 st
KiB Mem :  4040028 total,  1070368 free,   324460 used,  2645200 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  2223720 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7678 postgres  20   0 1235072 509744 506356 R   8.7 12.6   1:14.82 postgres: auser integrator 172.31.61.242(25783) BIND
 7998 postgres  20   0 1235108 516480 512772 R   8.7 12.8   1:16.20 postgres: auser integrator 172.31.49.159(51708) SELECT
 2183 postgres  20   0 1261436 985.8m 982544 R   8.5 25.0   0:44.04 postgres: auser integrator [local] SELECT
 7653 postgres  20   0 1235180 501760 497984 R   8.2 12.4   1:13.66 postgres: auser integrator 172.31.54.158(47959) SELECT
 7677 postgres  20   0 1235144 506740 502980 S   8.2 12.5   1:13.54 postgres: auser integrator 172.31.61.242(56510) idle in t+
 7680 postgres  20   0 1234684 484356 481100 R   8.2 12.0   1:13.86 postgres: auser integrator 172.31.61.242(49966) SELECT
 2631 postgres  20   0 1235120 937964 934528 R   7.9 23.2  10:48.39 postgres: auser integrator 172.31.49.159(33522) idle in t+
 7664 postgres  20   0 1235104 524664 520976 R   7.9 13.0   1:13.95 postgres: auser integrator 172.31.57.147(30036) BIND
 7682 postgres  20   0 1234660 496188 492956 R   7.9 12.3   1:15.50 postgres: auser integrator 172.31.61.242(26330) COMMIT
 7687 postgres  20   0 1234876 490104 486656 R   7.9 12.1   1:16.77 postgres: auser integrator 172.31.63.71(25285) BIND
 7660 postgres  20   0 1235100 502004 498596 R   7.6 12.4   1:18.00 postgres: auser integrator 172.31.57.147(46051) PARSE
 7662 postgres  20   0 1235148 503532 500280 R   7.6 12.5   1:14.03 postgres: auser integrator 172.31.57.147(48852) UPDATE
 7681 postgres  20   0 1234688 516864 513596 R   7.6 12.8   1:17.77 postgres: auser integrator 172.31.61.242(48192) SELECT
 7685 postgres  20   0 1235096 515352 511968 R   7.6 12.8   1:16.17 postgres: auser integrator 172.31.63.71(62540) BIND
 7689 postgres  20   0 1235100 509504 505836 S   7.6 12.6   1:14.78 postgres: auser integrator 172.31.63.71(12287) idle in tr+
 7684 postgres  20   0 1235052 500336 496916 R   7.3 12.4   1:14.83 postgres: auser integrator 172.31.63.71(19633) BIND
 7654 postgres  20   0 1235224 514512 511040 S   7.0 12.7   1:18.89 postgres: auser integrator 172.31.57.147(43437) idle in t+
 7656 postgres  20   0 1234684 510900 507636 R   7.0 12.6   1:16.19 postgres: auser integrator 172.31.54.158(30397) idle in t+
 7661 postgres  20   0 1234684 514920 511648 S   7.0 12.7   1:16.27 postgres: auser integrator 172.31.57.147(38243) SELECT
 7679 postgres  20   0 1235112 512228 508544 R   7.0 12.7   1:14.60 postgres: auser integrator 172.31.61.242(34261) PARSE
 7663 postgres  20   0 1234684 517068 513812 R   6.8 12.8   1:17.42 postgres: auser integrator 172.31.57.147(19711) SELECT
 7655 postgres  20   0 1235036 505584 502208 R   6.5 12.5   1:16.17 postgres: auser integrator 172.31.54.158(24401) BIND
 7658 postgres  20   0 1235072 509432 506108 R   6.5 12.6   1:15.90 postgres: auser integrator 172.31.54.158(48566) idle in t+
 7659 postgres  20   0 1234688 497488 494232 S   6.2 12.3   1:15.70 postgres: auser integrator 172.31.54.158(49841) idle in t+
 7686 postgres  20   0 1234748 508412 505084 R   5.9 12.6   1:14.70 postgres: auser integrator 172.31.63.71(54938) BIND
 7688 postgres  20   0 1234708 502416 499204 S   5.9 12.4   1:14.14 postgres: auser integrator 172.31.63.71(49857) SELECT
 7657 postgres  20   0 1234684 513740 510476 R   5.6 12.7   1:15.96 postgres: auser integrator 172.31.54.158(47300) SELECT
 1304 ec2-user  20   0  171400   4648   3840 R   1.7  0.1   1:25.66 top -c
 8492 root      20   0       0      0      0 R   1.7  0.0   0:00.41 [kworker/1:2]
and of iostat:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          36.41    0.00    3.80    0.00   59.78    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    7.69     0.00    92.31    24.00     0.02    3.00    0.00    3.00   3.00   2.31
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdg              0.00     0.00    0.00   11.54     0.00   130.77    22.67     0.05    4.00    0.00    4.00   0.33   0.38
xvdh              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdi              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdj              0.00     0.00    5.77    0.96    46.15     7.69    16.00     0.01    1.14    0.67    4.00   1.14   0.77
xvdk              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdl              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdm              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdn              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdo              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdp              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdq              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdr              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvds              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdt              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdu              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdv              0.00     0.00    0.00    8.65     0.00    76.92    17.78     0.03    4.00    0.00    4.00   0.44   0.38
xvdw              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdx              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdy              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdz              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdaa             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdab             0.00     0.00    0.00    2.88     0.00    30.77    21.33     0.01    4.00    0.00    4.00   1.33   0.38

previously I had hit 100 %util here, that was when I didn't have the tables so spread out over so many tablespaces. Now I have it spread out like in the olden days where you spread out your tables over many "spindles", and I did this here so I could see which tables or indexes would be bottlenecks.

So how can it be that queries take quite long without the process running at higher CPU%?

Or is there something wrong with the total CPU% estimated by both top and iostat? From the top it looks like I have 24 worker processes use 8% each, most of them in R(unning) state, so that would be 192%, which is divided over the 2 CPUs of the t2.medium instance, really 96%. So I am CPU bound after all?

regards,
-Gunther

Reply | Threaded
Open this post in threaded view
|

Re: neither CPU nor IO bound, but throttled performance

Justin Pryzby
On Wed, Feb 20, 2019 at 06:32:49PM -0500, Gunther wrote:
> Hi, I have an Amazon Linux based Postgresql 11 server here on a t2.medium
> EC2 instance.
>
> Everything flows quite nicely. Except, I don't understand why I can't max
> out the CPU or the IO, instead, IO is almost negligible yet the CPU is at
> 30% hardly hitting 50%.

> avg-cpu: %user %nice %system %iowait %steal %idle 36.41 0.00 3.80 0.00 59.78
> 0.00 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await
> r_await w_await svctm %util xvda 0.00 0.00 0.00 7.69 0.00 92.31 24.00 0.02

This is unreadable, please try to attach it ?

> previously I had hit 100 %util here, that was when I didn't have the tables
> so spread out over so many tablespaces. Now I have it spread out like in the
> olden days where you spread out your tables over many "spindles", and I did
> this here so I could see which tables or indexes would be bottlenecks.

What was the old storage configuration and what is it now ?

> So how can it be that queries take quite long without the process running at
> higher CPU%?

You said everything flows nicely, but take a long time, and "throttled", can
you show an high-level performance change ?  Other than %cpu or %io.

Justin

Reply | Threaded
Open this post in threaded view
|

Re: neither CPU nor IO bound, but throttled performance

Magnus Hagander-2
In reply to this post by Gunter
On Thu, Feb 21, 2019 at 12:34 AM Gunther <[hidden email]> wrote:

Hi, I have an Amazon Linux based Postgresql 11 server here on a t2.medium EC2 instance.

It is serving 24 worker processes that read jobs from a queue (thanks to SELECT ... FOR UPDATE SKIP LOCKED!) and do jobs some of which are reading and writing business data to the database, others are only reading, and some don't hit the business data at all, only the queue.

Everything flows quite nicely. Except, I don't understand why I can't max out the CPU or the IO, instead, IO is almost negligible yet the CPU is at 30% hardly hitting 50%.

Here I give you a view of top:

top - 23:17:09 up 45 days,  2:07,  4 users,  load average: 20.32, 18.92, 13.80
Tasks: 338 total,  24 running, 111 sleeping,   0 stopped,   0 zombie
%Cpu(s): 28.7 us,  2.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si, 68.7 st

If I read that right, it's about 70% "steal". The description for this is "the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.".

So basically, the CPU is spent dealing with other peoples VMs on the same hardware. Welcome to the multi-tenant cloud. 

In particular, I believe T series instances get a limited number of CPU "credits" per hours. My guess is you've hit this limit and are thus being throttled. T series are not intended for persistent workloads. Either way, this is probably a question better asked at the Amazon EC2 forums rather than PostgreSQL as you'll find more people who know the EC2 interactions there.
 
--
Reply | Threaded
Open this post in threaded view
|

Re: neither CPU nor IO bound, but throttled performance

Gunther Schadow

Thank you Magnus. 68% steal. Indeed. You probably hit the target. Yes.

That explains the discrepancy. I need to watch and understand that CPU credits issue.

regards,
-Gunther

On 2/21/2019 4:08, Magnus Hagander wrote:
On Thu, Feb 21, 2019 at 12:34 AM Gunther <[hidden email]> wrote:

Hi, I have an Amazon Linux based Postgresql 11 server here on a t2.medium EC2 instance.

It is serving 24 worker processes that read jobs from a queue (thanks to SELECT ... FOR UPDATE SKIP LOCKED!) and do jobs some of which are reading and writing business data to the database, others are only reading, and some don't hit the business data at all, only the queue.

Everything flows quite nicely. Except, I don't understand why I can't max out the CPU or the IO, instead, IO is almost negligible yet the CPU is at 30% hardly hitting 50%.

Here I give you a view of top:

top - 23:17:09 up 45 days,  2:07,  4 users,  load average: 20.32, 18.92, 13.80
Tasks: 338 total,  24 running, 111 sleeping,   0 stopped,   0 zombie
%Cpu(s): 28.7 us,  2.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si, 68.7 st

If I read that right, it's about 70% "steal". The description for this is "the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.".

So basically, the CPU is spent dealing with other peoples VMs on the same hardware. Welcome to the multi-tenant cloud. 

In particular, I believe T series instances get a limited number of CPU "credits" per hours. My guess is you've hit this limit and are thus being throttled. T series are not intended for persistent workloads. Either way, this is probably a question better asked at the Amazon EC2 forums rather than PostgreSQL as you'll find more people who know the EC2 interactions there.
 
--