PSA: New intel MDS vulnerability mitigations cause measurable slowdown


PSA: New intel MDS vulnerability mitigations cause measurable slowdown

Andres Freund
Hi,

There's a new set of CPU vulnerabilities, so far affecting only Intel
CPUs. Cribbing from the linux-kernel announcement, I'm referring to
https://xenbits.xen.org/xsa/advisory-297.html
for details.

The "fix" is for the OS to perform some extra mitigations:
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html
https://www.kernel.org/doc/html/latest/x86/mds.html#mds

*And* SMT/hyperthreading needs to be disabled, to be fully safe.

Fun.

I've run a quick pgbench benchmark:

*Without* disabling SMT, for readonly pgbench, I'm seeing regressions
of 7-11%, depending on the size of shared_buffers (and some runtime
variations).  That's just on my laptop, with an i7-6820HQ / Skylake CPU.
I'd be surprised if there weren't adversarial loads with bigger
slowdowns - what gets more expensive with the mitigations is syscalls.


Most OSs / distributions have either rolled these changes out already
or will do so soon, so it's likely that most of us and our users will
be affected shortly.  At least on Linux the part of the mitigation
that makes syscalls slower (blowing away CPU buffers at the end of a
syscall) is enabled by default, but SMT is not disabled by default.
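
FWIW, you can check where a given Linux box stands without rebooting:
the kernel reports its verdict in sysfs.  Here's a minimal C sketch
that dumps the relevant files (paths per the kernel docs linked above;
a plain cat of the same files works just as well):

  #include <stdio.h>

  static void
  dump(const char *path)
  {
      char    buf[256];
      FILE   *f = fopen(path, "r");

      if (f == NULL)
      {
          printf("%s: not present (older kernel?)\n", path);
          return;
      }
      if (fgets(buf, sizeof(buf), f) != NULL)
          printf("%s: %s", path, buf);    /* buf keeps its newline */
      fclose(f);
  }

  int
  main(void)
  {
      dump("/sys/devices/system/cpu/vulnerabilities/mds");
      /* prints "on", "off", "forceoff" or "notsupported" */
      dump("/sys/devices/system/cpu/smt/control");
      return 0;
  }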

Greetings,

Andres Freund



Re: PSA: New intel MDS vulnerability mitigations cause measurable slowdown

Thomas Munro
On Wed, May 15, 2019 at 10:31 AM Andres Freund <[hidden email]> wrote:
> *Without* disabling SMT, for readonly pgbench, I'm seeing regressions
> of 7-11%, depending on the size of shared_buffers (and some runtime
> variations).  That's just on my laptop, with an i7-6820HQ / Skylake CPU.
> I'd be surprised if there weren't adversarial loads with bigger
> slowdowns - what gets more expensive with the mitigations is syscalls.

Yikes.  This is all in warm shared buffers, right?  So effectively this
is the cost of recvfrom() and sendto() going up?  Did you use -M
prepared?  If not, there would also be a couple of lseek(SEEK_END)
calls in between for planning...  I wonder how many more
syscall-taxing mitigations we need before relation size caching pays
off.

--
Thomas Munro
https://enterprisedb.com



Re: PSA: New intel MDS vulnerability mitigations cause measurable slowdown

Andres Freund
Hi,

On 2019-05-15 12:52:47 +1200, Thomas Munro wrote:
> On Wed, May 15, 2019 at 10:31 AM Andres Freund <[hidden email]> wrote:
> > *Without* disabling SMT, for readonly pgbench, I'm seeing regressions
> > of 7-11%, depending on the size of shared_buffers (and some runtime
> > variations).  That's just on my laptop, with an i7-6820HQ / Skylake CPU.
> > I'd be surprised if there weren't adversarial loads with bigger
> > slowdowns - what gets more expensive with the mitigations is syscalls.
>
> Yikes.  This is all in warm shared buffers, right?

Not initially, but it ought to warm up quite quickly. I ran something
boiling down to:

  pgbench -q -i -s 200
  psql -c 'vacuum (freeze, analyze, verbose)'
  pgbench -n -S -c 32 -j 32 -M prepared -T 100 -P 1

As both pgbench -i's COPY and VACUUM use ring buffers, s_b will
effectively be empty initially.


> So effectively this is the cost of recvfrom() and sendto() going up?

Plus epoll_wait(). And read(), for the cases where s_b was smaller than
the data.


> Did you use -M prepared?

Yes.


> If not, there would also be a couple of lseek(SEEK_END) calls in
> between for planning...  I wonder how many more syscall-taxing
> mitigations we need before relation size caching pays off.

Yeah, I suspect we're going to have to go there soon, for a number of
reasons.
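
The core of such a cache is simple: remember the last lseek(SEEK_END)
result and invalidate it whenever the relation is extended or
truncated; the hard part is invalidation across backends.  A toy C
sketch of the idea (made-up names, not PostgreSQL's actual smgr API):

  #include <sys/types.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  typedef struct
  {
      off_t   cached_size;   /* last size we saw */
      int     valid;         /* does cached_size reflect reality? */
  } rel_size_cache;

  /* Return the file's size, hitting the kernel only on a cache miss. */
  static off_t
  rel_size(int fd, rel_size_cache *c)
  {
      if (!c->valid)
      {
          c->cached_size = lseek(fd, 0, SEEK_END);
          c->valid = 1;
      }
      return c->cached_size;
  }

  /* Must be called whenever the file is extended or truncated. */
  static void
  rel_size_invalidate(rel_size_cache *c)
  {
      c->valid = 0;
  }

  int
  main(void)
  {
      rel_size_cache  c = {0, 0};
      int             fd = open("/etc/hosts", O_RDONLY);

      printf("size: %lld\n", (long long) rel_size(fd, &c));
      printf("again, cached: %lld\n", (long long) rel_size(fd, &c));
      rel_size_invalidate(&c);        /* e.g. after an extension */
      close(fd);
      return 0;
  }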

- Andres



Re: PSA: New intel MDS vulnerability mitigations cause measurable slowdown

Andres Freund
In reply to this post by Andres Freund
Hi,

On 2019-05-14 15:30:52 -0700, Andres Freund wrote:

> There's a new set of CPU vulnerabilities, so far affecting only Intel
> CPUs. Cribbing from the linux-kernel announcement, I'm referring to
> https://xenbits.xen.org/xsa/advisory-297.html
> for details.
>
> The "fix" is for the OS to perform some extra mitigations:
> https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html
> https://www.kernel.org/doc/html/latest/x86/mds.html#mds
>
> *And* SMT/hyperthreading needs to be disabled, to be fully safe.
>
> Fun.
>
> I've run a quick pgbench benchmark:
>
> *Without* disabling SMT, for readonly pgbench, I'm seeing regressions
> of 7-11%, depending on the size of shared_buffers (and some runtime
> variations).  That's just on my laptop, with an i7-6820HQ / Skylake CPU.
> I'd be surprised if there weren't adversarial loads with bigger
> slowdowns - what gets more expensive with the mitigations is syscalls.

The profile after the mitigations looks like:

+    3.62%  postgres         [kernel.vmlinux]         [k] do_syscall_64
+    2.99%  postgres         postgres                 [.] _bt_compare
+    2.76%  postgres         postgres                 [.] hash_search_with_hash_value
+    2.33%  postgres         [kernel.vmlinux]         [k] entry_SYSCALL_64
+    1.69%  pgbench          [kernel.vmlinux]         [k] do_syscall_64
+    1.61%  postgres         postgres                 [.] AllocSetAlloc
     1.41%  postgres         postgres                 [.] PostgresMain
+    1.22%  pgbench          [kernel.vmlinux]         [k] entry_SYSCALL_64
+    1.11%  postgres         postgres                 [.] LWLockAcquire
+    0.86%  postgres         postgres                 [.] PinBuffer
+    0.80%  postgres         postgres                 [.] LockAcquireExtended
+    0.78%  postgres         [kernel.vmlinux]         [k] psi_task_change
     0.76%  pgbench          pgbench                  [.] threadRun
     0.69%  postgres         postgres                 [.] LWLockRelease
+    0.69%  postgres         postgres                 [.] SearchCatCache1
     0.66%  postgres         postgres                 [.] LockReleaseAll
+    0.65%  postgres         postgres                 [.] GetSnapshotData
+    0.58%  postgres         postgres                 [.] hash_seq_search
     0.54%  postgres         postgres                 [.] hash_search
+    0.53%  postgres         [kernel.vmlinux]         [k] __switch_to
+    0.53%  postgres         postgres                 [.] hash_any
     0.52%  pgbench          libpq.so.5.12            [.] pqParseInput3
     0.50%  pgbench          [kernel.vmlinux]         [k] do_raw_spin_lock

where do_syscall_64 shows this instruction profile:

       │     static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
       │     {
       │             asm_volatile_goto("1:"
  1.58 │     ↓ jmpq   bd
       │     mds_clear_cpu_buffers():
       │              * Works with any segment selector, but a valid writable
       │              * data segment is the fastest variant.
       │              *
       │              * "cc" clobber is required because VERW modifies ZF.
       │              */
       │             asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
 77.38 │       verw   0x13fea53(%rip)        # ffffffff82400ee0 <ds.4768>
       │     do_syscall_64():
       │             }
       │
       │             syscall_return_slowpath(regs);
       │     }
 13.18 │ bd:   pop    %rbx
  0.08 │       pop    %rbp
       │     ← retq
       │                     nr = syscall_trace_enter(regs);
       │ c0:   mov    %rbp,%rdi
       │     → callq  syscall_trace_enter


Here verw is the instruction that was recycled to have the new side
effect of flushing CPU buffers.

Greetings,

Andres Freund



Re: PSA: New intel MDS vulnerability mitigations cause measurable slowdown

Thomas Munro
On Wed, May 15, 2019 at 1:13 PM Andres Freund <[hidden email]> wrote:
> > I've run a quick pgbench benchmark:
> >
> > *Without* disabling SMT, for readonly pgbench, I'm seeing regressions
> > of 7-11%, depending on the size of shared_buffers (and some runtime
> > variations).  That's just on my laptop, with an i7-6820HQ / Skylake CPU.
> > I'd be surprised if there weren't adversarial loads with bigger
> > slowdowns - what gets more expensive with the mitigations is syscalls.

This stuff landed in my FreeBSD 13.0-CURRENT kernel, so I was curious
to measure it with and without the earlier mitigations.  On my humble
i7-8550U laptop with the new 1.22 microcode installed, with my usual
settings of PTI=on and IBRS=off, so far MDS=VERW gives me ~1.5% loss
of TPS with a single client, up to 4.3% loss of TPS for 16 clients,
but it didn't go higher when I tried 32 clients.  This was a tiny
scale 10 database, though in a quick test it didn't look like it was
worse with scale 100.

With all three mitigations activated, my little dev machine has gone
from being able to do ~11.8 million baseline syscalls per second to
~1.6 million, or ~1.4 million with the AVX variant of the mitigation.
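
For reference, numbers like these come from a tight loop around a cheap
syscall.  A minimal C sketch of that kind of microbenchmark (just
illustrative; not the exact harness behind the tables below):

  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  int
  main(void)
  {
      const long      iters = 10 * 1000 * 1000;
      volatile uid_t  sink;           /* keep calls from being elided */
      struct timespec start, end;
      long            i;
      double          secs;

      clock_gettime(CLOCK_MONOTONIC, &start);
      for (i = 0; i < iters; i++)
          sink = getuid();            /* cheap syscall: entry/exit cost */
      clock_gettime(CLOCK_MONOTONIC, &end);

      secs = (end.tv_sec - start.tv_sec) +
             (end.tv_nsec - start.tv_nsec) / 1e9;
      printf("%.0f syscalls/sec\n", iters / secs);
      (void) sink;
      return 0;
  }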

Raw getuid() syscalls per second:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off   11798658  4764159  3274043
  off   on     2652564  1941606  1655356
  on    off    4973053  2932906  2339779
  on    on     1988527  1556922  1378798

pgbench read-only transactions per second, 1 client thread:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off      19393    18949    18615
  off   on       17946    17586    17323
  on    off      19381    19015    18696
  on    on       18045    17709    17418

pgbench -M prepared read-only transactions per second, 1 client thread:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off      35020    34049    33200
  off   on       31658    30902    30229
  on    off      35445    34353    33415
  on    on       32415    31599    30712

pgbench -M prepared read-only transactions per second, 4 client threads:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off      79515    76898    76465
  off   on       63608    62220    61952
  on    off      77863    75431    74847
  on    on       62709    60790    60575

pgbench -M prepared read-only transactions per second, 16 client threads:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off     125984   121164   120468
  off   on      112884   108346   107984
  on    off     121032   116156   115462
  on    on      108889   104636   104027

time gmake -s check:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off      16.78    16.85    17.03
  off   on       18.19    18.81    19.08
  on    off      16.67    16.86    17.33
  on    on       18.58    18.83    18.99

--
Thomas Munro
https://enterprisedb.com



Re: PSA: New intel MDS vulnerability mitigations cause measurable slowdown

Albert Cervera i Areny
Message from Thomas Munro <[hidden email]> on Thu, May 16, 2019
at 13:09:

>
> On Wed, May 15, 2019 at 1:13 PM Andres Freund <[hidden email]> wrote:
> > > I've run a quick pgbench benchmark:
> > >
> > > *Without* disabling SMT, for readonly pgbench, I'm seeing regressions
> > > of 7-11%, depending on the size of shared_buffers (and some runtime
> > > variations).  That's just on my laptop, with an i7-6820HQ / Skylake CPU.
> > > I'd be surprised if there weren't adversarial loads with bigger
> > > slowdowns - what gets more expensive with the mitigations is syscalls.
>
> This stuff landed in my FreeBSD 13.0-CURRENT kernel, so I was curious
> to measure it with and without the earlier mitigations.  On my humble
> i7-8550U laptop with the new 1.22 microcode installed, with my usual
> settings of PTI=on and IBRS=off, so far MDS=VERW gives me ~1.5% loss
> of TPS with a single client, up to 4.3% loss of TPS for 16 clients,
> but it didn't go higher when I tried 32 clients.  This was a tiny
> scale 10 database, though in a quick test it didn't look like it was
> worse with scale 100.
>
> With all three mitigations activated, my little dev machine has gone
> from being able to do ~11.8 million baseline syscalls per second to

Did you mean "1.8"?

> ~1.6 million, or ~1.4 million with the AVX variant of the mitigation.


--
Albert Cervera i Areny
http://www.NaN-tic.com
Tel. 93 553 18 03



Re: PSA: New intel MDS vulnerability mitigations cause measurable slowdown

Chapman Flack
On 5/16/19 12:24 PM, Albert Cervera i Areny wrote:
> Message from Thomas Munro <[hidden email]> on Thu, May 16, 2019
> at 13:09:
>> With all three mitigations activated, my little dev machine has gone
>> from being able to do ~11.8 million baseline syscalls per second to
>
> Did you mean "1.8"?

Not in what I thought I saw:

>> ~1.6 million, or ~1.4 million ...
>>
>>   PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
>>   ===== ===== ======== ======== ========
>>   off   off   11798658  4764159  3274043
                 ^^^^^^^^
>>   off   on     2652564  1941606  1655356
>>   on    off    4973053  2932906  2339779
>>   on    on     1988527  1556922  1378798
                           ^^^^^^^  ^^^^^^^

-Chap



Re: PSA: New intel MDS vulnerability mitigations cause measurable slowdown

Thomas Munro
On Fri, May 17, 2019 at 5:26 AM Chapman Flack <[hidden email]> wrote:

> On 5/16/19 12:24 PM, Albert Cervera i Areny wrote:
> > Message from Thomas Munro <[hidden email]> on Thu, May 16, 2019
> > at 13:09:
> >> With all three mitigations activated, my little dev machine has gone
> >> from being able to do ~11.8 million baseline syscalls per second to
> >
> > Did you mean "1.8"?
>
> Not in what I thought I saw:
>
> >> ~1.6 million, or ~1.4 million ...
> >>
> >>   PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
> >>   ===== ===== ======== ======== ========
> >>   off   off   11798658  4764159  3274043
>                  ^^^^^^^^
> >>   off   on     2652564  1941606  1655356
> >>   on    off    4973053  2932906  2339779
> >>   on    on     1988527  1556922  1378798
>                            ^^^^^^^  ^^^^^^^

Right.  Actually it's worse than that -- after I posted I realised
that I had some debug stuff enabled in my kernel that was slowing
things down a bit, so I reran the tests overnight with a production
kernel and here is what I see this morning.  It's actually ~17.8
million syscalls/sec -> ~1.7 million syscalls/sec, if you go from all
mitigations off to all mitigations on, or -> ~3.2 million for just PTI
+ MDS.  And the loss of TPS is ~5% for the case I was most interested
in, just turning on MDS=VERW if you already had PTI on and IBRS off.

Raw getuid() syscalls per second:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off   17771744  5372032  3575035
  off   on     3060923  2166527  1817052
  on    off    5622591  3150883  2463934
  on    on     2213190  1687748  1475605

pgbench read-only transactions per second, 1 client thread:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off      22414    22103    21571
  off   on       21298    20817    20418
  on    off      22473    22080    21550
  on    on       21286    20850    20386

pgbench -M prepared read-only transactions per second, 1 client thread:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off      43508    42476    41123
  off   on       40729    39483    38555
  on    off      44110    42989    42012
  on    on       41143    39990    38798

pgbench -M prepared read-only transactions per second, 4 client threads:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off     100735    97689    96662
  off   on       80142    77804    77064
  on    off     100540    97010    95827
  on    on       79492    76976    76226

pgbench -M prepared read-only transactions per second, 16 client threads:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off     161015   152978   152556
  off   on      145605   139438   139179
  on    off     155359   147691   146987
  on    on      140976   134978   134177

pgbench -M prepared read-only transactions per second, 16 client threads:

  PTI   IBRS  MDS=off  MDS=VERW MDS=AVX
  ===== ===== ======== ======== ========
  off   off     157986   150132   149436
  off   on      142618   136220   135901
  on    off     153482   146214   145839
  on    on      138650   133074   132142

--
Thomas Munro
https://enterprisedb.com



Re: PSA: New intel MDS vulnerability mitigations cause measurable slowdown

Thomas Munro
On Fri, May 17, 2019 at 9:42 AM Thomas Munro <[hidden email]> wrote:
> pgbench -M prepared read-only transactions per second, 16 client threads:

(That second "16 client threads" line should read "32 client threads".)

--
Thomas Munro
https://enterprisedb.com