Postgresql server gets stuck at low load

kolszew73@gmail.com
I have a problem with one of my Postgres production servers. The server works fine almost always, but sometimes, without any increase in the number of transactions or statements, the machine gets stuck. Cores go up to 100% and load up to 160%. When this happens, connecting to the database becomes difficult, and even when a connection succeeds, simple queries take several seconds instead of milliseconds. The problem sometimes stops by itself after a while (e.g. 35 min); at other times we must restart Postgres, Linux, or even KVM (the virtualization host).

My hardware
56 cores (Intel Core Processor (Skylake, IBRS))
400 GB RAM
RAID10 with about 40k IOPS

OS
CentOS Linux release 7.7.1908
kernel 3.10.0-1062.18.1.el7.x86_64

Database size 100 GB (fits entirely in memory :) )
server_version 10.12
effective_cache_size 192000 MB
maintenance_work_mem 2048 MB
max_connections 150
shared_buffers 64000 MB
work_mem 96 MB

In the normal state I have about 500 tps, 5% core usage, and about 3% load. The whole database fits in memory, so there are no reads from disk, only writes at about 500 IOPS with occasional spikes to 1500 IOPS; this hardware handles those values without trouble (no iowait on the cores). In the normal state this machine does "nothing". Connections to the database are created by two Java-based app servers through connection pools, so the connection count is limited by the pool configuration: the maximum is 120, which is lower than the Postgres limit (150). In the normal state there are about 20 connections; when the server is stuck, the count goes to the maximum (120).

The stalls correlate with messages like this in the kernel log:
NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
but I don't know whether this is a cause or an effect of the problem.
I investigated with pgBadger and ... nothing strange happens, just normal statements.
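
To see how often the watchdog fires and on which CPUs, the fields can be pulled out of the kernel log lines. This is only a sketch: the function name summarize_soft_lockups is made up for illustration, and it assumes the message format shown above.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints
# "<cpu> <seconds> <process>" for every soft-lockup report it finds.
summarize_soft_lockups() {
    sed -n 's/.*soft lockup - CPU#\([0-9]*\) stuck for \([0-9]*\)s! \[\([^:]*\):[0-9]*].*/\1 \2 \3/p'
}

# Example (not run here): dmesg | summarize_soft_lockups | sort | uniq -c
```

If the reports always name the same process or CPU, that may help narrow down whether the guest or the KVM host is the place to look.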

Any ideas?

Thanks,
Kris



Re: Postgresql server gets stuck at low load

luis.roberto

From: "Krzysztof Olszewski" <[hidden email]>
To: [hidden email]
Sent: Friday, 5 June 2020 7:07:02
Subject: Postgresql server gets stuck at low load

Hi Krzysztof!

I would enable the pg_stat_statements extension and check whether there are long-running queries that should be quick.
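
A minimal sketch of such a check, assuming pg_stat_statements is already listed in shared_preload_libraries and created in the database. The helper name slow_statements_sql and the database name "mydb" are placeholders, and the column names (mean_time, max_time) are the ones used by PostgreSQL 10; later releases renamed them.

```shell
# Hypothetical helper: prints a query listing the statements with the
# highest mean execution time. Column names match PostgreSQL 10.
slow_statements_sql() {
    limit="${1:-20}"
    cat <<SQL
SELECT calls,
       round(mean_time::numeric, 2) AS mean_ms,
       round(max_time::numeric, 2)  AS max_ms,
       left(query, 80)              AS query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT ${limit};
SQL
}

# Example (not run here; "mydb" is a placeholder database name):
#   slow_statements_sql 10 | psql -X -d mydb
```

Calling pg_stat_statements_reset() when a stall begins would make the numbers reflect only the stalled period.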

Re: Postgresql server gets stuck at low load

Pavel Stehule
In reply to this post by kolszew73@gmail.com


On Fri, 5 Jun 2020 at 12:07, Krzysztof Olszewski <[hidden email]> wrote:
> In correlation with stucks i see informations in kernel log about
> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]

You can try installing perf plus the debug symbols for Postgres. When the problem occurs again, run "perf top" to see which routines are eating your CPU.

Maybe it is a spinlock problem.
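
Besides the interactive "perf top", a profile can be recorded during a stall and inspected afterwards. The sketch below is an assumption about how one might wrap that: the function name capture_cpu_profile and the default file name perf-stall.data are made up, and it needs enough privileges to profile system-wide.

```shell
# Hypothetical helper: records a system-wide CPU profile with call graphs
# for a fixed number of seconds.
# Set DRY_RUN=1 to print the command instead of running it.
capture_cpu_profile() {
    seconds="${1:-30}"
    out="${2:-perf-stall.data}"   # placeholder output file name
    set -- perf record -a -g -o "$out" -- sleep "$seconds"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
        return 0
    fi
    "$@"
}

# Example (not run here):
#   capture_cpu_profile 30 && perf report -i perf-stall.data
```

With debug symbols installed, the report should show function names rather than raw addresses, which is what makes a spinlock hot spot recognizable.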


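https://www.postgresql.org/message-id/CAHyXU0yAsVxoab2PcyoCuPjqymtnaE93v7bN4ctv2aNi92fefA%40mail.gmail.com

Your answer to Merlin's question from that mail would be interesting: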

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
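
The kernel marks the active choice in those files with square brackets (for example "always madvise [never]"). A small sketch that prints just the active value; the helper name thp_active_value is made up, and the second directory is checked on the assumption that some kernels use the name without the redhat_ prefix.

```shell
# Hypothetical helper: extracts the bracketed (active) choice from a
# transparent hugepage setting such as "always madvise [never]".
thp_active_value() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report whichever of the two sysfs locations exists on this kernel.
for dir in /sys/kernel/mm/redhat_transparent_hugepage /sys/kernel/mm/transparent_hugepage; do
    for knob in enabled defrag; do
        if [ -r "$dir/$knob" ]; then
            echo "$dir/$knob: $(thp_active_value "$(cat "$dir/$knob")")"
        fi
    done
done
```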

Regards

Pavel

 


Re: Postgresql server gets stuck at low load

kolszew73@gmail.com
In reply to this post by luis.roberto
I had log_min_duration_statement set to 0 for a short period just before and just after a stall, so I have the full list of SQL statements, which I then analyzed in pgBadger. There is no increase in the number of statements, and I can see that all statements take longer to process than before the stall. But following your advice I'll check the results from pg_stat_statements.

On Fri, 5 Jun 2020 at 13:16, <[hidden email]> wrote:
> I would enable pg_stat_statements extension and check if there are long running queries that should be quick.

Re: Postgresql server gets stuck at low load

kolszew73@gmail.com
In reply to this post by Pavel Stehule
I tried with huge pages both off and on; the problem still occurs.
Thanks for the "perf top" suggestion.

Regards,
Kris



On Fri, 5 Jun 2020 at 13:38, Pavel Stehule <[hidden email]> wrote:
> you can try to install perf + debug symbols for postgres. When you will have this problem again run "perf top". You can see what routines eat your CPU.
>
> cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
> cat /sys/kernel/mm/redhat_transparent_hugepage/defrag



Re: Postgresql server gets stuck at low load

Avinash Kumar
In reply to this post by kolszew73@gmail.com
Hi,

On Fri, Jun 5, 2020 at 7:07 AM Krzysztof Olszewski <[hidden email]> wrote:
> server_version 10.12
> effective_cache_size 192000 MB
> maintenance_work_mem 2048 MB
> max_connections 150
> shared_buffers 64000 MB
> work_mem 96 MB

What is random_page_cost set to?
Set it to 1 (the same as the default seq_page_cost) for a moment and try it.
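
One low-risk way to try this is per transaction rather than server-wide. The sketch below only prints the SQL; the helper name random_page_cost_trial_sql, the table name in the example and the database name "mydb" are placeholders.

```shell
# Hypothetical helper: prints SQL that shows the current setting and then
# plans one query with random_page_cost = 1 inside a rolled-back transaction.
# Pass the query without a trailing semicolon.
random_page_cost_trial_sql() {
    query="$1"
    cat <<SQL
SHOW random_page_cost;
BEGIN;
SET LOCAL random_page_cost = 1;
EXPLAIN ${query};
ROLLBACK;
SQL
}

# Example (not run here; table and database names are placeholders):
#   random_page_cost_trial_sql 'SELECT * FROM some_table WHERE id = 42' | psql -X -d mydb
```

SET LOCAL lasts only until the end of the transaction, so other sessions keep the configured value.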

--
Regards,
Avinash Vallarapu

Re: Postgresql server gets stuck at low load

kolszew73@gmail.com
random_page_cost is set to 1.1

On Tue, 9 Jun 2020 at 14:01, Avinash Kumar <[hidden email]> wrote:
> What is the value set to random_page_cost ?
> Set to 1 (same as default seq_page_cost) for a moment and try it.

Re: Postgresql server gets stuck at low load

Justin Pryzby
In reply to this post by kolszew73@gmail.com
On Tue, Jun 09, 2020 at 01:54:21PM +0200, Krzysztof Olszewski wrote:
>  I had hugepage's off and on, problems still occurs,
> thanx for "perf top" suggestion,

> On Fri, 5 Jun 2020 at 13:38, Pavel Stehule <[hidden email]> wrote:
> > On Fri, 5 Jun 2020 at 12:07, Krzysztof Olszewski <[hidden email]> wrote:
> >
> >> I have problem with one of my Postgres production server. Server works
> >> fine almost always, but sometimes without any increase of transactions or
> >> statements amount, machine gets stuck. Cores goes up to 100%, load up to
> >> 160%. When it happens then there are problems with connect to database and
> >> even it will succeed, simple queries works several seconds instead of
> >> milliseconds.Problem sometimes stops after a period a time (e.g. 35 min),
> >> sometimes we must restart Postgres, Linux, or even KVM (which exists as
> >> virtualization host).
> >>
> >> My hardware
> >> 56 cores (Intel Core Processor (Skylake, IBRS))
> >> 400 GB RAM
> >> RAID10 with about 40k IOPS
> >>
> >> shared_buffers 64000 MB
> >>
> >> In correlation with stucks i see informations in kernel log about
> >> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
> >
> > https://www.postgresql.org/message-id/CAHyXU0yAsVxoab2PcyoCuPjqymtnaE93v7bN4ctv2aNi92fefA%40mail.gmail.com
> >
> > Can be interesting a reply on Merlin's question from mail/.
> >
> > cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
> > cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

Try this:
echo 2 | sudo tee /sys/kernel/mm/ksm/run

https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com
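
For reference, the kernel's KSM documentation defines three values for that file: 0 stops ksmd but keeps merged pages, 1 runs ksmd, and 2 stops ksmd and unmerges all currently merged pages. A small sketch for checking the current state; the helper name ksm_run_meaning is made up.

```shell
# Hypothetical helper: describes a value read from /sys/kernel/mm/ksm/run.
ksm_run_meaning() {
    case "$1" in
        0) echo "ksmd stopped, merged pages kept" ;;
        1) echo "ksmd running" ;;
        2) echo "ksmd stopped, merged pages unmerged" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Print the current state if the kernel exposes KSM.
if [ -r /sys/kernel/mm/ksm/run ]; then
    ksm_run_meaning "$(cat /sys/kernel/mm/ksm/run)"
fi
```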

--
Justin