Service outage: each postgres process uses the exact amount of the configured work_mem


Service outage: each postgres process uses the exact amount of the configured work_mem

Moreno Andreo
Hi all,
About two and a half hours ago, suddenly (in the late afternoon of Easter Friday), customers reported failing or very slow connections to our server.
After a bit of checking (which also involved a server reboot) I noticed (using top) that every postgres process is using exactly the amount I configured as work_mem (3 GB). And AFAIK that's not good.

30085 postgres 20 0 3370048 156656 153876 S 6.7 0.3 0:00.20 postgres
29833 postgres 20 0 3370000 65260 62416 S 1.7 0.1 0:00.17 postgres
29632 postgres 20 0 3372468 11712 6028 S 0.7 0.0 0:00.60 postgres

What could have happened?
Nothing has been touched...
PostgreSQL 9.5.6 on Debian 8, just apt-get upgraded.

Any help would be appreciated.
Moreno.

Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Melvin Davidson-5


On Fri, Apr 14, 2017 at 1:12 PM, Moreno Andreo <[hidden email]> wrote:

>using exactly the amount I configured as work_mem (3 GB).

You are right, that is bad, but that is your own fault. 3 GB of work_mem is very bad. Try lowering it to something more reasonable, like 20GB.

https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY
"several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value."

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Moreno Andreo
Sorry, my mistake (I'm a bit nervous...):

that's not work_mem, but shared_buffers.

Thanks




Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Melvin Davidson-5

On Fri, Apr 14, 2017 at 1:50 PM, Moreno Andreo <[hidden email]> wrote:
Sorry,
 my mistake (I'm a bit nervous...)

that's not work_mem, but shared_buffers

Thanks



Moreno,
we are working with minimal information here.
Perhaps if you provided us with the following information it would be more useful:

A. Total SYSTEM MEMORY
B. shared_memory
C. work_memory
D. max_connections
E. How many users were connected when the problem occurred?

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Jeff Janes
In reply to this post by Moreno Andreo
On Fri, Apr 14, 2017 at 10:12 AM, Moreno Andreo <[hidden email]> wrote:
Hi all,
About two and a half hours ago, suddenly (in the late afternoon of Easter Friday), customers reported failing or very slow connections to our server.
After a bit of checking (which also involved a server reboot) I noticed (using top) that every postgres process is using exactly the amount I configured as work_mem (3 GB). And AFAIK that's not good.

30085 postgres 20 0 3370048 156656 153876 S 6.7 0.3 0:00.20 postgres
29833 postgres 20 0 3370000 65260 62416 S 1.7 0.1 0:00.17 postgres
29632 postgres 20 0 3372468 11712 6028 S 0.7 0.0 0:00.60 postgres

They are all sharing the same shared_buffers (not work_mem), and that shared space is counted in the listing for each one of them.  

This is unlikely to be related to your current problems.  Nothing shown in that output is alarming.
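Jeff's point can be checked against the top rows above: in this top layout, RES (column 6) includes touched shared pages while SHR (column 7) is the shared portion, so RES minus SHR approximates each backend's private memory. A small sketch:

```python
# Estimate private (non-shared) resident memory per backend from the
# thread's `top` rows: private ≈ RES - SHR.

top_rows = """\
30085 postgres 20 0 3370048 156656 153876 S 6.7 0.3 0:00.20 postgres
29833 postgres 20 0 3370000 65260 62416 S 1.7 0.1 0:00.17 postgres
29632 postgres 20 0 3372468 11712 6028 S 0.7 0.0 0:00.60 postgres"""

for row in top_rows.splitlines():
    fields = row.split()
    pid, res_kb, shr_kb = fields[0], int(fields[5]), int(fields[6])
    print(f"pid {pid}: private ~{(res_kb - shr_kb) / 1024:.1f} MiB")
```

Each backend turns out to be using only a few MiB privately, which supports the "nothing alarming" reading.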
 
What could have happened?

What error messages are the customers getting when they fail to connect?  What error messages are showing up in the server log file?  What do you see in pg_stat_activity?

Cheers,

Jeff

Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Christian Mair
In reply to this post by Moreno Andreo
> Sorry,
>   my mistake (I'm a bit nervous...)
>
> that's not work_mem, but shared_buffers

Hi.

The resident set size (RSS) of a worker process includes all the shared memory pages it has touched.
So it's not that each of those workers allocated its own 3 GB...

(on Linux, at least)
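On reasonably recent Linux kernels (4.5+), /proc/&lt;pid&gt;/status breaks RSS down into anonymous, file-backed, and shared components, which makes this easy to verify directly. A sketch parsing a sample (fabricated) status excerpt rather than a live process:

```python
# RSS breakdown from /proc/<pid>/status: VmRSS = RssAnon + RssFile + RssShmem.
# RssShmem covers the shared_buffers pages a backend has touched; RssAnon is
# its truly private memory. Values below are fabricated for illustration.

sample_status = """\
VmRSS:\t  156656 kB
RssAnon:\t    2780 kB
RssFile:\t       0 kB
RssShmem:\t  153876 kB"""

fields = {}
for line in sample_status.splitlines():
    key, value = line.split(":")
    fields[key] = int(value.strip().split()[0])

# The shared component dominates; the backend itself holds only a few MB.
assert fields["VmRSS"] == fields["RssAnon"] + fields["RssFile"] + fields["RssShmem"]
print(f'private: {fields["RssAnon"]} kB, shared: {fields["RssShmem"]} kB')
```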

Bye,
Chris.






--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Moreno Andreo
In reply to this post by Melvin Davidson-5
Melvin,
    Sorry for top-posting.   
I'm going ahead with troubleshooting. As Jeff said, there's probably nothing wrong with my values (at the end of the message you can find the minimal info you requested).
I tried running some queries against the PostgreSQL server and response times are good, so I'm moving my attention to the Windows server, which hosts a WCF service, the one that actually serves customers.

Thanks for now
Moreno

On 14/04/2017 20:01, Melvin Davidson wrote:
Moreno,
we are working with minimal information here.
Perhaps if you provided us with the following information it would be more useful,

A. Total SYSTEM MEMORY
52 GB
B. shared_memory
3 GB (was 13 GB)
C. work_memory
default
D. max_connections
1000
E. How many users were connected when the problem occurred?
About 350 connections
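A back-of-envelope budget from the values above suggests the settings are not the problem (work_mem is at the 9.5 default of 4 MB; the per-backend overhead figure is a rough assumption, not from the thread):

```python
# Rough worst-case memory budget from the reported configuration.

total_ram_gb        = 52
shared_buffers_gb   = 3     # was 13
max_connections     = 1000
work_mem_mb         = 4     # PostgreSQL 9.5 default
backend_overhead_mb = 5     # rough per-backend overhead (assumption)

worst_case_gb = shared_buffers_gb + max_connections * (work_mem_mb + backend_overhead_mb) / 1024
print(f"~{worst_case_gb:.0f} GB if all {max_connections} backends ran one work_mem operation each")
```

Even with all 1000 connections active, this stays far below the 52 GB of RAM, consistent with Jeff's assessment.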

Thanks





Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Melvin Davidson-5



Moreno,
I understand you will troubleshoot on your own, but based on total system memory, you should increase shared_memory to 40GB. The general philosophy is to allocate 80% of system memory to shared_memory. Of course you will have to restart PostgreSQL for that to take effect.

Also, with 1000 max_connections, you would be much better off with a connection pooler like PgBouncer: https://pgbouncer.github.io/downloads/
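For reference, a minimal PgBouncer setup along these lines might look like the sketch below. The setting names are real PgBouncer options, but every value and path here is hypothetical and would need tuning for this installation:

```ini
; pgbouncer.ini -- hypothetical sketch, not from the thread
[databases]
; forward every database to the local PostgreSQL instance
* = host=127.0.0.1 port=5432

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction     ; reuse server connections between transactions
max_client_conn = 1000      ; clients can still open many connections...
default_pool_size = 30      ; ...but only ~30 server connections per db/user
```

Clients would then connect to port 6432 instead of 5432, and PostgreSQL's own max_connections could be lowered substantially.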


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Igor Neyman
In reply to this post by Moreno Andreo

 

E. How many users were connected when the problem occurred?

About 350 connections

Thanks

______________________________________________________________________________________

 

That is probably your problem, if you don't have a connection pooler.

I'd recommend starting with PgBouncer; it's very lightweight and easy to configure.

 

Regards,

Igor Neyman

 


Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Christian Mair
In reply to this post by Melvin Davidson-5
> you should increase shared_memory to 40GB. The general philosophy is to allocate 80% of system memory to shared_memory

Uhm...

80% is too much, likely:

https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html

Bye,
Chris.





Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Melvin Davidson-5


On Fri, Apr 14, 2017 at 2:42 PM, Chris Mair <[hidden email]> wrote:
> 80% is too much, likely:
> https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html

Sorry, I had a brain fart; shared_buffers should be 25% of system memory, or 12GB.
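Applied to the machine in this thread, the common 25%-of-RAM starting point works out as follows (this is a guideline from the tuning folklore around the docs, not a hard rule):

```python
# shared_buffers starting point: roughly 25% of system RAM.
total_ram_gb = 52
suggested_shared_buffers_gb = total_ram_gb * 0.25
print(f"25% of {total_ram_gb} GB = {suggested_shared_buffers_gb:.0f} GB")
```

Which is essentially the 13 GB Moreno had originally configured.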


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Moreno Andreo
In reply to this post by Igor Neyman
On 14/04/2017 20:40, Igor Neyman wrote:
> That is probably your problem, if you don't have a connection pooler.
> I'd recommend starting with PgBouncer; it's very lightweight and easy to configure.

I'm building such a system, but I don't think that's the problem, because right now, with 30 active connections, it still does not work (actually it's extremely slow).

Both servers are far from being under heavy load (and so they were before...).


Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Moreno Andreo
In reply to this post by Melvin Davidson-5
On 14/04/2017 20:55, Melvin Davidson wrote:
> Sorry, I had a brain fart; shared_buffers should be 25% of system memory, or 12GB.
The original setting was 13 GB (I lowered it to 3 GB as a test), but I probably never verified it with top...





Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Igor Neyman
In reply to this post by Moreno Andreo

 

Moreno Andreo wrote:
> I'm building such a system, but I don't think that's the problem, because right now, with 30 active connections, it still does not work (actually it's extremely slow).
> Both servers are far from being under heavy load (and so they were before...).

Having a total of 350 connections with only 30 of them active is exactly the case where a connection pooler helps.

Igor Neyman

 

 


Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Moreno Andreo
On 14/04/2017 21:06, Igor Neyman wrote:
> Having a total of 350 connections with only 30 of them active is exactly the case where a connection pooler helps.


Sorry, I explained it badly.
Four hours ago we were in the rush hour and the server was under its usual load (graphs do not show spikes in server load), and we had about 350 connections.
To troubleshoot the whole thing I needed to restart the servers, and in the meantime customers disconnected.
Now I have about 30 of them (most of them alive); select * from pg_stat_activity shows 30 connections.
That's why I don't think it's a connection-count problem.

 



Re: Service outage: each postgres process uses the exact amount of the configured work_mem

Igor Neyman

 

Moreno Andreo wrote:
> That's why I don't think it's a connection # problem.

 

Right.

But you had 350 connections when the problem occurred, which leads me to believe that PgBouncer is needed.

 

Igor Neyman

 

 


Re: Service outage: each postgres process uses the exact amount of the configured work_mem [(AUTO-RE)SOLVED...]

Moreno Andreo

For whoever may be interested...
After creating another VM for the Windows instance from last night's snapshot (and choosing a good client to test with... our customer's machine has some other problems, I think...), everything worked.
No lags, no delays.
Fast as usual.
So we switched back to the original server.
Everything worked.

This is the WORST situation.
Honestly, I didn't understand which server and which module was the problem, nor did I figure out the cause. I found *nothing* in the logs.
It can happen again.
And I'll be in the same situation.

I need to hurry up with the new infrastructure (a primary/standby cluster with pgpool-II for pooling and load balancing and Barman for backups), which is growing in the test lab.

Being on Google Cloud Platform makes me suspect some "strange" problem on their infrastructure (next Tuesday I'll ask them)... but on the other hand I think they would alert us about any issues they're experiencing.

We designed this application 6 years ago, in a hurry (long story short: we needed to build a product that *absolutely* had to be sold before fall 2012, and we started in July 2011), and went a bit superficial with some modules of the program. One of them was the database design.

I had been using PostgreSQL since 2005, but never in scenarios beyond 2 or 3 databases collecting logs and doing some reporting, so I lacked experience in *battle* scenarios like this. So I went with a design in which every customer has their own role, which they authenticate with, and with that role they can connect to their own database and to some of their colleagues' (imagine a group of 10... if everyone connects to everyone, that's 100 connections!).

Now we're changing this, so that only one role is used for everyone and authentication is done in other ways. I think that in the current situation pgbouncer will help a little, but with the new design a pooler will lend us a great hand.
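The group-of-10 arithmetic above, and what a pooler changes, can be sketched as follows (the pool size is a hypothetical choice, not from the thread):

```python
# Per-customer-role design: in a group of n colleagues, each client keeps a
# connection to every member's database, so connections grow as n * n.
def unpooled_connections(group_size: int) -> int:
    return group_size * group_size

# With a single shared role behind a pooler, server-side connections are
# capped by the pool size regardless of how many clients are in the group.
def pooled_connections(group_size: int, pool_size: int = 5) -> int:
    return min(unpooled_connections(group_size), pool_size)

print(unpooled_connections(10))   # 100, as in the example above
print(pooled_connections(10))     # 5 (hypothetical pool size)
```

The quadratic growth is why 350 client connections were reached so easily, and why the single-role redesign plus pooling should flatten it.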

Thanks for your patience
Happy Easter
Moreno.