Heavy LWLockTranche buffer_mapping in Postgres 9.6.10 server

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Heavy LWLockTranche buffer_mapping in Postgres 9.6.10 server

Justin Lu
Experts,

We are seeing very heavy LWLockTranche buffer_mapping in db recently. 

There server had 24 core, 128GB of RAM, SSD data file system, on Unbuntu 16.04.6.
The shared_buffers was at 32GB. 1/4 of over RAM size. No issue on checkpoints (avg time 29 min apart).

After seeing the heavy wait, we added 64GB more RAM and increased shared_buffers to 48GB, effective_cache_size to 90GB. But it seems there is no impact on the buffer mapping waits at all. 

There doesn't seem to be big changes in the db load pattern, but since the heavy buffer mapping appeared, the cpu load jumped 3 - 4 times.

file system I/O times/timing has been increasing as the load on the db increased over the last few months. But not by any means jumped.

temp files generated by user sorting activities also have been increasing, and increase of the work_mem after adding RAM doesn't seem to help much on this.

Any suggestions on what to look for?

thanks,

Justin
Reply | Threaded
Open this post in threaded view
|

Re: Heavy LWLockTranche buffer_mapping in Postgres 9.6.10 server

Purav Chovatia
believe, Transparent Huge Pages is disabled and Huge Pages are enabled to the tune of shared_buffers.

HTH

On Sun, 2 Feb 2020 at 04:46, Justin Lu <[hidden email]> wrote:
Experts,

We are seeing very heavy LWLockTranche buffer_mapping in db recently. 

There server had 24 core, 128GB of RAM, SSD data file system, on Unbuntu 16.04.6.
The shared_buffers was at 32GB. 1/4 of over RAM size. No issue on checkpoints (avg time 29 min apart).

After seeing the heavy wait, we added 64GB more RAM and increased shared_buffers to 48GB, effective_cache_size to 90GB. But it seems there is no impact on the buffer mapping waits at all. 

There doesn't seem to be big changes in the db load pattern, but since the heavy buffer mapping appeared, the cpu load jumped 3 - 4 times.

file system I/O times/timing has been increasing as the load on the db increased over the last few months. But not by any means jumped.

temp files generated by user sorting activities also have been increasing, and increase of the work_mem after adding RAM doesn't seem to help much on this.

Any suggestions on what to look for?

thanks,

Justin
Reply | Threaded
Open this post in threaded view
|

Re: Heavy LWLockTranche buffer_mapping in Postgres 9.6.10 server

Purav Chovatia
one more suggestion would be to try increasing shared_buffers and see if that helps. It need not be strictly restricted to 25% of the RAM on the server. In your case, since RAM is 192GB and shared_buffers is 48GB, try increasing shared_buffers to 64GB or more and see if "LWLockTranche buffer_mapping" and CPU usage decreases.

HTH

On Sun, 2 Feb 2020 at 14:49, Purav Chovatia <[hidden email]> wrote:
believe, Transparent Huge Pages is disabled and Huge Pages are enabled to the tune of shared_buffers.

HTH

On Sun, 2 Feb 2020 at 04:46, Justin Lu <[hidden email]> wrote:
Experts,

We are seeing very heavy LWLockTranche buffer_mapping in db recently. 

There server had 24 core, 128GB of RAM, SSD data file system, on Unbuntu 16.04.6.
The shared_buffers was at 32GB. 1/4 of over RAM size. No issue on checkpoints (avg time 29 min apart).

After seeing the heavy wait, we added 64GB more RAM and increased shared_buffers to 48GB, effective_cache_size to 90GB. But it seems there is no impact on the buffer mapping waits at all. 

There doesn't seem to be big changes in the db load pattern, but since the heavy buffer mapping appeared, the cpu load jumped 3 - 4 times.

file system I/O times/timing has been increasing as the load on the db increased over the last few months. But not by any means jumped.

temp files generated by user sorting activities also have been increasing, and increase of the work_mem after adding RAM doesn't seem to help much on this.

Any suggestions on what to look for?

thanks,

Justin
Reply | Threaded
Open this post in threaded view
|

Re: Heavy LWLockTranche buffer_mapping in Postgres 9.6.10 server

Andres Freund
In reply to this post by Justin Lu
Hi,

On 2020-02-01 16:17:13 -0700, Justin Lu wrote:

> We are seeing very heavy LWLockTranche buffer_mapping in db recently.
>
> There server had 24 core, 128GB of RAM, SSD data file system, on Unbuntu
> 16.04.6.
> The shared_buffers was at 32GB. 1/4 of over RAM size. No issue on
> checkpoints (avg time 29 min apart).
>
> After seeing the heavy wait, we added 64GB more RAM and increased
> shared_buffers to 48GB, effective_cache_size to 90GB. But it seems there is
> no impact on the buffer mapping waits at all.

I suggest doing a perf profile with --call-graph dwarf, to see where
this is mostly coming from.

One thing I've seen causing symptoms like this before, is if there's
suddenly a larger amount of table truncations, dropping, etc - dropping
/ truncating a table / index needs to scan all of shared buffers...

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Heavy LWLockTranche buffer_mapping in Postgres 9.6.10 server

Justin Lu
Thanks for those pitched in. I finally resolved the issue. It turns out that the table bloat was the culprit. Those bloat couldn't be dealt with by manual or auto vacuum. It somehow reached a point that many queries had trouble finish on time even though they used good query plans. The final solution was just do a full vacuum on some of the key tables, and that immediately quiet down the database.

thanks,

Justin

On Sun, Feb 2, 2020, 8:15 AM Andres Freund <[hidden email]> wrote:
Hi,

On 2020-02-01 16:17:13 -0700, Justin Lu wrote:
> We are seeing very heavy LWLockTranche buffer_mapping in db recently.
>
> There server had 24 core, 128GB of RAM, SSD data file system, on Unbuntu
> 16.04.6.
> The shared_buffers was at 32GB. 1/4 of over RAM size. No issue on
> checkpoints (avg time 29 min apart).
>
> After seeing the heavy wait, we added 64GB more RAM and increased
> shared_buffers to 48GB, effective_cache_size to 90GB. But it seems there is
> no impact on the buffer mapping waits at all.

I suggest doing a perf profile with --call-graph dwarf, to see where
this is mostly coming from.

One thing I've seen causing symptoms like this before, is if there's
suddenly a larger amount of table truncations, dropping, etc - dropping
/ truncating a table / index needs to scan all of shared buffers...

Greetings,

Andres Freund