BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16620
Logged by:          Max Vikharev
Email address:      [hidden email]
PostgreSQL version: 12.3
Operating system:   Linux
Description:        

Hi,

There is a cluster with 50 databases. Table sizes vary widely, from
megabytes to terabytes.

We have successfully used aggressive autovacuum settings for a long time.
After upgrading from 10 to 12.3 we've noticed that autovacuum constantly
processes relations from one database but never reaches the others.

I've used
https://www.depesz.com/2020/01/29/which-tables-should-be-auto-vacuumed-or-auto-analyzed/comment-page-1/
to detect the existence of relations that should be autovacuumed, and there
are a lot of them. At the same time I see that there are free autovacuum
workers that should do this work, but the tables are not processed for a
long time.

I have logs (log_autovacuum_min_duration = 0) confirming that only a single
database is constantly being autovacuumed. In pg_stat_user_tables I see that
the last autovacuum / autoanalyze for the other databases happened on the
same day that the "vacuum to prevent wraparound" ran.
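
For example, per database I checked the most recent activity with
something like:

    -- run in each database
    SELECT max(last_autovacuum)  AS last_autovacuum,
           max(last_autoanalyze) AS last_autoanalyze
      FROM pg_stat_user_tables;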

Please help me investigate the problem: why are my relations not processed,
and how can I fix it?

Example table (from pg_stat_user_tables)
-[ RECORD 1 ]-------+------------------------------
n_tup_ins           | 25599733
n_tup_upd           | 2206685374
n_tup_del           | 19938580
n_tup_hot_upd       | 1863307770
n_live_tup          | 186473676
n_dead_tup          | 18177988
n_mod_since_analyze | 3250337
last_autovacuum     | 2020-08-26 16:43:02.321971+03
last_autoanalyze    | 2020-08-26 14:04:19.132566+03
vacuum_count        | 1
autovacuum_count    | 4078
analyze_count       | 8
autoanalyze_count   | 357

Autovacuum settings
"autovacuum_max_workers" => 5,
"autovacuum_naptime" => "30s",
"autovacuum_vacuum_threshold" => 5000
"autovacuum_vacuum_scale_factor" => 0.00001
"autovacuum_vacuum_cost_delay" => "10ms"
"autovacuum_vacuum_cost_limit" => 6000,
"autovacuum_analyze_threshold" => 5000,
"autovacuum_analyze_scale_factor" => "0.01"

Re: BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

Max Vikharev
The problem went away after restarting the cluster.
Autovacuum started processing relations in the other databases.

I don't know how to reproduce the issue; we will monitor it.
If there is any way to debug it when it occurs again, let me know.

Re: BUG #16620: Autovacuum does not process certain databases after migration from postgresql 10

Tom Lane
Max Vikharev <[hidden email]> writes:
> The problem went away after restarting the cluster.
> Autovacuum started processing relations in the other databases.

Hmm, interesting.

> I don't know how to reproduce the issue; we will monitor it.
> If there is any way to debug it when it occurs again, let me know.

Did you by any chance capture the contents of pg_database.datfrozenxid
and datminmxid and compare them to the pg_class.relfrozenxid and
relminmxid fields in the problematic databases?
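
Something along these lines (an untested sketch) would show the two sets
of horizons side by side:

    -- per-database horizons
    SELECT datname, datfrozenxid, age(datfrozenxid) AS xid_age, datminmxid
      FROM pg_database
     ORDER BY age(datfrozenxid) DESC;

    -- oldest per-table horizons, run inside each problem database
    SELECT oid::regclass AS rel, relfrozenxid, age(relfrozenxid) AS xid_age,
           relminmxid
      FROM pg_class
     WHERE relkind IN ('r', 't', 'm')
     ORDER BY age(relfrozenxid) DESC
     LIMIT 20;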

It's not hard to imagine that if the pg_database fields somehow
didn't get updated correctly during pg_upgrade, that would prevent
autovacuum from processing some databases to prevent wraparound.
However, that doesn't explain failure to examine those databases
at all, so I'm a bit at a loss.

Another thing to check is whether the stats collector is working.
Specifically look at whether counts in pg_stat_all_tables are
incrementing in the problem databases.
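
For instance, sampling the totals twice, a few minutes apart while the
application is writing, should show them move:

    -- if the stats collector is healthy, these counters keep increasing
    -- under write load; run in a problem database
    SELECT sum(n_tup_ins) AS inserts,
           sum(n_tup_upd) AS updates,
           sum(n_tup_del) AS deletes
      FROM pg_stat_all_tables;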

My guess is that somehow pg_upgrade left something in a slightly
hosed state, and that restarting de-hosed it, so that you aren't
going to see this again ... at least not till your next upgrade.
But I don't know exactly what the something could be.

                        regards, tom lane