Does pg_stat_get_live_tuples() matter?

Does pg_stat_get_live_tuples() matter?

Sherrylyn Branchaw
If a table has 1.8M rows in reality, yet pg_stat_get_live_tuples() returns 1.8K, does it matter to the performance of the database, as long as pg_class.reltuples is the right order of magnitude?

The query planner seems to use the accurate estimate, but what about the autovacuum daemon? Or anything else?

In short, is pg_stat_get_live_tuples() (and the views that invoke it) used by any built-in tooling, or is it just there for the user to do monitoring?
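(For concreteness, this is the kind of comparison I mean; the table name is just an example:)

```sql
-- Compare the planner's estimate with the stats collector's live-tuple count
-- ("my_table" is a placeholder name).
SELECT c.reltuples::bigint             AS reltuples,
       pg_stat_get_live_tuples(c.oid)  AS n_live_tup
FROM pg_class c
WHERE c.relname = 'my_table';
```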

Thanks,
Sherrylyn

Re: Does pg_stat_get_live_tuples() matter?

Adrian Klaver-4
On 4/9/19 9:45 AM, Sherrylyn Branchaw wrote:
> If a table has 1.8M rows in reality, yet pg_stat_get_live_tuples()
> returns 1.8K, does it matter to the performance of the database, as long
> as pg_class.reltuples is the right order of magnitude?

What version of Postgres?

How are you getting at the 1.8M number?

>
> The query planner seems to use the accurate estimate, but what about the
> autovacuum daemon? Or anything else?
>
> In short, is pg_stat_get_live_tuples() (and the views that invoke it)
> used by any built-in tooling, or is it just there for the user to do
> monitoring?
>
> Thanks,
> Sherrylyn


--
Adrian Klaver
[hidden email]



Re: Does pg_stat_get_live_tuples() matter?

Sherrylyn Branchaw
What version of Postgres?

Sorry, can't believe I forgot to include that! Postgres 9.6.9.

How are you getting at the 1.8M number?

SELECT COUNT(*)

Re: Does pg_stat_get_live_tuples() matter?

Tom Lane-2
In reply to this post by Sherrylyn Branchaw
Sherrylyn Branchaw <[hidden email]> writes:
> If a table has 1.8M rows in reality, yet pg_stat_get_live_tuples() returns
> 1.8K, does it matter to the performance of the database, as long as
> pg_class.reltuples is the right order of magnitude?

Hmmm ... what was in reltuples again?  Looking at the current code,
it looks like vacuum or autovacuum should set reltuples and n_live_tuples
to the same thing.  n_live_tuples gets adjusted incrementally by
subsequent transactions, and it wouldn't be too surprising if it drifted
away from reality, but it's a bit hard to believe that it could get to be
off by 1000X.  Have you suppressed autovacuum on this table?

We have fooled around with the logic that maintains these numbers, so
maybe it was different in 9.6.9.

Anyway, to answer your question, I don't see anything in the current core
code that pays attention to n_live_tuples.  reltuples definitely does
matter to the planner, and some of the sibling counters like n_dead_tuples
drive autovacuum, but nothing is examining n_live_tuples AFAICS.

                        regards, tom lane



Re: Does pg_stat_get_live_tuples() matter?

Sherrylyn Branchaw
> Hmmm ... what was in reltuples again?

Reltuples had the correct number, and the query plans were showing the correct estimates.

> it's a bit hard to believe that it could get to be off by 1000X.  Have you suppressed autovacuum on this table?

No, but here are some things I've observed: 

1) Any time pg_stat_reset() gets run, pg_stat_user_tables.n_live_tup gets reset to 0. pg_class.reltuples is untouched.

2) If new tuples get inserted or deleted after pg_stat_reset(), pg_stat_user_tables.n_live_tup will match pg_stat_user_tables.n_tup_ins - pg_stat_user_tables.n_tup_del up until the next analyze or autoanalyze.

3) Once the next (auto)analyze happens, pg_stat_user_tables.n_live_tup will be updated to match pg_class.reltuples.

4) If a table is very large, it may be a while before autovacuum_analyze_scale_factor is reached. We have ours set to 0.01, so on a 1.8M-row table roughly 18K rows must change before autoanalyze fires, an order of magnitude more than 1.8K. (I would like to tune our settings more, but I have a list of priorities from management on which this isn't high.)

5) Given 1-4, pg_stat_user_tables.n_live_tup may spend quite a long time matching pg_stat_user_tables.n_tup_ins - pg_stat_user_tables.n_tup_del instead of pg_class.reltuples. For example, if a table has 1.8 million rows, and you insert 5 and delete 4 after a stats reset, n_live_tup will report that the table has 1 tuple.

6) Failovers, at least in Aurora, apparently cause pg_stat_reset() to be run, at least judging by the timestamp I'm seeing in pg_stat_bgwriter.stats_reset. We haven't done a failover in the data center in a while, and it's less trivial for me to test there atm, so I'm not certain whether open-source Postgres failovers also reset statistics.
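The first three observations are easy to reproduce on a scratch table (all names here are illustrative; pg_stat_reset() needs superuser, and stats collector updates are asynchronous, so allow a moment between steps):

```sql
-- Observation 1: resetting stats zeroes n_live_tup but leaves reltuples alone.
CREATE TABLE stats_demo AS SELECT g AS id FROM generate_series(1, 100000) g;
ANALYZE stats_demo;        -- n_live_tup now matches reltuples (~100K)
SELECT pg_stat_reset();    -- n_live_tup drops to 0; reltuples untouched

-- Observation 2: n_live_tup now tracks n_tup_ins - n_tup_del, not reality.
INSERT INTO stats_demo SELECT g FROM generate_series(100001, 100005) g;
DELETE FROM stats_demo WHERE id <= 4;
SELECT n_live_tup, n_tup_ins, n_tup_del
FROM pg_stat_user_tables WHERE relname = 'stats_demo';  -- n_live_tup = 1

-- Observation 3: the next analyze snaps n_live_tup back to reltuples.
ANALYZE stats_demo;
SELECT n_live_tup FROM pg_stat_user_tables WHERE relname = 'stats_demo';
```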

> I don't see anything in the current core code that pays attention to n_live_tuples.  reltuples definitely does matter to the planner, and some of the sibling counters like n_dead_tuples drive autovacuum, but nothing is examining n_live_tuples AFAICS.

That's what I thought, but I wanted to make sure I wasn't missing anything obvious. Thanks!

> some of the sibling counters like n_dead_tuples drive autovacuum

So that's interesting. I knew that, but hadn't thought about the implications. If pg_stat_reset() is executed by failovers, and a failover happens just before a table is ready to be vacuumed--say it has 0.009 * reltuples dead tuples (I'm simplifying the formula here)--then n_dead_tup gets reset to 0 and the counting starts all over again. Regular failovers could thus increase bloat by delaying the autovacuum daemon from recognizing that a table needs to be vacuumed, am I right?
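For reference, a simplified version of the trigger condition (ignoring per-table reloptions overrides) can be checked directly, which shows how much accumulated progress toward a vacuum a stats reset would throw away:

```sql
-- How close each table is to its autovacuum trigger point, using the
-- global settings only (per-table storage parameters are ignored here).
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples
         AS vacuum_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;
```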

Is it recommended practice to manually VACUUM ANALYZE the whole database after a failover? Or is resetting stats after a failover just an Aurora thing? I'm sorry I'm asking the latter question instead of testing, but I've been ordered not to spend time on improving our vacuuming and statistics until 5 other large projects are done, and I'm spending a minimal amount of time anyway just to see how badly frequent failovers might be affecting us and if there's any action we need to take.

Thanks,
Sherrylyn

Re: Does pg_stat_get_live_tuples() matter?

Jeremy Schneider-2

> 6) Failovers, at least in Aurora, apparently cause pg_stat_reset() to be run, at least judging by the timestamp I'm seeing in pg_stat_bgwriter.stats_reset. We haven't done a failover in the data center in a while, and it's less trivial for me to test there atm, so I'm not certain whether open-source Postgres failovers also reset statistics.

Just wanted to chime in here - on this version of Aurora, there’s nothing special or different from open source community PostgreSQL that I know of. I believe that a failover in RDS MAZ PostgreSQL or in your own data center with repmgr would have the same characteristics. Actually nothing is calling pg_stat_reset() in any of these cases - it’s just that these statistics are maintained only in memory and not replicated through the WAL stream. (Aurora is slightly different with shared storage between the nodes, but the same principle applies.) So after failover, you’re on a new physical machine which needs to start over fresh on populating the memory structures.
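A quick way to see this on any node, community PostgreSQL or Aurora, is to check the bgwriter reset timestamp, which marks when the cumulative statistics on that node last started fresh:

```sql
-- Each node keeps its own in-memory statistics; after a failover the new
-- primary starts from scratch, which shows up in this timestamp.
SELECT stats_reset FROM pg_stat_bgwriter;
```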

> Is it recommended practice to manually VACUUM ANALYZE the whole database after a failover? Or is resetting stats after a failover just an Aurora thing? I'm sorry I'm asking the latter question instead of testing, but I've been ordered not to spend time on improving our vacuuming and statistics until 5 other large projects are done, and I'm spending a minimal amount of time anyway just to see how badly frequent failovers might be affecting us and if there's any action we need to take.

If you’re seeing frequent failovers - whether in your own data center or somebody else’s - I think you should investigate it. I don’t think you should accept that from your own staff or from any vendor.

That said, I don’t think the impact here is really that severe on bloat or statistics; any frequently modified table will still get vacuumed - albeit with one slightly longer interval due to a failover. It won’t exceed double the normal interval unless there are multiple failovers in quick succession. And a table that isn’t getting changed wasn’t going to get vacuumed anyway.

-Jeremy