Can't we give better table bloat stats easily?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Can't we give better table bloat stats easily?

Greg Stark
Everywhere I've worked I've seen people struggle with table bloat. It's hard to even measure how much of it you have or where, let alone actually fix it.

If you search online you'll find dozens of different queries estimating how much empty space are in your tables and indexes based on pg_stats statistics and suppositions about header lengths and padding and plugging them into formulas of varying credibility. 

But isn't this all just silliiness these days? We could actually sum up the space recorded in the fsm and get a much more trustworthy number in milliseconds. 

I rigged up a quick proof of concept and the code seems super simple and quick. There's one or two tables where the number is a bit suspect and there's no fsm if vacuum hasn't run but that seems pretty small potatoes for such a huge help in reducing user pain.


Reply | Threaded
Open this post in threaded view
|

Re: Can't we give better table bloat stats easily?

Andres Freund
Hi,

On 2019-08-16 20:39:21 -0400, Greg Stark wrote:
> But isn't this all just silliiness these days? We could actually sum up the
> space recorded in the fsm and get a much more trustworthy number in
> milliseconds.

You mean like pgstattuple_approx()?

https://www.postgresql.org/docs/current/pgstattuple.html

Or something different?

> I rigged up a quick proof of concept and the code seems super simple and
> quick. There's one or two tables where the number is a bit suspect and
> there's no fsm if vacuum hasn't run but that seems pretty small potatoes
> for such a huge help in reducing user pain.

Hard to comment on what you propose, without more details. But note that
you can't just look at the FSM, because in a lot of workloads it is
often hugely out of date. And fairly obviously it doesn't provide you
with information about how much space is currently occupied by dead
tuples.  What pgstattuple_approx does is to use the FSM for blocks that
are all-visible, and look at the page otherwise.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Can't we give better table bloat stats easily?

Masahiko Sawada
On Sat, Aug 17, 2019 at 9:59 AM Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> On 2019-08-16 20:39:21 -0400, Greg Stark wrote:
> > But isn't this all just silliiness these days? We could actually sum up the
> > space recorded in the fsm and get a much more trustworthy number in
> > milliseconds.
>
> You mean like pgstattuple_approx()?
>
> https://www.postgresql.org/docs/current/pgstattuple.html
>
> Or something different?
>
> > I rigged up a quick proof of concept and the code seems super simple and
> > quick. There's one or two tables where the number is a bit suspect and
> > there's no fsm if vacuum hasn't run but that seems pretty small potatoes
> > for such a huge help in reducing user pain.
>
> Hard to comment on what you propose, without more details. But note that
> you can't just look at the FSM, because in a lot of workloads it is
> often hugely out of date. And fairly obviously it doesn't provide you
> with information about how much space is currently occupied by dead
> tuples.  What pgstattuple_approx does is to use the FSM for blocks that
> are all-visible, and look at the page otherwise.
>

It's just an idea but we could have pgstattuple_approx use sample scan
to estimate the table bloat more faster.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Can't we give better table bloat stats easily?

Jeff Janes
In reply to this post by Greg Stark
On Fri, Aug 16, 2019 at 8:39 PM Greg Stark <[hidden email]> wrote:
Everywhere I've worked I've seen people struggle with table bloat. It's hard to even measure how much of it you have or where, let alone actually fix it.

If you search online you'll find dozens of different queries estimating how much empty space are in your tables and indexes based on pg_stats statistics and suppositions about header lengths and padding and plugging them into formulas of varying credibility. 

There is not much we can do to suppress bad advice that people post on their own blogs.  If wiki.postgresql.org is hosting bad advice, by all means we should fix that.
 
But isn't this all just silliiness these days? We could actually sum up the space recorded in the fsm and get a much more trustworthy number in milliseconds. 

If you have bloat problems, then you probably have vacuuming problems.  If you have vacuuming problems, how much can you trust fsm anyway?

Cheers,

Jeff