Does the work made by vaccum in the current pass is lost when interrupted?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Does the work made by vaccum in the current pass is lost when interrupted?

Jean-Marc Lessard

I have a large table (billions of records) which has not been vacuum and bloated.

Vacuum scale factor was left at the default.

 

I ran a vacuum on a DEV system and it makes several passes (scanning heap/vacuuming indexes/vacumming heap) which take more than an hour each.

On a PROD system, I may have to kill the job midway.

Should I reduce the autovacuum_work_mem of my session? Currently 1GB

Does the work made in the current pass is lost when interrupted?

Correct me if I am not right, but vacuumed indexes and heap made by the current pass will go to disk, but the heap should be rescan on the next vacuum.

I guess that the visibility map is updated only at the end of each pass.

 

My best regards

 

Thank you

 


Jean-Marc Lessard
Administrateur de base de données / Database Administrator



Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com

 

Reply | Threaded
Open this post in threaded view
|

Re: Does the work made by vaccum in the current pass is lost when interrupted?

Allan Kamau-3


On Thu, Oct 8, 2020 at 10:16 PM Jean-Marc Lessard <[hidden email]> wrote:

I have a large table (billions of records) which has not been vacuum and bloated.

Vacuum scale factor was left at the default.

 

I ran a vacuum on a DEV system and it makes several passes (scanning heap/vacuuming indexes/vacumming heap) which take more than an hour each.

On a PROD system, I may have to kill the job midway.

Should I reduce the autovacuum_work_mem of my session? Currently 1GB

Does the work made in the current pass is lost when interrupted?

Correct me if I am not right, but vacuumed indexes and heap made by the current pass will go to disk, but the heap should be rescan on the next vacuum.

I guess that the visibility map is updated only at the end of each pass.

 

My best regards

 

Thank you

 


Jean-Marc Lessard
Administrateur de base de données / Database Administrator



Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com

 





A suggestion, considering that you have a table with billions of tuples, you could look into table partitioning, see "https://www.postgresql.org/docs/current/ddl-partitioning.html".

First you may need to identify a criteria by which to partition your data, maybe by record population timestamp binned into yyyymm resolution.


One a test environment, you may construct a table similar to the one currently having the many records but with a different name.

Then write code which will construct the child tables, run the sql to construct the child tables.
If your child tables are based on the record date, you may construct child tables for the timestamps of the record you have as well as several months or timepoints in the future. Maybe have a cron job to construct new tables of future timepoints.

Then write code to populate the new tables directly in piecemeal by the use of WHERE clause with data from the current production table.

You may choose to write the above code to use "COPY" or "INSERT" to populate the specific partition table.


Clone your current application and modify the code such that it inserts directly to the specific child table or leave the writing to the specific child table to be done by the on insert trigger. 

Test the data population via the application to see if the child tables are being populated accordingly.

If all is well. Schedule downtime where you can implement these changes to your production environment.



Allan.

 
Reply | Threaded
Open this post in threaded view
|

Re: Does the work made by vaccum in the current pass is lost when interrupted?

Michael Lewis
In reply to this post by Jean-Marc Lessard


On Thu, Oct 8, 2020, 1:16 PM Jean-Marc Lessard <[hidden email]> wrote:

I have a large table (billions of records) which has not been vacuum and bloated.

Vacuum scale factor was left at the default.

 

I ran a vacuum on a DEV system and it makes several passes (scanning heap/vacuuming indexes/vacumming heap) which take more than an hour each.

On a PROD system, I may have to kill the job midway.

Should I reduce the autovacuum_work_mem of my session? Currently 1GB


Increase if you can. You want to maximize the work being done before it needs to pause.

Have you tuned any settings related to vacuum? If your I/O system can handle it, turning cost delay very low, especially PG12+ where it can be less than 1ms, may be helpful. Otherwise you might reduce to 1ms and also increase cost limit so you do more work before stopping.

Reply | Threaded
Open this post in threaded view
|

RE: Does the work made by vaccum in the current pass is lost when interrupted?

Jean-Marc Lessard

Thank you Michael for your suggestion,

I will reduce vacuum_cost_delay and increasing vacuum_cost_limit to speedup vacuum.

But the I/O system is very solicitated on PROD system and I do not want to impact end user performance.

If aggressive cost_delay and vacuum_cost_limit slow down significatively end user performance, I may have to kill the vacuum, but would like to keep as much vacuum work as possible.

 

My concern is about the size of the max_dead_tuples buffer determined by the autovacuum_work_mem

  • autovacuum_work_mem is currently 1GB (default to maintenance_work_mem)

 

When I ran a vacuum on a DEV system and it made a single “scanning heap/vacuuming indexes/vacuuming heap” cycle of about 12 hours.

The vacuuming indexes phase is half the total time (6 hrs).

 

Q1: I guess that if I have to kill the vacuum midway, I will lose all the work it did during the scanning heap step? Right?

Q2: If I reduce autovacuum_work_mem to 32MB, is there any side effect performance on the vacuum?

 

thanks