The following bug has been logged on the website:
Bug reference:      16691
Logged by:          Max Vikharev
Email address:      [hidden email]
PostgreSQL version: 12.3
Operating system:   Ubuntu 16.04.5 LTS
Description:

Hi,

Previously I reported BUG #16620, but there has been no answer to my follow-up mails, so I have to open this new report.

Currently we have to restart PostgreSQL 12.3 every 5-7 days to make it work properly on a 5-terabyte cluster with ~50 databases. The problem is that autovacuum stops processing certain databases and cycles over only one of them until we restart the cluster. We have set up a monitoring trigger and now detect the situation whenever it happens, and it happens every 5-7 days.

As a result PostgreSQL loses up-to-date statistics, we see ineffective plans, we need to restart PostgreSQL, and we also have to run VACUUM FULL explicitly and frequently. All of this is a huge headache. Please help us investigate the root of the problem.
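A minimal sketch of how such a situation can be spotted from the catalogs (illustrative only, not taken from the report): the per-database transaction-ID age shows which database's wraparound horizon autovacuum is stuck on.

    -- Illustrative query; uses only standard pg_database columns.
    SELECT datname,
           age(datfrozenxid)    AS xid_age,
           mxid_age(datminmxid) AS mxid_age
    FROM pg_database
    ORDER BY age(datfrozenxid) DESC;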
On Thu, 29 Oct 2020 at 22:44, PG Bug reporting form <[hidden email]> wrote:
> The following bug has been logged on the website:

Please check for open transactions. A restart closes all sessions and all transactions.

Did you check pg_stat_activity for sessions in the state "idle in transaction" before the restart?

Regards

Pavel
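A minimal sketch of the check Pavel suggests (assuming the standard pg_stat_activity columns of PostgreSQL 9.6 and later):

    -- Sessions sitting "idle in transaction" can hold back vacuum's xmin horizon.
    SELECT pid, usename, datname, state,
           now() - xact_start AS xact_duration
    FROM pg_stat_activity
    WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
    ORDER BY xact_start;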
Hi Pavel,

> Did you check pg_stat_activity for sessions in the state "idle in transaction" before the restart?

Of course. We have already checked all the widely known possible causes:
1. We monitor long transactions and there are no active transactions or open sessions in the "idle in transaction" state. We also have statement_timeout set.
2. There are no active transactions on the standby and hot_standby_feedback = off.
3. We do not use prepared transactions.
4. There are no replication_slots.

On Fri, 30 Oct 2020 at 07:48, Pavel Stehule <[hidden email]> wrote:
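For reference, a minimal sketch of the checks listed above, using only standard catalog views (illustrative queries, not the exact monitoring used by the reporter):

    -- Long-running transactions:
    SELECT pid, state, now() - xact_start AS duration
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start;

    -- Prepared transactions:
    SELECT * FROM pg_prepared_xacts;

    -- Replication slots (their xmin can also hold back the horizon):
    SELECT slot_name, active, xmin, catalog_xmin
    FROM pg_replication_slots;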
On Fri, 30 Oct 2020 at 08:23, Max Vikharev <[hidden email]> wrote:
Maybe there is some lock leak - that would probably be a Postgres bug, but unfortunately it can be detected only in your environment, unless you are able to prepare a reproducer test. Can you attach gdb to the autovacuum process and see what it is doing?
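A minimal sketch for locating the worker to attach to (assumes the backend_type column available since PostgreSQL 10):

    -- Find the PID of the running autovacuum worker(s).
    SELECT pid, datname, xact_start, query
    FROM pg_stat_activity
    WHERE backend_type = 'autovacuum worker';

The resulting PID can then be handed to gdb -p (or strace -p) as suggested above.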
OK, next time it happens I'll do this debugging.

Is there anything more I can collect? Last time it happened I enabled DEBUG logging, and that is what the autovacuum process was doing (see the attachment or https://gist.github.com/mvikharev/62e2aa690cd3d256fa1880d44b169313).

On Fri, 30 Oct 2020 at 10:29, Pavel Stehule <[hidden email]> wrote:
On Fri, 30 Oct 2020 at 10:13, Max Vikharev <[hidden email]> wrote:
You can check pg_locks to see whether the tables are locked.
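A minimal sketch of that check (illustrative; the join to pg_stat_activity is only there for context):

    -- Show which sessions hold or are waiting for locks on relations.
    SELECT l.locktype, l.relation::regclass AS relation, l.mode, l.granted,
           a.pid, a.state, a.query
    FROM pg_locks l
    LEFT JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE l.relation IS NOT NULL
    ORDER BY l.relation;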
On Thu, Oct 29, 2020 at 5:44 PM PG Bug reporting form <[hidden email]> wrote:
> The following bug has been logged on the website:

There is a known problem (https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@...) that looks something like this, but restarting the database should not fix it (or at least, not for long), and it should stop on its own once the one database being vacuumed for wraparound has finished doing so.

Is it vacuuming the same table over and over again, or is it finishing that table and moving on to another in the same database, or what?

Cheers,

Jeff
Hi,

Thank you for the response.

> finishing that table and moving to another in the same database

Yes, that case. Vacuum processes the relations in one database in a cycle.

On Sat, 31 Oct 2020 at 21:55, Jeff Janes <[hidden email]> wrote:
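One hedged way to answer Jeff's question while the problem is happening (assuming PostgreSQL 9.6 or later, where the progress view exists) is to watch which relation each vacuum worker is on:

    -- Shows the relation and phase each running (auto)vacuum is currently processing.
    SELECT pid, datname, relid::regclass AS relation, phase,
           heap_blks_scanned, heap_blks_total
    FROM pg_stat_progress_vacuum;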
Max Vikharev <[hidden email]> writes:
>> finishing that table and moving to another in the same database
> Yes, that case. Vacuum processes the relations in one database in a cycle.

It sounds like it's failing to advance that database's datfrozenxid
and/or datminmxid. You might try to figure out which table is holding
those values back by comparing pg_class.relfrozenxid and relminmxid
to those pg_database columns. Then look to see if autovacuum is
failing on that table, and/or try a manual vacuum to see what happens.

regards, tom lane
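A minimal sketch of the comparison Tom describes, run inside the affected database (illustrative; uses only standard catalog columns):

    -- Tables whose relfrozenxid / relminmxid are pinning the database's horizon.
    SELECT c.oid::regclass        AS table_name,
           age(c.relfrozenxid)    AS xid_age,
           mxid_age(c.relminmxid) AS mxid_age
    FROM pg_class c
    CROSS JOIN pg_database d
    WHERE d.datname = current_database()
      AND c.relkind IN ('r', 't', 'm')
      AND (c.relfrozenxid = d.datfrozenxid OR c.relminmxid = d.datminmxid);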
We are facing the problem right now. In the logs we see that autovacuum is processing mostly one table in the database, in a one-second cycle:

automatic vacuum of table "DATABASE.pg_catalog.pg_shdepend"

strace output:

open("global/679366931", O_RDWR) = 6
lseek(6, 0, SEEK_END) = 16384
kill(11798, SIGUSR1) = 0
open("pg_stat_tmp/global.stat", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0600, st_size=11292, ...}) = 0
read(7, "\235\274\245\1\312\177n\322.V\2\0'\v\0\0\0\0\0\0\4U\0\0\0\0\0\0u\326\313\26"..., 4096) = 4096
read(7, "iZ\272\17\344\0\0\0w\337\334}\343\0\0\0000\362\255\0\0\0\0\0\265\241\332\0\0\0\0\0"..., 4096) = 4096
read(7, "\0\0\0\0\0\0\0\0\0\0\0\0U\2\0\0\0\0\0\0\0\0B\277\0\0\0\0\1\0\0\0"..., 4096) = 3100
close(7) = 0
sendto(11, "\1\0\0\0 \0\0\0\6\270v\322.V\2\0\346\26o\322.V\2\0\0\0\0\0\1\0\0\0", 32, 0, NULL, 0) = 32
select(0, NULL, NULL, NULL, {0, 10000}) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_USER, si_pid=12052, si_uid=114} ---
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
open("pg_stat_tmp/global.stat", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0600, st_size=11292, ...}) = 0
read(7, "\235\274\245\1z\270v\322.V\2\0'\v\0\0\0\0\0\0\4U\0\0\0\0\0\0u\326\313\26"..., 4096) = 4096
read(7, "iZ\272\17\344\0\0\0w\337\334}\343\0\0\0000\362\255\0\0\0\0\0\265\241\332\0\0\0\0\0"..., 4096) = 4096
read(7, "\0\0\0\0\0\0\0\0\0\0\0\0U\2\0\0\0\0\0\0\0\0B\277\0\0\0\0\1\0\0\0"..., 4096) = 3100
close(7) = 0
open("pg_stat_tmp/global.stat", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0600, st_size=11292, ...}) = 0
read(7, "\235\274\245\1z\270v\322.V\2\0'\v\0\0\0\0\0\0\4U\0\0\0\0\0\0u\326\313\26"..., 4096) = 4096
read(7, "iZ\272\17\344\0\0\0w\337\334}\343\0\0\0000\362\255\0\0\0\0\0\265\241\332\0\0\0\0\0"..., 4096) = 4096
read(7, "\0\0\0\0\0\0\0\0\0\0\0\0U\2\0\0\0\0\0\0\0\0B\277\0\0\0\0\1\0\0\0"..., 4096) = 3100
close(7) = 0
close(6) = 0
open("global/679366931", O_RDWR) = 6
lseek(6, 0, SEEK_END) = 16384
open("pg_stat_tmp/global.stat", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0600, st_size=11292, ...}) = 0
read(7, "\235\274\245\1z\270v\322.V\2\0'\v\0\0\0\0\0\0\4U\0\0\0\0\0\0u\326\313\26"..., 4096) = 4096
read(7, "iZ\272\17\344\0\0\0w\337\334}\343\0\0\0000\362\255\0\0\0\0\0\265\241\332\0\0\0\0\0"..., 4096) = 4096
read(7, "\0\0\0\0\0\0\0\0\0\0\0\0U\2\0\0\0\0\0\0\0\0B\277\0\0\0\0\1\0\0\0"..., 4096) = 3100
close(7) = 0
open("pg_stat_tmp/global.stat", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0600, st_size=11292, ...}) = 0
read(7, "\235\274\245\1z\270v\322.V\2\0'\v\0\0\0\0\0\0\4U\0\0\0\0\0\0u\326\313\26"..., 4096) = 4096
read(7, "iZ\272\17\344\0\0\0w\337\334}\343\0\0\0000\362\255\0\0\0\0\0\265\241\332\0\0\0\0\0"..., 4096) = 4096
read(7, "\0\0\0\0\0\0\0\0\0\0\0\0U\2\0\0\0\0\0\0\0\0B\277\0\0\0\0\1\0\0\0"..., 4096) = 3100
close(7) = 0
epoll_create1(EPOLL_CLOEXEC) = 7
epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=772071960, u64=94859419771416}}) = 0
epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=772071984, u64=94859419771440}}) = 0
close(7) = 0
epoll_create1(EPOLL_CLOEXEC) = 7
epoll_ctl(7, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=772071960, u64=94859419771416}}) = 0
epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=772071984, u64=94859419771440}}) = 0
epoll_wait(7, 0x56462e04e260, 1, 554) = -1 EINTR (Interrupted system call)

lsof:

postgres 12202 postgres   6u REG 8,3 16384 47329946 /var/lib/postgresql/12/alytics_main/global/679366931
postgres 30607 postgres 239u REG 8,3 16384 47329946 /var/lib/postgresql/12/alytics_main/global/679366931

DATABASE=# SELECT pg_relation_filepath('pg_database');
 pg_relation_filepath
----------------------
 global/679366931

> and/or datminmxid. You might try to figure out which table is holding
> those values back by comparing pg_class.relfrozenxid and relminmxid
> to those pg_database columns. Then look to see if autovacuum is
> failing on that table, and/or try a manual vacuum to see what happens.

We found the tables with pg_database.datfrozenxid = pg_class.relfrozenxid or pg_database.datminmxid = pg_class.relminmxid. We executed VACUUM VERBOSE on those tables and it did the work correctly: in the output we saw that there were 0 dead tuples in all the tables, and there were no ERROR/WARNING-level log entries. In the PostgreSQL logs we see that after the pg_toast tables were vacuumed, autovacuum started to process the next tables:

2020-11-03 12:54:31 MSK DATABASE [local] [38649]: [2] STATEMENT: vacuum verbose pg_toast.pg_toast_687578781;

But it still processes only one database in a cycle. I mean that there are tables in other databases that should be processed, and there are free autovacuum workers, but those queued tables are not processed.

On Sun, 1 Nov 2020 at 20:01, Tom Lane <[hidden email]> wrote:
> Max Vikharev <[hidden email]> writes:
Hi, dear PostgreSQL developers. We are still facing this issue. We restart the cluster every 5-7 days. Any help, please?

On Tue, 3 Nov 2020 at 14:14, Max Vikharev <[hidden email]> wrote:
Hi, everybody.

Finally we discovered that the problem is related to https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F8A4DC6%40G01JPEXMBYT05

Autovacuum hangs on a database that contains orphaned temp tables. When I dropped the schema with the orphaned temporary tables, autovacuum continued to work without a restart.

On Thu, 17 Dec 2020 at 16:26, Max Vikharev <[hidden email]> wrote:
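For readers hitting the same thing, a minimal sketch for spotting such orphaned temporary tables from inside the affected database (illustrative; an orphaned pg_temp_N schema is one whose owning backend slot has no live session):

    -- Temporary relations and their xid age; very old entries with no matching
    -- live backend are candidates for the orphaned tables described above.
    SELECT n.nspname, c.relname, age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relpersistence = 't'
      AND c.relkind = 'r'
    ORDER BY age(c.relfrozenxid) DESC;

Dropping the offending pg_temp_NN schema, as the reporter describes doing, lets autovacuum advance datfrozenxid again.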
On Tue, Dec 29, 2020 at 02:17:51PM +0300, Max Vikharev wrote:
> Finally we discovered that the problem is related to
> https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F8A4DC6%40G01JPEXMBYT05
>
> Autovacuum hangs on the database which contains orphaned temp tables. When
> I deleted the schema with orphaned temporary tables - autovacuum continued
> to work without reboot.

Just wondering. Are you sure that this instance is based on 12.3? The
removal of orphaned temp tables should be more aggressive there, as we also
track down in autovacuum whether a specific temp schema OID is in use by a
backend, and the initialization of a temp namespace for a backend cleans up
temp objects from past connections that left orphaned objects behind because
of a crash (see around RemoveTempRelations() in InitTempTableNamespace()).
--
Michael