admin control over cancelling autovacuum when blocked by a lock


admin control over cancelling autovacuum when blocked by a lock

johnlumby
If a regular backend worker becomes blocked waiting for a lock held by an autovacuum worker,
then, depending on a decision made by the deadlock detector,
the autovacuum worker may be signalled to terminate.

This results in messages such as these in the statement log


ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "somedb.aschema.thistable"

Is there any configuration parameter which controls this behaviour,
including deferring or even completely eliminating the cancellation,
so that, in the eliminated case, the waiter would simply go on waiting
until the autovacuum eventually finishes "thistable"?

I have not seen any parameter to do this.
Note that I don't mean some parameter which affects all locking behaviour,
only this autovacuum case.
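
For illustration, a query along these lines shows the obviously-related
settings (just a sketch; as far as I can tell none of these eliminates the
cancellation - deadlock_timeout only affects when the check that triggers it
runs):

SELECT name, setting, unit
  FROM pg_settings
 WHERE name IN ('deadlock_timeout',
                'autovacuum_naptime',
                'autovacuum_vacuum_cost_delay',
                'autovacuum_freeze_max_age');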

Cheers,   John Lumby


Re: admin control over cancelling autovacuum when blocked by a lock

Simon Riggs
On Mon, 29 Apr 2019 at 22:02, John Lumby <[hidden email]> wrote:
If a regular backend worker becomes blocked waiting for a lock held by an autovacuum worker,
then, depending on a decision made by the deadlock detector,
the autovacuum worker may be signalled to terminate.

This results in messages such as these in the statement log


ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "somedb.aschema.thistable"

Is there any configuration parameter which controls this behaviour,
including deferring or even completely eliminating the cancellation,
so that, in the eliminated case, the waiter would simply go on waiting
until the autovacuum eventually finishes "thistable"?

I have not seen any parameter to do this.
Note that I don't mean some parameter which affects all locking behaviour,
only this autovacuum case.

Why would you want this?

If the autovacuum is cancelled, it will simply run again in the near future, once your interrupting SQL has released its locks.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: admin control over cancelling autovacuum when blocked by a lock

johnlumby

On 04/29/2019 07:15 PM, Simon Riggs wrote:

> On Mon, 29 Apr 2019 at 22:02, John Lumby <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     If a regular backend worker becomes blocked waiting for a lock
>     held by an autovacuum worker,
>     then ,  depending on some decision made by deadlock detector,
>     the autovacuum worker may be signalled to terminate
>
>     [ ... ]
>
>
>
>     Is there any configuration parameter which controls this?
>
>      [ ... ]
>
>
> Why would you want this?
Because it greatly reduces the rate of growth of certain indexes in some
workloads
(based on a test in which I modified the code to change the cancel to a
plain WARNING).
>
> If the autovacuum is cancelled, it will re-execute again in the near
> future, once your interrupting SQL has released locks.
In the particular workload I'm investigating, which involves a great
deal of repetitive
insert and delete activity on one table with several indexes,
even with the most aggressive autovacuum settings I can find,
the autovacuum ends up being cancelled nearly 99% of the time, and (question)
I *think* that when an autovacuum worker is cancelled, all or most of the work
it did on its last table is undone  --  true?    I am basing that on observing
growth of dead rows, not on the code, so I'm not sure.
But changing the cancel to a WARNING certainly prevents
growth of dead rows without slowing the workload down unacceptably.
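
(A hedged aside: one way to see what the autovacuum runs that do complete
actually accomplish, without touching the code, would be to log every
autovacuum action and read the per-run statistics it reports:

ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();

The resulting log lines report pages and tuples removed per run.)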

John Lumby
>
> --
> Simon Riggs http://www.2ndQuadrant.com/ <http://www.2ndquadrant.com/>
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: admin control over cancelling autovacuum when blocked by a lock

Tom Lane-2
In reply to this post by Simon Riggs
Simon Riggs <[hidden email]> writes:
> On Mon, 29 Apr 2019 at 22:02, John Lumby <[hidden email]> wrote:
>> If a regular backend worker becomes blocked waiting for a lock held by an
>> autovacuum worker,
>> then ,  depending on some decision made by deadlock detector,
>> the autovacuum worker may be signalled to terminate
>> Is there any configuration parameter which controls this?
>> including deferring or even completely eliminating this cancelling?

> Why would you want this?
> If the autovacuum is cancelled, it will re-execute again in the near
> future, once your interrupting SQL has released locks.

I think the OP is looking for the opposite, ie prevent autovacuum from
being kicked off the lock.  It already is resistant to that if we're
doing an anti-wraparound vacuum, so maybe the simplest answer is to
tighten up autovacuum_freeze_max_age.  (Whether that's a good idea
is a different question.)  Or you could just launch a manual vacuum.
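
A hedged sketch of either option, with illustrative names and values only:

ALTER TABLE aschema.thistable SET (autovacuum_freeze_max_age = 10000000);

-- or run the vacuum yourself; a manual VACUUM is not cancelled by the
-- blocked backend's deadlock check:
VACUUM (VERBOSE, ANALYZE) aschema.thistable;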

                        regards, tom lane



Re: admin control over cancelling autovacuum when blocked by a lock

Tom Lane-2
In reply to this post by johnlumby
John Lumby <[hidden email]> writes:
> On 04/29/2019 07:15 PM, Simon Riggs wrote:
>> Why would you want this?

> Because it greatly reduces rate of growth of certain indexes in some
> workloads
> (based on test in which I modified the code to change the cancel to a
> plain WARNING)

[ raised eyebrow... ]  Shouldn't do so, I think.  What exactly are you
doing that's causing the cancel?

> I *think* when an autovacuum worker is cancelled,  all or most of the work
> it did on its last table is undone  --  true?

If what's canceled is the final step of truncating off empty end pages
in the table, you shouldn't lose anything except the actual truncation.
And that's normally the only part that gets canceled, because it's the
only part that takes a strong lock.  If you're losing index cleanup
work, it suggests that you're repeatedly executing DDL that requires
exclusive lock on the table.  Maybe you could avoid doing that?

                        regards, tom lane



Re: admin control over cancelling autovacuum when blocked by a lock

Simon Riggs
On Tue, 30 Apr 2019 at 02:44, Tom Lane <[hidden email]> wrote:
John Lumby <[hidden email]> writes:
> On 04/29/2019 07:15 PM, Simon Riggs wrote:
>> Why would you want this?

> Because it greatly reduces rate of growth of certain indexes in some
> workloads
> (based on test in which I modified the code to change the cancel to a
> plain WARNING)

[ raised eyebrow... ]  Shouldn't do so, I think.  What exactly are you
doing that's causing the cancel?

> I *think* when an autovacuum worker is cancelled,  all or most of the work
> it did on its last table is undone  --  true?

...
 
If you're losing index cleanup
work, it suggests that you're repeatedly executing DDL that requires
exclusive lock on the table.

Exactly the point of my question.

But further than this, if we changed things as requested, the DDL being executed would be forced to wait behind the autovacuum, frustrating the timely execution of that DDL.

Hence: Why would you want this?
 
Maybe you could avoid doing that?

Seems like the only way out of this dilemma. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: admin control over cancelling autovacuum when blocked by a lock

johnlumby
Thanks both for all the thoughts.  Some replies interspersed.

On 04/30/2019 02:27 AM, Simon Riggs wrote:

> On Tue, 30 Apr 2019 at 02:44, Tom Lane <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     John Lumby <[hidden email] <mailto:[hidden email]>>
>     writes:
>     > On 04/29/2019 07:15 PM, Simon Riggs wrote:
>     >> Why would you want this?
>
>     > Because it greatly reduces rate of growth of certain indexes in
>     some
>     > workloads
>     > (based on test in which I modified the code to change the cancel
>     to a
>     > plain WARNING)
>
>     [ raised eyebrow... ]  Shouldn't do so, I think.  What exactly are you
>     doing that's causing the cancel?
>

Only SELECTs, DELETEs, INSERTs and ANALYZE.   I have pasted the SQL
commands I'm using below.
This is intended to simulate a customer's real workload.    Note that there are
intentionally no pauses between commands, and the thing runs
continuously.
Maybe adding pauses would "help", but I don't want to help it in that way.

>
>     > I *think* when an autovacuum worker is cancelled, all or most of
>     the work
>     > it did on its last table is undone  --  true?
>
>
> ...
>
>     If you're losing index cleanup
>     work, it suggests that you're repeatedly executing DDL that requires
>     exclusive lock on the table.
>

No CREATE/DROP DDL at all  -
but it does do ANALYZE  -  in this context, is that "DDL"?  (I hope not ...)
It also calls the pg_catalog.setval() function  -  ditto.

>
> Exactly the point of my question.
>
> But further than this, if we changed things as requested the DDL being
> executed would be forced to wait behind the autovacuum, frustrating
> the timely execution of the DDL.
>
> Hence: Why would you want this?

What I am looking for is an *automatic* way of limiting the growth of dead
rows (and of index pages)
to something controllable and much less than what happens today, when the
autovacuum is almost
always cancelled.      I am willing to accept some degree of slowdown of
the workload.

So regarding Tom's suggestion in his first reply,
                      Or you could just launch a manual vacuum.
yes, that certainly achieves the limiting-growth objective if the
workload is occasionally halted,
but that is hard to automate in an effective way.

>     Maybe you could avoid doing that?
>
>
> Seems like the only way out of this dilemma.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/ <http://www.2ndquadrant.com/>
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

I would add that, to date, all the tests I have run have been on
version 9.4, for customer-related
reasons.   My original post was merely a question as to whether a
suitable config parameter exists -
the answer is clearly no.     Since the discussion has expanded into exactly
why autovacuum is being
cancelled in my workload, I'll re-run on vanilla 11.2 just to see if
maybe things work differently.
I will also collect pg_locks information and try to get more information
on exactly what is happening.
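
For the pg_locks collection, a sketch of the kind of query I have in mind
(assuming 9.6 or later for pg_blocking_pids(); on 9.4 the same information
would have to be pieced together by joining pg_locks to pg_stat_activity):

SELECT a.pid, a.wait_event_type, a.wait_event,
       pg_blocking_pids(a.pid) AS blocked_by,
       a.query
  FROM pg_stat_activity a
 WHERE cardinality(pg_blocking_pids(a.pid)) > 0;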


The workload   (a reconstruction of part of what one real customer does)
consists of an endless loop of the statements below.
     The three interpolation variables are typically set to values such as
num_ins_del = 512    num_text = 1232   num_ttl = 2579

Note that the workload updates only one table,  "thistable",
and is chosen so that the total number of rows in thistable remains
constant,
and the counts are there to verify that.   So all growth is solely from
page splits,  dead rows,  etc.

#  preliminary counts
  "select max(id) as max_mv_id from thistable;"

  "select count(*) as num_dynamic_addresses from thistable MV inner join
entity_ip4a IPAD on IPAD.id = MV.owner_id and MV.field_id = 1966090;"

  "select count(*) as num_text from thistable MV inner join entity_rrs
RRREC on RRREC.id = MV.owner_id and MV.field_id = 65548;"

  "select count(*) as num_ttl from thistable MV inner join entity_rrs
RRREC on RRREC.id = MV.owner_id and MV.field_id = 65543;"

#  analyze and then preliminary stats
"ANALYZE VERBOSE thistable;"

  "select S.nspname as tbschema, T.relname as tbname, T.relpages as
tbpages, T.reltuples as tbtuples, X.nspname as ixschema, N.relname as
ixname, D.indisunique as isuniq, D.indisclustered as isclustered ,
D.indisprimary as ispkey , UTB.n_live_tup as livetuples , UTB.n_dead_tup
as deadtuples , STIX.avg_leaf_density , N.relpages as ixpages from
pg_class T inner join pg_class N on T.relkind = 'r' and N.relkind = 'i'
and T.relname = 'thistable' inner join pg_index D on T.oid = D.indrelid
and N.oid = D.indexrelid inner join pg_namespace S on S.oid =
T.relnamespace and S.nspname = 'public' inner join pg_namespace X on
X.oid = N.relnamespace inner join pg_stat_user_tables UTB on UTB.relid =
T.oid cross join LATERAL ( SELECT avg_leaf_density FROM
pgstatindex(N.relname) ) as STIX(avg_leaf_density) order by T.relname,
N.relname;";

#  now do the work
"WITH max_sess_id(id) as ( select max(id) from session_info )
          insert into history select nextval('history_id_seq') , 0 ,
'delete ${num_ins_del} kpIPAddressComment metadata values from current
owners and insert under new MV-less owners' , 'system' , 2 ,
LOCALTIMESTAMP , max_sess_id.id , null FROM max_sess_id;
     WITH INDEXED_NO_MV_IP(ip_index , ip_id) as (
          select row_number() over ( order by IPAD.id ) as ip_index ,
IPAD.id from entity_ip4a IPAD left join thistable MV on IPAD.id =
MV.owner_id where MV.id IS NULL order by 1 limit ${num_ins_del}
      ) , INDEXED_HAS_MV_IP(mv_index , mv_id , version , owner_id ,
floatnumber , text , longnumber , \"boolean\" , \"timestamp\" , field_id
) as (
      select row_number() over ( ORDER BY HMV.id ) as mv_index , HMV.id
, HMV.version , HMV.owner_id , HMV.floatnumber , HMV.text ,
HMV.longnumber , HMV.\"boolean\" , HMV.\"timestamp\" , HMV.field_id from
thistable HMV inner join entity_ip4a HIP on HIP.id = HMV.owner_id and
HMV.field_id = 1966090 order by 1 limit ${num_ins_del}
  ) , DEL_MV_IP(mv_index , mv_id , version , owner_id , floatnumber ,
text , longnumber , \"boolean\" , \"timestamp\" , field_id) as (
      delete from thistable DLMV USING INDEXED_HAS_MV_IP where DLMV.id =
INDEXED_HAS_MV_IP.mv_id returning INDEXED_HAS_MV_IP.mv_index , DLMV.*
  ) , HIBSEQ(nextval) as ( select ( (
(nextval('public.hibernate_unique_sequence'::regclass) - 1) * 32768 ) + 1 )
  ) INSERT INTO thistable select HIBSEQ.nextval + INMI.ip_index ,
DMI.version , INMI.ip_id , DMI.floatnumber , DMI.text , DMI.longnumber ,
DMI.\"boolean\" , DMI.\"timestamp\" , DMI.field_id from INDEXED_NO_MV_IP
INMI inner join DEL_MV_IP DMI on DMI.mv_index = INMI.ip_index CROSS JOIN
HIBSEQ;
    WITH MAXES (mxi) as ( VALUES ( (select max(id) from access_rights) )
, ( (select max(id) from metadata_field ) ) , ( (select max(id) from
thistable ) ) , ( (select max(id) from entity_file ) ) , ( (select
max(id) from filter ) ) , ( (select max(id) from locale ) ) ) SELECT
pg_catalog.setval('hibernate_unique_sequence', ((max(mxi)/32768)+1),
true) from MAXES;
          COMMIT WORK;";

  "WITH max_sess_id(id) as ( select max(id) from session_info )
          insert into history select nextval('history_id_seq') , 0 ,
'delete ${num_text} text metadata values from current owners and insert
under new text-less owners' , 'system' , 2 , LOCALTIMESTAMP ,
max_sess_id.id , null FROM max_sess_id;
     WITH INDEXED_NO_MV_RR(rr_index , rr_id) as (
          select row_number() over ( order by RRREC.id ) as rr_index ,
RRREC.id from entity_rrs RRREC left join thistable MV on RRREC.id =
MV.owner_id and MV.field_id = 65548 where MV.id IS NULL order by 1 limit
${num_text}
      ) , INDEXED_HAS_MV_RR(mv_index , mv_id , version , owner_id ,
floatnumber , text , longnumber , \"boolean\" , \"timestamp\" , field_id
) as (
      select row_number() over ( ORDER BY HMV.id ) as mv_index , HMV.id
, HMV.version , HMV.owner_id , HMV.floatnumber , HMV.text ,
HMV.longnumber , HMV.\"boolean\" , HMV.\"timestamp\" , HMV.field_id from
thistable HMV inner join entity_rrs HIP on HIP.id = HMV.owner_id and
HMV.field_id = 65548 order by 1 limit ${num_text}
  ) , DEL_MV_RR(mv_index , mv_id , version , owner_id , floatnumber ,
text , longnumber , \"boolean\" , \"timestamp\" , field_id) as (
      delete from thistable DLMV USING INDEXED_HAS_MV_RR where DLMV.id =
INDEXED_HAS_MV_RR.mv_id returning INDEXED_HAS_MV_RR.mv_index , DLMV.*
  ) , HIBSEQ(nextval) as ( select ( (
(nextval('public.hibernate_unique_sequence'::regclass) - 1) * 32768 ) + 1 )
  ) INSERT INTO thistable select HIBSEQ.nextval + INMI.rr_index ,
DMI.version , INMI.rr_id , DMI.floatnumber , DMI.text , DMI.longnumber ,
DMI.\"boolean\" , DMI.\"timestamp\" , DMI.field_id from INDEXED_NO_MV_RR
INMI inner join DEL_MV_RR DMI on DMI.mv_index = INMI.rr_index CROSS JOIN
HIBSEQ;
    WITH MAXES (mxi) as ( VALUES ( (select max(id) from access_rights) )
, ( (select max(id) from metadata_field ) ) , ( (select max(id) from
thistable ) ) , ( (select max(id) from entity_file ) ) , ( (select
max(id) from filter ) ) , ( (select max(id) from locale ) ) ) SELECT
pg_catalog.setval('hibernate_unique_sequence', ((max(mxi)/32768)+1),
true) from MAXES;
          COMMIT WORK;";

  "WITH max_sess_id(id) as ( select max(id) from session_info )
          insert into history select nextval('history_id_seq') , 0 ,
'delete ${num_ttl} ttl metadata values from current owners and insert
under new ttl-less owners' , 'system' , 2 , LOCALTIMESTAMP ,
max_sess_id.id , null FROM max_sess_id;
     WITH INDEXED_NO_MV_RR(rr_index , rr_id) as (
          select row_number() over ( order by RRREC.id ) as rr_index ,
RRREC.id from entity_rrs RRREC left join thistable MV on RRREC.id =
MV.owner_id where MV.id IS NULL order by 1 limit ${num_ttl}
      ) , INDEXED_HAS_MV_RR(mv_index , mv_id , version , owner_id ,
floatnumber , text , longnumber , \"boolean\" , \"timestamp\" , field_id
) as (
      select row_number() over ( ORDER BY HMV.id ) as mv_index , HMV.id
, HMV.version , HMV.owner_id , HMV.floatnumber , HMV.text ,
HMV.longnumber , HMV.\"boolean\" , HMV.\"timestamp\" , HMV.field_id from
thistable HMV inner join entity_rrs HIP on HIP.id = HMV.owner_id and
HMV.field_id = 65543 order by 1 limit ${num_ttl}
  ) , DEL_MV_RR(mv_index , mv_id , version , owner_id , floatnumber ,
text , longnumber , \"boolean\" , \"timestamp\" , field_id) as (
      delete from thistable DLMV USING INDEXED_HAS_MV_RR where DLMV.id =
INDEXED_HAS_MV_RR.mv_id returning INDEXED_HAS_MV_RR.mv_index , DLMV.*
  ) , HIBSEQ(nextval) as ( select ( (
(nextval('public.hibernate_unique_sequence'::regclass) - 1) * 32768 ) + 1 )
  ) INSERT INTO thistable select HIBSEQ.nextval + INMI.rr_index ,
DMI.version , INMI.rr_id , DMI.floatnumber , DMI.text , DMI.longnumber ,
DMI.\"boolean\" , DMI.\"timestamp\" , DMI.field_id from INDEXED_NO_MV_RR
INMI inner join DEL_MV_RR DMI on DMI.mv_index = INMI.rr_index CROSS JOIN
HIBSEQ;
    WITH MAXES (mxi) as ( VALUES ( (select max(id) from access_rights) )
, ( (select max(id) from metadata_field ) ) , ( (select max(id) from
thistable ) ) , ( (select max(id) from entity_file ) ) , ( (select
max(id) from filter ) ) , ( (select max(id) from locale ) ) ) SELECT
pg_catalog.setval('hibernate_unique_sequence', ((max(mxi)/32768)+1),
true) from MAXES;
          COMMIT WORK;"

Re: admin control over cancelling autovacuum when blocked by a lock

Rui DeSousa


On Apr 30, 2019, at 9:27 AM, John Lumby <[hidden email]> wrote:

 it does do ANALYZE  -  in this context is that "DDL"?

I would consider that DDL, and the application should avoid doing it; otherwise, you end up with blocking issues where both readers and writers are blocked unnecessarily.  The application issuing ANALYZE on its own temp tables is fine and non-blocking.  If you must analyze tables, then I would limit which processes have access to those tables to avoid blocking issues.

I would look at your autovacuum/autoanalyze settings, as opposed to the application issuing ANALYZE commands, as those are going to be a non-blocking option.
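
A hedged sketch of what that per-table tuning might look like for the hot
table (the thresholds here are purely illustrative):

ALTER TABLE public.thistable SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_vacuum_threshold     = 1000,
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_analyze_threshold    = 1000
);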


Re: admin control over cancelling autovacuum when blocked by a lock

Michaeldba@sqlexec.com
As I remember, it used to be "in the old days" that autovacuum processes acted like primary ones blocking user requests.  But now, I think PG got it right making them act like secondary type processes that must be "outed" when conflicting with user requests.

What I find is that the best vacuuming solution is a combination of 3 things (a sketch follows the list):
1. Separate autovacuum tuning for high- and low-load times (triggered via crontab).
2. ALTER TABLE commands to specify autovacuum parms at the table level, based on the SQL load on particular tables.
3. Scheduled manual vacuums to help take the stress off the autovacuum daemons.
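
A hedged sketch of items 1 and 3 (which settings to flip is workload-specific;
the ones shown are illustrative only).  The reload can be driven from cron at
the start of a quiet window, and the scheduled manual vacuum is just an
explicit VACUUM of the busiest tables:

-- quiet window: let autovacuum work harder
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
SELECT pg_reload_conf();

-- scheduled manual vacuum of a hot table
VACUUM (VERBOSE, ANALYZE) public.thistable;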

Regards,
Michael Vitale



Rui DeSousa wrote on 4/30/2019 10:12 AM:


On Apr 30, 2019, at 9:27 AM, John Lumby <[hidden email]> wrote:

 it does do ANALYZE  -  in this context is that "DDL"?

I would consider that DDL, and the application should avoid doing it; otherwise, you end up with blocking issues where both readers and writers are blocked unnecessarily.  The application issuing ANALYZE on its own temp tables is fine and non-blocking.  If you must analyze tables, then I would limit which processes have access to those tables to avoid blocking issues.

I would look at your autovacuum/autoanalyze settings, as opposed to the application issuing ANALYZE commands, as those are going to be a non-blocking option.



Re: admin control over cancelling autovacuum when blocked by a lock

Rui DeSousa


On Apr 30, 2019, at 11:04 AM, MichaelDBA <[hidden email]> wrote:

 it used to be "in the old days" that autovacuum processes acted like primary ones blocking user requests.

Hmm, haven’t seen auto vacuum block; but I believe auto vacuum freeze is still a blocking event.


Re: admin control over cancelling autovacuum when blocked by a lock

johnlumby
In reply to this post by Rui DeSousa
On 04/30/2019 10:12 AM, Rui DeSousa wrote:

>
>
>> On Apr 30, 2019, at 9:27 AM, John Lumby <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> it does do ANALYZE  -  in this context is that "DDL"?
>
> I would consider that DDL and the application should avoid doing it;
> otherwise, you end up with blocking issues where both readers and
> writers are blocked unnecessarily.  The application issuing analyze on
> its own temp tables is fine and non blocking.  If you must analyze
> tables then I would limit which processes have access to those tables
> to avoid blocking issues.

It turns out you are absolutely right  -  the ANALYZE being done on
every loop was the culprit.
After changing my workload script to run the ANALYZE less often, or only when
signalled,
and switching to vanilla postgresql (11.2),  I see autovacuum not being
cancelled except at
exactly the times when the workload runs an ANALYZE.      There is a
near-one-to-one
correlation between a workload ANALYZE and autovacuum being cancelled.

The only reason I had included the ANALYZE on every loop was so that I
could see the growth;
it is not an essential part of the workload.    So I think this gives me what
I need to limit growth
and occasionally monitor it.
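
For the occasional check, a sketch using only the statistics views and
pgstatindex, neither of which should take a lock strong enough to collide
with autovacuum (the index name below is illustrative):

SELECT n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
  FROM pg_stat_user_tables
 WHERE relname = 'thistable';

SELECT avg_leaf_density, leaf_fragmentation
  FROM pgstatindex('public.thistable_pkey');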

But there is a bit of an irony in that I can either
       limit growth but not watch it continuously,
       or not limit it and watch it grow.
This may be a bit wild, but I wonder if there would be any value in a
new configuration parameter
to address the very specific scenario of a worker doing ANALYZE blocked by an
autovacuum of the same table.
After all,   in this scenario,  surely the user would prefer that the
ANALYZE would wait until after
the autovacuum has finished,    rather than cancel the autovacuum and
see un-vacuumed stats?

>
> I would look at your auto vacuum/analyze settings as oppose to the
> application issuing analyze commands as those are going to be a non
> blocking option.
>




Re: admin control over cancelling autovacuum when blocked by a lock

Rui DeSousa


On Apr 30, 2019, at 4:04 PM, John Lumby <[hidden email]> wrote:

surely the user would prefer that the 
ANALYZE would wait until after
the autovacuum has finished

Not really… ANALYZE takes an exclusive lock, I believe.  The result is that readers/ANALYZE will block other readers and writers, which is bad for concurrency.  Readers should never be blocked :)…

I had the following problem occur, which I hope will illustrate why you wouldn’t want to do what you are recommending.

I explicitly do not allow the application to issue DDL, including the ANALYZE command, against the base tables; however, developers are developers and they will try anything at least once.  The application was coded with an ANALYZE command and the result was huge blocking issues.

1. The application issues ANALYZE.
2. The ANALYZE waits on its exclusive lock,
3. as there is already a long-running query against the table.
4. New readers get blocked and are now waiting behind the ANALYZE command.
5. Writers are also blocked and waiting behind the ANALYZE command.
6. The long-running query completes.
7. The ANALYZE command fails due to lack of permissions.
8. The application retries; rinse and repeat.

* The real response was that we had to kill the session attempting to issue the ANALYZE and ship a hot fix; otherwise the blocking would have caused all sorts of application issues.




Re: admin control over cancelling autovacuum when blocked by a lock

johnlumby
On 04/30/2019 04:34 PM, Rui DeSousa wrote:

>
>
>> On Apr 30, 2019, at 4:04 PM, John Lumby <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> surely the user would prefer that the
>> ANALYZE would wait until after
>> the autovacuum has finished
>
> Not really… analyze takes an exclusive lock; I believe.  The result is
> that readers/analyze will block other readers and writes which is bad
> for concurrency.  Readers should never be blocked :)…

Apparently not  -   at least,  not on the table being analyzed.   From
the 11.2 Reference:

    ANALYZE requires only a read lock on the target table, so it can run
    in parallel with other activity on the table.

Maybe it takes an exclusive lock on the catalog(?),   but (if it does)
that should be brief, at the very end,
and should not conflict with queries except rarely, when the planner is
reading the catalog.

>
> I had the follow problem occur; which I hope will illustrate why you
> wouldn’t what do what you are recommending.
>
> [ ... ]
> * The real response is we had to kill the session attempting to issue
> analyze and issue a hot fix otherwise the blocking would cause all
> sorts of application issues.
>
>
>


Re: admin control over cancelling autovacuum when blocked by a lock

Tom Lane-2
John Lumby <[hidden email]> writes:
> On 04/30/2019 04:34 PM, Rui DeSousa wrote:
>> Not really… analyze takes an exclusive lock; I believe.  The result is
>> that readers/analyze will block other readers and writes which is bad
>> for concurrency.  Readers should never be blocked :)…

> Apparently not  -   at least,  not on the table being analyzed  : from
> the 11.2 Reference :
> ANALYZE requires only a read lock on the target table, so it can run
> in parallel with other activity on the table.

That's kind of inaccurate.  A moment's experimentation will show you
that what it really takes is ShareUpdateExclusiveLock, which it does
mostly to ensure that no other ANALYZE is running concurrently on the
same table.  That lock type doesn't block ordinary reads or writes
on the table.  It probably conflicts with autovacuum though ...
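
A minimal sketch of that "moment's experimentation", run in a session of its
own (ANALYZE, unlike VACUUM, is allowed inside a transaction block):

BEGIN;
ANALYZE public.thistable;
SELECT locktype, mode, granted
  FROM pg_locks
 WHERE relation = 'public.thistable'::regclass;
-- expect to see ShareUpdateExclusiveLock held by this backend
ROLLBACK;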

                        regards, tom lane



Re: admin control over cancelling autovacuum when blocked by a lock

Rui DeSousa


> On Apr 30, 2019, at 5:03 PM, Tom Lane <[hidden email]> wrote:
>
> John Lumby <[hidden email]> writes:
>> On 04/30/2019 04:34 PM, Rui DeSousa wrote:
>>> Not really… analyze takes an exclusive lock; I believe.  The result is
>>> that readers/analyze will block other readers and writes which is bad
>>> for concurrency.  Readers should never be blocked :)…
>
>> Apparently not  -   at least,  not on the table being analyzed  : from
>> the 11.2 Reference :
>> |ANALYZE| requires only a read lock on the target table, so it can run
>> in parallel with other activity on the table.
>
> That's kind of inaccurate.  A moment's experimentation will show you
> that what it really takes is ShareUpdateExclusiveLock, which it does
> mostly to ensure that no other ANALYZE is running concurrently on the
> same table.  That lock type doesn't block ordinary reads or writes
> on the table.  It probably conflicts with autovacuum though ...
>
> regards, tom lane


Looking back at some notes from 2017: it was certain readers/writers that were being blocked, as they were all issuing ANALYZE on the same set of tables.

Nov  1 08:06:35 pgdb02 postgres[27386]: [2232-1] dbc1-LOG:  process 27386 acquired ShareUpdateExclusiveLock on relation 419539 of database 417843 after 1001.837 ms
Nov  1 08:06:35 pgdb02 postgres[27386]: [2232-2] dbc1-CONTEXT:  SQL statement "ANALYZE xxx.t1"
...
Nov  1 08:06:35 pgdb02 postgres[27386]: [2233-1] dbc1-WARNING:  skipping "yyy" --- only table or database owner can analyze it