Hot Standby Conflict on pg_attribute

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Hot Standby Conflict on pg_attribute

Erik Jones-2
Hello,

A client has recently had a couple of hot standby query conflict pile-ups around AccessShare lock waits on pg_attribute.  Here is an example from the log:

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-1]  sql_error_code = 00000 LOG:  process 118946 still waiting for AccessShareLock on relation 1249 of database 16401 after 1000.127 ms at character 92
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-2]  sql_error_code = 00000 DETAIL:  Process holding the lock: 9. Wait queue: 118948, 118950, 118708, 118818, 118886, 118961, 118960, 118806, 118963, 118959, 118881, 118887, 118878, 118896, 118964, 118965, 118945, 118949, 118946, 118743, 118966, 118947, 118967, 118968.
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-3]  sql_error_code = 00000 STATEMENT:  SELECT uc.id,
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-4]       uc.some_id,
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-5]       uc.utr_id,
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-6]       utr.name
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-7]     FROM usertable1 uc
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-8]     INNER JOIN usertable2 utr
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-9]       ON uc.utr_id = utr.id
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-10]     WHERE uc.some_id = $1
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]: [9-11]     ORDER BY name

Relation 1249 is pg_attribute and process 9 that was holding the lock was RecoveryWalAll process.  I've confirmed that autovacuum had removed some pages from pg_attribute shortly before this, which happens somewhat regularly since this client runs a couple thousand REFERSH MATARIALIZED VIEW queries per day which look to cause inserts and deletes there so it having an exclusive lock on pg_attribute makes sense.

The question then is: Why would these user queries be waiting on an AccessShare lock on pg_attribute?  Thus far we've been unable to recreate any transacitons with the above query (and others) that show any pg_attribute locks.  There is no ORM in play here and these queries are being sent as single query transactions via this Node.js postgres adapter: https://github.com/brianc/node-postgres which is pretty bare bones.

--
Erik Jones
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: Hot Standby Conflict on pg_attribute

Andres Freund
Hi,

On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
> The question then is: Why would these user queries be waiting on an
> AccessShare lock on pg_attribute?  Thus far we've been unable to recreate
> any transacitons with the above query (and others) that show any
> pg_attribute locks.  There is no ORM in play here and these queries are
> being sent as single query transactions via this Node.js postgres adapter:
> https://github.com/brianc/node-postgres which is pretty bare bones.

Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.

You can reproduce that fairly easily:

S1: CREATE TABLE foo();
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;
S2: COMMIT;

S1 could execute the select, because it has a cached view of the way the
relation looks.

S2: ALTER TABLE foo ADD COLUMN bar INT;
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;

Here S1 is blocked, because it needs to look at pg_attribute to figure
out the "shape" of the table, but it's currently locked.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Hot Standby Conflict on pg_attribute

Erik Jones-2
Hi Andres,

Thank you very much!  That's exactly what I needed.

On Fri, May 10, 2019 at 12:14 PM Andres Freund <[hidden email]> wrote:
Hi,

On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
> The question then is: Why would these user queries be waiting on an
> AccessShare lock on pg_attribute?  Thus far we've been unable to recreate
> any transacitons with the above query (and others) that show any
> pg_attribute locks.  There is no ORM in play here and these queries are
> being sent as single query transactions via this Node.js postgres adapter:
> https://github.com/brianc/node-postgres which is pretty bare bones.

Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.

You can reproduce that fairly easily:

S1: CREATE TABLE foo();
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;
S2: COMMIT;

S1 could execute the select, because it has a cached view of the way the
relation looks.

S2: ALTER TABLE foo ADD COLUMN bar INT;
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;

Here S1 is blocked, because it needs to look at pg_attribute to figure
out the "shape" of the table, but it's currently locked.

Greetings,

Andres Freund


--
Erik Jones
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: Hot Standby Conflict on pg_attribute

Tom Lane-2
In reply to this post by Andres Freund
Andres Freund <[hidden email]> writes:
> On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
>> The question then is: Why would these user queries be waiting on an
>> AccessShare lock on pg_attribute?

> Queries that access a table for the *first* time after DDL happened
> (including truncating the relation), need an AccessShareLock on
> pg_attribute (and pg_class, pg_index, ...) for a short time.

Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity).  That requires exclusive lock on pg_attribute,
which would propagate down to the standby.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Hot Standby Conflict on pg_attribute

Erik Jones-2
On Fri, May 10, 2019 at 12:41 PM Tom Lane <[hidden email]> wrote:
Andres Freund <[hidden email]> writes:
> On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
>> The question then is: Why would these user queries be waiting on an
>> AccessShare lock on pg_attribute?

> Queries that access a table for the *first* time after DDL happened
> (including truncating the relation), need an AccessShareLock on
> pg_attribute (and pg_class, pg_index, ...) for a short time.

Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity).  That requires exclusive lock on pg_attribute,
which would propagate down to the standby.

                        regards, tom lane

Right, that part I understood after checking out pg_attribute's insert/delete counts in pg_stat_sys_tables before and after some REFRESH MATERIALIZED VIEW runs on an otherwise idle server.  With them running 2k+ refreshes per day autovac is regularly working on their catalog tables.

Thanks!
--
Erik Jones
[hidden email]
jer
Reply | Threaded
Open this post in threaded view
|

Re: Hot Standby Conflict on pg_attribute

jer
Just a quick footnote: If autovac truncations are frequently causing replica lag, and if this is a problem for you, IIUC one way you can stop autovac from doing the truncations even on older versions is setting old_snapshot_threshold to any value at all besides zero.  (On 12+ you can directly control the truncation behavior.)

-Jeremy

Sent from my TI-83

On May 10, 2019, at 12:46, Erik Jones <[hidden email]> wrote:

On Fri, May 10, 2019 at 12:41 PM Tom Lane <[hidden email]> wrote:
Andres Freund <[hidden email]> writes:
> On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
>> The question then is: Why would these user queries be waiting on an
>> AccessShare lock on pg_attribute?

> Queries that access a table for the *first* time after DDL happened
> (including truncating the relation), need an AccessShareLock on
> pg_attribute (and pg_class, pg_index, ...) for a short time.

Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity).  That requires exclusive lock on pg_attribute,
which would propagate down to the standby.

                        regards, tom lane

Right, that part I understood after checking out pg_attribute's insert/delete counts in pg_stat_sys_tables before and after some REFRESH MATERIALIZED VIEW runs on an otherwise idle server.  With them running 2k+ refreshes per day autovac is regularly working on their catalog tables.

Thanks!
--
Erik Jones
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: Hot Standby Conflict on pg_attribute

Erik Jones-2
Thanks for the ti

On Sat, May 11, 2019 at 9:15 AM Jeremy Schneider <[hidden email]> wrote:
Just a quick footnote: If autovac truncations are frequently causing replica lag, and if this is a problem for you, IIUC one way you can stop autovac from doing the truncations even on older versions is setting old_snapshot_threshold to any value at all besides zero.  (On 12+ you can directly control the truncation behavior.)

-Jeremy

Thanks for the tip!

--
Erik Jones
[hidden email]