Unexpected serialization error

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Unexpected serialization error

Luka Žitnik
Hi guys,

I had no luck over at general slack channel. So I'm beginning to treat this as a bug. Here's a test case that unexpectedly fails at last insert. I expect it not to fail because the rows that the two transactions act on are unrelated to one another.

CREATE TABLE t1 (
    class integer NOT NULL
);

CREATE INDEX ON t1 (class);

CREATE TABLE t2 (
    class integer NOT NULL
);

CREATE INDEX ON t2 (class);

ALTER SYSTEM SET enable_seqscan TO off;

-- Session 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM t2 WHERE class=1;

-- Session 2
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM t1 WHERE class=2;
INSERT INTO t2 VALUES(2);
COMMIT;

-- Session 1
INSERT INTO t1 VALUES(1);
COMMIT;
Postgres version is PostgreSQL 12.5 on x86_64-pc-linux-musl, compiled by gcc (Alpine 9.3.0) 9.3.0, 64-bit.

-- Lule

Reply | Threaded
Open this post in threaded view
|

Re: Unexpected serialization error

Tom Lane-2
=?UTF-8?B?THVrYSDFvWl0bmlr?= <[hidden email]> writes:
> I had no luck over at general slack channel. So I'm beginning to treat this
> as a bug. Here's a test case that unexpectedly fails at last insert. I
> expect it not to fail because the rows that the two transactions act on are
> unrelated to one another.

The bug is that you're assuming exact tracking of the SSI serialization
rules. It's not done that way, because it'd be prohibitively expensive.
(I've not dug into this example in any detail, but I suspect it's
acquiring page-level not tuple-level predicate locks, thus the
transactions conflict because each tries to update a page the other one
already read.)

The short answer is that ANY application that's depending on serializable
mode MUST be prepared to retry serialization failures.  You don't get to
skip that just because there theoretically shouldn't be a failure.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Unexpected serialization error

Luka Žitnik
Got it, thanks. If this particular test case implied worse than expected performance (e.g. significantly more than the 0.03% of failures, http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf, RUBiS example), it probably would have been taken into consideration.

Btw, more out of curiosity, is it possible to nudge the test to pass by changing some of these settings?
ssi_test=# select name from pg_settings where name ~ 'pred';
              name              
--------------------------------
 max_pred_locks_per_page
 max_pred_locks_per_relation
 max_pred_locks_per_transaction
(3 rows)

Here's the lock types I see right before the last insert:
ssi_test=# SELECT mode, locktype, relation::regclass, page, tuple
FROM pg_locks WHERE mode = 'SIReadLock';
    mode    | locktype |   relation   | page | tuple
------------+----------+--------------+------+-------
 SIReadLock | page     | t1_class_idx |    1 |      
 SIReadLock | page     | t2_class_idx |    1 |      
(2 rows)

On Sat, Feb 13, 2021 at 6:21 PM Tom Lane <[hidden email]> wrote:
=?UTF-8?B?THVrYSDFvWl0bmlr?= <[hidden email]> writes:
> I had no luck over at general slack channel. So I'm beginning to treat this
> as a bug. Here's a test case that unexpectedly fails at last insert. I
> expect it not to fail because the rows that the two transactions act on are
> unrelated to one another.

The bug is that you're assuming exact tracking of the SSI serialization
rules. It's not done that way, because it'd be prohibitively expensive.
(I've not dug into this example in any detail, but I suspect it's
acquiring page-level not tuple-level predicate locks, thus the
transactions conflict because each tries to update a page the other one
already read.)

The short answer is that ANY application that's depending on serializable
mode MUST be prepared to retry serialization failures.  You don't get to
skip that just because there theoretically shouldn't be a failure.

                        regards, tom lane
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected serialization error

Andres Freund
In reply to this post by Luka Žitnik
Hi,

On 2021-02-12 19:14:17 +0100, Luka Žitnik wrote:

> I had no luck over at general slack channel. So I'm beginning to treat this
> as a bug. Here's a test case that unexpectedly fails at last insert. I
> expect it not to fail because the rows that the two transactions act on are
> unrelated to one another.
>
> CREATE TABLE t1 (
>     class integer NOT NULL
> );
>
> CREATE INDEX ON t1 (class);
>
> CREATE TABLE t2 (
>     class integer NOT NULL
> );
>
> CREATE INDEX ON t2 (class);
>
> ALTER SYSTEM SET enable_seqscan TO off;
>
> -- Session 1
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT * FROM t2 WHERE class=1;
>
> -- Session 2
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT * FROM t1 WHERE class=2;
> INSERT INTO t2 VALUES(2);
> COMMIT;
>
> -- Session 1
> INSERT INTO t1 VALUES(1);
> COMMIT;

I think you're hitting multiple issues here... For one, initially your
table is empty, in which case we mark the entire index to be a conflict
during for the SELECT * FROM t2 WHERE class=1;

Second, even if the indexes weren't empty, the granularity of the index
predicate locking is a page - which means that S1's predicate lock for
SELECT * FROM t1 WHERE class=2; will conflict with S1's INSERT INTO t1
VALUES(1).

If the SELECTs actual hit rows (and the inserts were updates), you'd
presumably have the sequence suceed. Same with an index where the
inserted values weren't on the same page as the looked up values.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Unexpected serialization error

Luka Žitnik
Thank you, Andres. I appreciate your thoughts. So, I do notice the lock level changes from table to page as soon as there are various rows in the table. But what's really astonishing to me is to start looking at pages as predicate granularity and, consequently, the possibility of the same test passing with certain amount of data in the tables. This is a great find for me.

On Mon, 22 Feb 2021, 02:28 Andres Freund, <[hidden email]> wrote:
Hi,

On 2021-02-12 19:14:17 +0100, Luka Žitnik wrote:
> I had no luck over at general slack channel. So I'm beginning to treat this
> as a bug. Here's a test case that unexpectedly fails at last insert. I
> expect it not to fail because the rows that the two transactions act on are
> unrelated to one another.
>
> CREATE TABLE t1 (
>     class integer NOT NULL
> );
>
> CREATE INDEX ON t1 (class);
>
> CREATE TABLE t2 (
>     class integer NOT NULL
> );
>
> CREATE INDEX ON t2 (class);
>
> ALTER SYSTEM SET enable_seqscan TO off;
>
> -- Session 1
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT * FROM t2 WHERE class=1;
>
> -- Session 2
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT * FROM t1 WHERE class=2;
> INSERT INTO t2 VALUES(2);
> COMMIT;
>
> -- Session 1
> INSERT INTO t1 VALUES(1);
> COMMIT;

I think you're hitting multiple issues here... For one, initially your
table is empty, in which case we mark the entire index to be a conflict
during for the SELECT * FROM t2 WHERE class=1;

Second, even if the indexes weren't empty, the granularity of the index
predicate locking is a page - which means that S1's predicate lock for
SELECT * FROM t1 WHERE class=2; will conflict with S1's INSERT INTO t1
VALUES(1).

If the SELECTs actual hit rows (and the inserts were updates), you'd
presumably have the sequence suceed. Same with an index where the
inserted values weren't on the same page as the looked up values.

Greetings,

Andres Freund