BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15896
Logged by:          Christoph Berg
Email address:      [hidden email]
PostgreSQL version: 12beta2
Operating system:   Debian
Description:        

When pg_upgrading from 10-or-earlier to 12beta2 or 13devel, an assertion is
raised. (Spotted by Debian's postgresql-common upgrade tests. Previously
missed because we were only testing version+1 upgrades.)

TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei:
»/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«,
Zeile: 665)

Reproducer:
10.9:
CREATE TABLE phone (name varchar(255) PRIMARY KEY, tel int NOT NULL);
INSERT INTO phone VALUES ('Alice', 2);
CREATE USER foo;
GRANT INSERT ON phone TO foo;
INSERT INTO phone VALUES ('Bob', 1);

pg_upgradecluster -m upgrade 10 foo
(runs /usr/lib/postgresql/12/bin/initdb -D /var/lib/postgresql/12/foo
--auth-local peer --auth-host md5 --encoding UTF8 --lc-collate de_DE.utf8
--lc-ctype de_DE.utf8
/usr/lib/postgresql/12/bin/pg_upgrade -b /usr/lib/postgresql/10/bin -B
/usr/lib/postgresql/12/bin -p 5434 -P 5435 -d /etc/postgresql/10/foo -D
/etc/postgresql/12/foo)

12beta2:
SELECT * FROM phone ORDER BY name;
TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei:
»/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«,
Zeile: 665)
2019-07-05 11:34:35.040 CEST [14325] LOG:  Serverprozess (PID 14376) wurde
von Signal 6 beendet: Abgebrochen
2019-07-05 11:34:35.040 CEST [14325] DETAIL:  Der fehlgeschlagene Prozess
führte aus: SELECT * FROM phone ORDER BY name;

Combinations tested:
bad: 10->12 9.6->12 9.6->13
good: 10->11 11->12 12->13

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Bernd Helmle
Am Freitag, den 05.07.2019, 09:59 +0000 schrieb PG Bug reporting form:

> 12beta2:
> SELECT * FROM phone ORDER BY name;
> TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei:
> »/build/postgresql-12-3URvLF/postgresql-12-
> 12~beta2/build/../src/backend/access/nbtree/nbtpage.c«,
> Zeile: 665)
> 2019-07-05 11:34:35.040 CEST [14325] LOG:  Serverprozess (PID 14376)
> wurde
> von Signal 6 beendet: Abgebrochen
> 2019-07-05 11:34:35.040 CEST [14325] DETAIL:  Der fehlgeschlagene
> Prozess
> führte aus: SELECT * FROM phone ORDER BY name;

Additional note from me after a short test:

It works if you upgrade to PG11 first _and_ have touched the index in a
way it has rewritten the metapage to version 3, e.g. adding new pages
to the index.

Thanks

        Bernd




Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Alvaro Herrera-9
In reply to this post by PG Bug reporting form
Adding Peter G. to CC.

On 2019-Jul-05, PG Bug reporting form wrote:

> When pg_upgrading from 10-or-earlier to 12beta2 or 13devel, an assertion is
> raised. (Spotted by Debian's postgresql-common upgrade tests. Previously
> missed because we were only testing version+1 upgrades.)
>
> TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei:
> »/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«,
> Zeile: 665)

Seems that _bt_getrootheight is too optimistic about the metapage
version it'll find.  I suppose this could be handled by just not caching
the metapage if it is of the old version ... or maybe by calling
_bt_upgrademetapage().

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Peter Geoghegan-4
On Fri, Jul 5, 2019 at 8:49 AM Alvaro Herrera <[hidden email]> wrote:
> > TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei:
> > »/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«,
> > Zeile: 665)
>
> Seems that _bt_getrootheight is too optimistic about the metapage
> version it'll find.  I suppose this could be handled by just not caching
> the metapage if it is of the old version ... or maybe by calling
> _bt_upgrademetapage().

The problem here predates v12 -- the call to _bt_cachemetadata() was
added to _bt_getrootheight() by commit 0a64b45152b, which went into
v11. My commit dd299df8189 added a new assertion that fails, but
that's just a symptom -- I changed the code in  _bt_getrootheight() to
use a BTMetaPageData pointer to shared memory (i.e. a pointer to the
authoritative version), rather than using the newly-out-of-sync cached
version. It shouldn't be out-of-sync at all.

_bt_getrootheight() is mostly just something that exists for the
planner, so it has no business calling _bt_cachemetadata(), which will
"upgrade" the cached metadata image from version 2 to version 3 if it
happens to be on version 2. How can it be okay to upgrade the cached
version without also upgrading the on-disk/shared_buffers version?
This bug was hiding in plain sight.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Peter Geoghegan-4
In reply to this post by PG Bug reporting form
On Fri, Jul 5, 2019 at 3:00 AM PG Bug reporting form
<[hidden email]> wrote:
> When pg_upgrading from 10-or-earlier to 12beta2 or 13devel, an assertion is
> raised. (Spotted by Debian's postgresql-common upgrade tests. Previously
> missed because we were only testing version+1 upgrades.)

Obviously you were right to adopt pg_upgrade tests that cross more
than one version at a time. Thanks!

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Christoph Berg-2
Re: Peter Geoghegan 2019-07-06 <CAH2-WznYXDG6aANoag5ixoMrV=[hidden email]>
> > When pg_upgrading from 10-or-earlier to 12beta2 or 13devel, an assertion is
> > raised. (Spotted by Debian's postgresql-common upgrade tests. Previously
> > missed because we were only testing version+1 upgrades.)
>
> Obviously you were right to adopt pg_upgrade tests that cross more
> than one version at a time. Thanks!

Indeed, and running 9.6->10->11 tests (that were present before as
well, but never run) revealed another bug in pg_upgradecluster
itself... Testsuites are just awesome.

Christoph


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Peter Geoghegan-4
In reply to this post by Peter Geoghegan-4
On Fri, Jul 5, 2019 at 3:14 PM Peter Geoghegan <[hidden email]> wrote:

> On Fri, Jul 5, 2019 at 8:49 AM Alvaro Herrera <[hidden email]> wrote:
> > > TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei:
> > > »/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«,
> > > Zeile: 665)
> >
> > Seems that _bt_getrootheight is too optimistic about the metapage
> > version it'll find.  I suppose this could be handled by just not caching
> > the metapage if it is of the old version ... or maybe by calling
> > _bt_upgrademetapage().
>
> The problem here predates v12 -- the call to _bt_cachemetadata() was
> added to _bt_getrootheight() by commit 0a64b45152b, which went into
> v11. My commit dd299df8189 added a new assertion that fails, but
> that's just a symptom -- I changed the code in  _bt_getrootheight() to
> use a BTMetaPageData pointer to shared memory (i.e. a pointer to the
> authoritative version), rather than using the newly-out-of-sync cached
> version. It shouldn't be out-of-sync at all.
>
> _bt_getrootheight() is mostly just something that exists for the
> planner, so it has no business calling _bt_cachemetadata(), which will
> "upgrade" the cached metadata image from version 2 to version 3 if it
> happens to be on version 2. How can it be okay to upgrade the cached
> version without also upgrading the on-disk/shared_buffers version?
> This bug was hiding in plain sight.

CC'ing Alexander, who was also involved in commit 0a64b45152b...

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Michael Paquier-2
On Thu, Jul 11, 2019 at 12:18:46AM -0700, Peter Geoghegan wrote:
> On Fri, Jul 5, 2019 at 3:14 PM Peter Geoghegan <[hidden email]> wrote:
>> _bt_getrootheight() is mostly just something that exists for the
>> planner, so it has no business calling _bt_cachemetadata(), which will
>> "upgrade" the cached metadata image from version 2 to version 3 if it
>> happens to be on version 2. How can it be okay to upgrade the cached
>> version without also upgrading the on-disk/shared_buffers version?
>> This bug was hiding in plain sight.
>
> CC'ing Alexander, who was also involved in commit 0a64b45152b...

So the problem has been introduced in v11 per this commit, still we
only see the issue since v12 because your code relied on a wrong
assumption.  Per what I am reading, it seems to me that we should fix
v11, but that's a live problem for v12 because of the page format
upgrade so we need to track it as an open item.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Peter Geoghegan-4
On Thu, Jul 11, 2019 at 1:16 AM Michael Paquier <[hidden email]> wrote:
> So the problem has been introduced in v11 per this commit, still we
> only see the issue since v12 because your code relied on a wrong
> assumption.

We see the issue on v12 because my code added an assertion that caught
the issue. The behavior on v12 is otherwise substantively the same as
on v11. Even if it was true that my commits made nbtree rely on the
same "wrong assumption" more often than on v11, why would it matter?
The bug is still a v11 bug.

> Per what I am reading, it seems to me that we should fix
> v11, but that's a live problem for v12 because of the page format
> upgrade so we need to track it as an open item.

The highest priority ought to be fixing the problem in the stable v11
branch. That said, there is hardly any difference between v11 and v12.
It isn't possible to upgrade a B-Tree index from v2/v3 to v4 (the
latest version) on-the-fly. But if in your judgement it makes sense to
add a v12 open item, I have no objection.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Peter Geoghegan-4
In reply to this post by Peter Geoghegan-4
On Fri, Jul 5, 2019 at 3:14 PM Peter Geoghegan <[hidden email]> wrote:
> The problem here predates v12 -- the call to _bt_cachemetadata() was
> added to _bt_getrootheight() by commit 0a64b45152b, which went into
> v11. My commit dd299df8189 added a new assertion that fails, but
> that's just a symptom -- I changed the code in  _bt_getrootheight() to
> use a BTMetaPageData pointer to shared memory (i.e. a pointer to the
> authoritative version), rather than using the newly-out-of-sync cached
> version. It shouldn't be out-of-sync at all.

The attached HEAD-only patch fixes the issue by not actually upgrading
the page within _bt_cachemetadata(), while making some existing
assertions less aggressive. I haven't tested it properly just yet --
ran out of time to do that today.

Maybe you can try this out with your original test case, Christoph?

Thanks
--
Peter Geoghegan

0001-Fix-nbtree-metapage-cache-upgrade-bug.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«

Christoph Berg-2
Re: Peter Geoghegan 2019-07-16 <CAH2-Wz=[hidden email]>
> Maybe you can try this out with your original test case, Christoph?

I applied the patch to 12/HEAD and the Debian testsuite now passes the
10->12 upgrade test.

Thanks,
Christoph