Upgrading from Postgresql 9.3.8 to 9.6.10

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Upgrading from Postgresql 9.3.8 to 9.6.10

Yuri Niyazov
We have an old cluster, running 9.3.8, that we are trying to upgrade to 9.6.10.

After running pg_upgrade and starting the server, and testing some common queries against it, we experienced the following error:

PG::IndexCorrupted: ERROR: index "table_pkey" contains unexpected zero page at block 17021871 HINT: Please REINDEX it.

So, if I am reading this correctly, there is an index that 9.3.8 created, and it recognizes that index as OK, but 9.6.10 thinks that index is corrupt. Since this happened on one index, it seems reasonable to assume that this could happen on any index in that database, and we need to reindex the entire database. 

Now, this is a live application, and we would like to minimize continuous downtime (multiple short downtimes are fine), so what we are considering doing is a manual reindex: for each index, create a new one, and then drop the old one, and after all that, upgrade.

However, we are leery of doing this reindexing using 9.3.8, since it's already demonstrated itself to be unreliable. 

Which version should we use to reindex? Just the latest 9.3.X? Was there a known bug with older versions missing corrupted pages in indices? 

PS:  Just in case I am glaringly doing something wrong:

Here's our mechanism of doing the upgrade: we create a streaming replica from a basebackup, and then at some point turn off writes to the primary, convert the replica to another primary, and then run pg_upgrade on this new primary. 
Reply | Threaded
Open this post in threaded view
|

Re: Upgrading from Postgresql 9.3.8 to 9.6.10

Ron-2
On 3/26/19 12:51 AM, Yuri Niyazov wrote:

> We have an old cluster, running 9.3.8, that we are trying to upgrade to
> 9.6.10.
>
> After running pg_upgrade and starting the server, and testing some common
> queries against it, we experienced the following error:
>
> PG::IndexCorrupted: ERROR: index "table_pkey" contains unexpected zero
> page at block 17021871 HINT: Please REINDEX it.
>
> So, if I am reading this correctly, there is an index that 9.3.8 created,
> and it recognizes that index as OK, but 9.6.10 thinks that index is
> corrupt. Since this happened on one index, it seems reasonable to assume
> that this could happen on any index in that database,

Yes, *could*.

> and we need to reindex the entire database.

That does not seem justified.

> Now, this is a live application, and we would like to minimize continuous
> downtime (multiple short downtimes are fine), so what we are considering
> doing is a manual reindex: for each index, create a new one, and then drop
> the old one, and after all that, upgrade.

Won't work if you've got lots of FK constraints.

But "REINDEX INDEX foo;" works just fine.

> However, we are leery of doing this reindexing using 9.3.8, since it's
> already demonstrated itself to be unreliable.
>
> Which version should we use to reindex? Just the latest 9.3.X? Was there a
> known bug with older versions missing corrupted pages in indices?
>
> PS:  Just in case I am glaringly doing something wrong:
>
> Here's our mechanism of doing the upgrade: we create a streaming replica
> from a basebackup, and then at some point turn off writes to the primary,
> convert the replica to another primary, and then run pg_upgrade on this
> new primary.

--
Angular momentum makes the world go 'round.

Reply | Threaded
Open this post in threaded view
|

Re: Upgrading from Postgresql 9.3.8 to 9.6.10

Keith
In reply to this post by Yuri Niyazov


On Tue, Mar 26, 2019 at 1:51 AM Yuri Niyazov <[hidden email]> wrote:
We have an old cluster, running 9.3.8, that we are trying to upgrade to 9.6.10.

After running pg_upgrade and starting the server, and testing some common queries against it, we experienced the following error:

PG::IndexCorrupted: ERROR: index "table_pkey" contains unexpected zero page at block 17021871 HINT: Please REINDEX it.

So, if I am reading this correctly, there is an index that 9.3.8 created, and it recognizes that index as OK, but 9.6.10 thinks that index is corrupt. Since this happened on one index, it seems reasonable to assume that this could happen on any index in that database, and we need to reindex the entire database. 

Now, this is a live application, and we would like to minimize continuous downtime (multiple short downtimes are fine), so what we are considering doing is a manual reindex: for each index, create a new one, and then drop the old one, and after all that, upgrade.

However, we are leery of doing this reindexing using 9.3.8, since it's already demonstrated itself to be unreliable. 

Which version should we use to reindex? Just the latest 9.3.X? Was there a known bug with older versions missing corrupted pages in indices? 

PS:  Just in case I am glaringly doing something wrong:

Here's our mechanism of doing the upgrade: we create a streaming replica from a basebackup, and then at some point turn off writes to the primary, convert the replica to another primary, and then run pg_upgrade on this new primary. 


Have you done this more than once? If so, is it the same indexes every time you're having issues with?

Answering these questions will help you narrow down if the corruption is in your original database or something to do with the upgrade.

However, I would also highly recommend getting to the latest version of 9.3 and rebuilding your replicas afterwards. The latest patch release was 9.3.25, so your current version is quite far behind. There were quite a number of data corruption issues related to replication in the 9.3 series. More-so in the earlier patch versions, but I highly recommend also trying this before your major upgrade.

Keith
Reply | Threaded
Open this post in threaded view
|

Re: Upgrading from Postgresql 9.3.8 to 9.6.10

Yuri Niyazov
On Mon, Mar 25, 2019 at 11:44 PM Keith <[hidden email]> wrote:
On Tue, Mar 26, 2019 at 1:51 AM Yuri Niyazov <[hidden email]> wrote:
We have an old cluster, running 9.3.8, that we are trying to upgrade to 9.6.10.

After running pg_upgrade and starting the server, and testing some common queries against it, we experienced the following error:

PG::IndexCorrupted: ERROR: index "table_pkey" contains unexpected zero page at block 17021871 HINT: Please REINDEX it.

So, if I am reading this correctly, there is an index that 9.3.8 created, and it recognizes that index as OK, but 9.6.10 thinks that index is corrupt. Since this happened on one index, it seems reasonable to assume that this could happen on any index in that database, and we need to reindex the entire database. 

Now, this is a live application, and we would like to minimize continuous downtime (multiple short downtimes are fine), so what we are considering doing is a manual reindex: for each index, create a new one, and then drop the old one, and after all that, upgrade.

However, we are leery of doing this reindexing using 9.3.8, since it's already demonstrated itself to be unreliable. 

Which version should we use to reindex? Just the latest 9.3.X? Was there a known bug with older versions missing corrupted pages in indices? 

PS:  Just in case I am glaringly doing something wrong:

Here's our mechanism of doing the upgrade: we create a streaming replica from a basebackup, and then at some point turn off writes to the primary, convert the replica to another primary, and then run pg_upgrade on this new primary. 


Have you done this more than once? If so, is it the same indexes every time you're having issues with?

Answering these questions will help you narrow down if the corruption is in your original database or something to do with the upgrade.

However, I would also highly recommend getting to the latest version of 9.3 and rebuilding your replicas afterwards. The latest patch release was 9.3.25, so your current version is quite far behind. There were quite a number of data corruption issues related to replication in the 9.3 series. More-so in the earlier patch versions, but I highly recommend also trying this before your major upgrade.

Keith

We tried it a second time, and were unable to reproduce the issue, so it seems like the issue is non-deterministic, which is a little scary. In any case, we then proceeded to upgrade to 9.3.25 as you recommended, and then upgraded to the 9.6 series from it, and the issue didn't reproduce; we've been running on the new upgraded version for thel last two weeks and it looks ok so far. 

Thanks for your help!
Reply | Threaded
Open this post in threaded view
|

Re: Upgrading from Postgresql 9.3.8 to 9.6.10

Keith


On Thu, Apr 25, 2019 at 4:43 PM Yuri Niyazov <[hidden email]> wrote:
On Mon, Mar 25, 2019 at 11:44 PM Keith <[hidden email]> wrote:
On Tue, Mar 26, 2019 at 1:51 AM Yuri Niyazov <[hidden email]> wrote:
We have an old cluster, running 9.3.8, that we are trying to upgrade to 9.6.10.

After running pg_upgrade and starting the server, and testing some common queries against it, we experienced the following error:

PG::IndexCorrupted: ERROR: index "table_pkey" contains unexpected zero page at block 17021871 HINT: Please REINDEX it.

So, if I am reading this correctly, there is an index that 9.3.8 created, and it recognizes that index as OK, but 9.6.10 thinks that index is corrupt. Since this happened on one index, it seems reasonable to assume that this could happen on any index in that database, and we need to reindex the entire database. 

Now, this is a live application, and we would like to minimize continuous downtime (multiple short downtimes are fine), so what we are considering doing is a manual reindex: for each index, create a new one, and then drop the old one, and after all that, upgrade.

However, we are leery of doing this reindexing using 9.3.8, since it's already demonstrated itself to be unreliable. 

Which version should we use to reindex? Just the latest 9.3.X? Was there a known bug with older versions missing corrupted pages in indices? 

PS:  Just in case I am glaringly doing something wrong:

Here's our mechanism of doing the upgrade: we create a streaming replica from a basebackup, and then at some point turn off writes to the primary, convert the replica to another primary, and then run pg_upgrade on this new primary. 


Have you done this more than once? If so, is it the same indexes every time you're having issues with?

Answering these questions will help you narrow down if the corruption is in your original database or something to do with the upgrade.

However, I would also highly recommend getting to the latest version of 9.3 and rebuilding your replicas afterwards. The latest patch release was 9.3.25, so your current version is quite far behind. There were quite a number of data corruption issues related to replication in the 9.3 series. More-so in the earlier patch versions, but I highly recommend also trying this before your major upgrade.

Keith

We tried it a second time, and were unable to reproduce the issue, so it seems like the issue is non-deterministic, which is a little scary. In any case, we then proceeded to upgrade to 9.3.25 as you recommended, and then upgraded to the 9.6 series from it, and the issue didn't reproduce; we've been running on the new upgraded version for thel last two weeks and it looks ok so far. 

Thanks for your help!


Yuri,

Glad things are working, but I would highly recommend running a reindex on all tables in your cluster at some point, just as a precaution. It may not hurt to also run a pg_dump/restore of the database to ensure it doesn't find any corrupt data during that process as well. Don't mean that you have to rebuild your current system, just make a dump and restore it to a secondary system somewhere and make sure it all works ok.

Keith