BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15755
Logged by:          pg_upgrade fails intermittently from version 9.3.12 to 9.6.12
Email address:      [hidden email]
PostgreSQL version: 9.6.12
Operating system:   RHEL 7.5
Description:        

When we upgraded Postgres from 9.3.12 to 9.6.12 we observed that queries
like the following are no longer giving the expected results.
     select * from xyz.job_attr where jobid = 'foo';
Whereas when we modified the query to contain 'like' operator instead of
equals operator as given below it works perfectly fine.
    select * from xyz.job_attr where jobid  like '%foo';

We requested the pgsql mailing group for the help. Thanks to the respective
people for their prompt reply. Attaching the email communication regarding
the same towards the end of this ticket description for more details.

One of the workarounds suggested is to rebuild the indexes of the corrupted
table. We tried this workaround, reindex at table level did not help us
whereas reindex at the whole database level using the following command
solved our above problem.
broken_db=> REINDEX DATABASE broken_db;

We informed the pgsql group that the above workaround solved our issue and
then we are asked to create a bug for pg_upgrade in order to find out where
the problem is coming from and hence we logged the current bug.

Appreciate your help in letting us know the following.
1. It looks like ONLY indexes are corrupted in our case and reindexing on
whole database solved our issue. Just eager to know is there any possibility
that similar corruption can happen for other database objects like tables,
sequences apart from indexes?
2. Is it fine to proceed with this workaround as solution for these kind of
problems during the upgrade.

Snapshot of Email communication that we had
******************************************************

I can use reindexing as a work around. Thanks!


Great!

If this is not something expected and if you need any more information on
this issue please let me know.


No, this is not the expected behaviour, you can open a bug for pg_upgrade in
order to find out where the problem comes from.

Regards,

Juan José Santamaría Flech

Thanks for the response Juan. Please find the  requested details below:
All the steps are performed on the broken db after pg_upgrade
 
broken_db=# select * from xyz.job_attr where jobid = 'foo';
jobid | attr_name | attr_resource | attr_value | attr_flags
----------+-----------+---------------+------------+------------
(0 rows)
 
/* The query does not returns any error code and neither in the pg_log. But
it returns zero number of rows */
/* where as a query with trim returns 59 rows with same text match */
 
broken_db=# select count(*)  from xyz.job_attr where trim(jobid) = 'foo';
count
-------
    59
(1 row)
 
/* explain analyse results */
 
broken_db=# explain analyze select * from xyz.job_attr where jobid =
'foo';
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on job_attr  (cost=4.73..19.47 rows=59 width=57) (actual
time=0.005..0.005 rows=0 loops=1)
   Recheck Cond: (jobid = 'foo'::text)
   ->  Bitmap Index Scan on job_attr_idx  (cost=0.00..4.72 rows=59 width=0)
(actual time=0.005..0.005 rows=0 loops=1)
         Index Cond: (jobid = 'foo'::text)
Planning time: 0.035 ms
Execution time: 0.025 ms
(6 rows)
 
broken_db=# explain analyze select * from xyz.job_attr where trim(jobid) =
'foo';
                                                QUERY PLAN                  
                          QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on job_attr  (cost=0.00..28.20 rows=5 width=57) (actual
time=0.117..0.148 rows=59 loops=1)
   Filter: (btrim(jobid) = 'foo'::text)
   Rows Removed by Filter: 888
Planning time: 0.026 ms
Execution time: 0.158 ms
(5 rows)
 
/* table definition */
 
broken_db=# \d xyz.job_attr;
        Table "xyz.job_attr"
    Column     |  Type   | Modifiers
---------------+---------+-----------
jobid      | text    | not null
attr_name     | text    | not null
attr_resource | text    |
attr_value    | text    |
attr_flags    | integer | not null
Indexes:
    "job_attr_idx" btree (jobid, attr_name, attr_resource)
 
/* However REINDEX is not successful neither against the table nor the index
*/
 
broken_db=# REINDEX TABLE xyz.job_attr;
ERROR:  syntax error at or near "QUERY"
LINE 1: QUERY PLANREINDEX TABLE xyz.job_attr;
 
broken_db=# REINDEX INDEX job_attr_idx;
ERROR:  relation "job_attr_idx" does not exist
 
/* then tried reindexing system tables, and it worked */
 
[root@sys workspace]# export PGOPTIONS="-P"
[root@sys workspace]# psql -d broken_db
psql (9.6.11)
Type "help" for help.
broken_db=# REINDEX DATABASE broken_db;
REINDEX
 
broken_db=# select count(*) from xyz.job_attr where jobid = 'foo';
count
-------
    59
(1 row)
 
 
I can use reindexing as a work around. Thanks!
 
If this is not something expected and if you need any more information on
this issue please let me know.
 
 
Best Regards,
Nithin.
 
From: Juan José Santamaría Flecha <[hidden email]>
Sent: 14 April 2019 01:04
To: Nithin Johnson <[hidden email]>
Cc: [hidden email]; Suresh Thelkar
<[hidden email]>; Subhasis Bhattacharya
<[hidden email]>
Subject: Re: SELECT query fails after pg_upgrade as the conditional operator
fails
 
The following query fails:
SELECT * from table where jobid = 'foo';
 
Whereas the following is passing:
SELECT * from table where jobid LIKE '%foo';
SELECT * from table where trim(jobid) = 'foo';
 
We were suspecting some leading invisible characters before 'foo'. but the
following queries went well:
SELECT left(jobid, 1) from table where jobid LIKE '%foo';
SELECT ascii(jobid) from table where jobid LIKE '%foo';
 
Please post the error code, also the plan for the different queries and the
description of the table.
 
Can you check if rebuilding the table's indexes solves the issue? If so,
send output of that operation.
 
Regards,
 
Juan José Santamaría Flecha

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

Tom Lane-2
PG Bug reporting form <[hidden email]> writes:
> When we upgraded Postgres from 9.3.12 to 9.6.12 we observed that queries
> like the following are no longer giving the expected results.
>      select * from xyz.job_attr where jobid = 'foo';
> [ but reindexing fixed it ]

What collation locale do you use in that database?  (psql \l's
"Collate" column would say.)  If it's not "C", the most likely
explanation is that you also upgraded the underlying OS, and glibc's
sort order for that locale changed from what it was before.  If so,
reindexing is the appropriate cure.  There isn't a lot we can do about
this inside Postgres, because we have no visibility into whether a
locale's behavior is different today than it was yesterday :-(

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

Suresh Thelkar

 

Hi Tom,

 

Thanks very much for your quick response. Given below are the details of Collate and Ctype.

 

                              List of databases

     Name      |  Owner  | Encoding  | Collate | Ctype |  Access privileges  

---------------+---------+-----------+---------+-------+---------------------

 bs_datastore |   bsdata | SQL_ASCII | C       | C     | 

 

 

We are actually doing upgrade using pg_upgrade from 9.3.12 to 9.6.12 on the same operating system.(We have not upgraded OS or glibc). Also the above data is same before and after pg_upgrade.

 

Encoding and locale we always pass during initdb. Since we are passing these settings explicitly I don’t think they are derived from OS as per our understanding. Please correct us if we are wrong.  

 

OS Details:

cat /etc/redhat-release 

Red Hat Enterprise Linux Server release 7.0 (Maipo)

 

Glibc version: 2.17

 

Please let me know in case if you need any further details.

 

Thanks,

Suresh

 

 

On 15/04/19, 9:45 PM, "Tom Lane" <[hidden email]> wrote:

 

    PG Bug reporting form <[hidden email]> writes:

    > When we upgraded Postgres from 9.3.12 to 9.6.12 we observed that queries

    > like the following are no longer giving the expected results.

    >      select * from xyz.job_attr where jobid = 'foo';

    > [ but reindexing fixed it ]

   

    What collation locale do you use in that database?  (psql \l's

    "Collate" column would say.)  If it's not "C", the most likely

    explanation is that you also upgraded the underlying OS, and glibc's

    sort order for that locale changed from what it was before.  If so,

    reindexing is the appropriate cure.  There isn't a lot we can do about

    this inside Postgres, because we have no visibility into whether a

    locale's behavior is different today than it was yesterday :-(

   

                                    regards, tom lane

   

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

David G Johnston
In reply to this post by PG Bug reporting form
On Mon, Apr 15, 2019 at 1:03 AM PG Bug reporting form <[hidden email]> wrote:
The following bug has been logged on the website:

Bug reference:      15755
Logged by:          pg_upgrade fails intermittently from version 9.3.12 to 9.6.12
Email address:      [hidden email]
PostgreSQL version: 9.6.12
Operating system:   RHEL 7.5
[...]
One of the workarounds suggested is to rebuild the indexes of the corrupted
table. We tried this workaround, reindex at table level did not help us

You didn't actually succeed in forming valid and executable REINDEX command so whether or not it would have worked at the table/index level is an unsolved question.  Given that it worked at the whole database level what I suspect happens is the whole DB command succeeded and when it got to the problem index it fixed the problem.  Had the original table/index commands been written correctly they would have also resulted in a fixed setup and the need to reindex the whole database would have been mitigated (though given the possibility of other indexes having problems a whole database reindex was probably a worthwhile exercise anyway).

Appreciate your help in letting us know the following.
1. It looks like ONLY indexes are corrupted in our case and reindexing on
whole database solved our issue. Just eager to know is there any possibility
that similar corruption can happen for other database objects like tables,
sequences apart from indexes?

Corruption is always a possibility though indexes, because they are ordered, are the most vulnerable.

/* However REINDEX is not successful neither against the table nor the index
*/

broken_db=# REINDEX TABLE xyz.job_attr;
ERROR:  syntax error at or near "QUERY"
LINE 1: QUERY PLANREINDEX TABLE xyz.job_attr;


You got a syntax error "at or near QUERY"; something you did entering the command (LINE 1: is the whole command the server saw) was problematic and the error has no bearing on whether "REINDEX TABLE xyz.job_attr" would have worked without the typo (it likely would have)
 
broken_db=# REINDEX INDEX job_attr_idx;
ERROR:  relation "job_attr_idx" does not exist

The index was never found, in the search_path, and so no reindexing was attempted.  Whether it would have worked had you correctly added the necessary schema to the identifier (or search_path) remains unknown (it likely would have).

Its not enough to say/know that something "didn't work" - understanding "why" it failed is necessary before drawing conclusions.

David J.
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

Suresh Thelkar

Thanks for the quick reply David. It looks like we tried the reindex commands at table or index level without schema identifier in a hurry. Thanks for catching that. We are in the process of trying to reproduce the issue and then try these reindex command at index and table level also and post you the results.

 

Thanks very much for your help on this.

 

Regards,

Suresh

 

From: "David G. Johnston" <[hidden email]>
Date: Tuesday, 16 April 2019 at 12:01 PM
To: Suresh Thelkar <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

 

On Mon, Apr 15, 2019 at 1:03 AM PG Bug reporting form <[hidden email]> wrote:

The following bug has been logged on the website:

Bug reference:      15755
Logged by:          pg_upgrade fails intermittently from version 9.3.12 to 9.6.12
Email address:      [hidden email]
PostgreSQL version: 9.6.12
Operating system:   RHEL 7.5
[...]
One of the workarounds suggested is to rebuild the indexes of the corrupted
table. We tried this workaround, reindex at table level did not help us

 

You didn't actually succeed in forming valid and executable REINDEX command so whether or not it would have worked at the table/index level is an unsolved question.  Given that it worked at the whole database level what I suspect happens is the whole DB command succeeded and when it got to the problem index it fixed the problem.  Had the original table/index commands been written correctly they would have also resulted in a fixed setup and the need to reindex the whole database would have been mitigated (though given the possibility of other indexes having problems a whole database reindex was probably a worthwhile exercise anyway).

 

Appreciate your help in letting us know the following.
1. It looks like ONLY indexes are corrupted in our case and reindexing on
whole database solved our issue. Just eager to know is there any possibility
that similar corruption can happen for other database objects like tables,
sequences apart from indexes?

 

Corruption is always a possibility though indexes, because they are ordered, are the most vulnerable.

 

/* However REINDEX is not successful neither against the table nor the index
*/

broken_db=# REINDEX TABLE xyz.job_attr;
ERROR:  syntax error at or near "QUERY"
LINE 1: QUERY PLANREINDEX TABLE xyz.job_attr;

 

You got a syntax error "at or near QUERY"; something you did entering the command (LINE 1: is the whole command the server saw) was problematic and the error has no bearing on whether "REINDEX TABLE xyz.job_attr" would have worked without the typo (it likely would have)

 

broken_db=# REINDEX INDEX job_attr_idx;
ERROR:  relation "job_attr_idx" does not exist

 

The index was never found, in the search_path, and so no reindexing was attempted.  Whether it would have worked had you correctly added the necessary schema to the identifier (or search_path) remains unknown (it likely would have).

 

Its not enough to say/know that something "didn't work" - understanding "why" it failed is necessary before drawing conclusions.

 

David J.

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

Suresh Thelkar

Hello David,

 

As said earlier we are able to reproduce the pg_upgrade issue again. We actually have several indexes in our datastore. Surprisingly it seems only the following index(job_attr_idx) got corrupted. And I just applied reindex to this particular index and it solved whole of our problem. Please look into the following for more details.

 

Details of the table where we see the corruption

pbs_datastore=# \d pbs.job_attr;

        Table "pbs.job_attr"

    Column     |  Type   | Modifiers

---------------+---------+-----------

ji_jobid      | text    | not null

attr_name     | text    | not null

attr_resource | text    |

 attr_value    | text    |

 attr_flags    | integer | not null

Indexes:

    "job_attr_idx" btree (ji_jobid, attr_name, attr_resource)

 

Before applying reindex:

pbs_datastore=# select ji_jobid, attr_name from pbs.job_attr where ji_jobid='1.testdev-08-r7';

ji_jobid | attr_name | attr_resource | attr_value | attr_flags

----------+-----------+---------------+------------+------------

(0 rows)

 

pbs_datastore=# select ji_jobid, attr_name from pbs.job_attr  where ji_jobid like '%1.testdev-08-r7';

     ji_jobid     |       attr_name      

------------------+-----------------------

1.testdev-08-r7  | job_state

1.testdev-08-r7  | mtime

1.testdev-08-r7  | substate

1.testdev-08-r7  | Job_Name

1.testdev-08-r7  | Job_Owner

<similar data follows>

Here the data is present in the table but the query works with only LIKE operator. EQUALS operator does not work.

 

Apply reindex

pbs_datastore=# reindex index pbs.job_attr_idx;

REINDEX

 

After the above reindex

pbs_datastore=# select ji_jobid, attr_name from pbs.job_attr where ji_jobid='1.testdev-08-r7';

    ji_jobid     |       attr_name      

-----------------+-----------------------

1.testdev-08-r7 | Checkpoint

1.testdev-08-r7 | Error_Path

1.testdev-08-r7 | Exit_status

1.testdev-08-r7 | Hold_Types

1.testdev-08-r7 | Job_Name

1.testdev-08-r7 | Job_Owner

<similar data follows>

 

Here EQUAS operator work.

 

Summary: It looks like only one index got corrupted and reindex on the particular index indeed solved our issue.  We tried to reproduce this issue 4-5 times and all the time only one index i.e. pbs.job_attr_idx: got corrupted.  So given this circumstance can we rely on reindex of corrupted index or at whole database level as a SOLUTION to the pg_upgrade that we are seeing.

 

It appears that pg_upgrade corrupted an index of an otherwise healthy source database during the upgrade process. Seems like pg_upgrade bug to me and could we know if it corrupts anything else ? or in other words what should be fixed in pg_upgrade.

 

Thanks in advance,

Suresh

 

From: Suresh Thelkar <[hidden email]>
Date: Tuesday, 16 April 2019 at 4:33 PM
To: "David G. Johnston" <[hidden email]>, "[hidden email]" <[hidden email]>
Cc: subhasis bhattacharya <[hidden email]>, Nithin Johnson <[hidden email]>
Subject: Re: BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

 

Thanks for the quick reply David. It looks like we tried the reindex commands at table or index level without schema identifier in a hurry. Thanks for catching that. We are in the process of trying to reproduce the issue and then try these reindex command at index and table level also and post you the results.

 

Thanks very much for your help on this.

 

Regards,

Suresh

 

From: "David G. Johnston" <[hidden email]>
Date: Tuesday, 16 April 2019 at 12:01 PM
To: Suresh Thelkar <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

 

On Mon, Apr 15, 2019 at 1:03 AM PG Bug reporting form <[hidden email]> wrote:

The following bug has been logged on the website:

Bug reference:      15755
Logged by:          pg_upgrade fails intermittently from version 9.3.12 to 9.6.12
Email address:      [hidden email]
PostgreSQL version: 9.6.12
Operating system:   RHEL 7.5
[...]
One of the workarounds suggested is to rebuild the indexes of the corrupted
table. We tried this workaround, reindex at table level did not help us

 

You didn't actually succeed in forming valid and executable REINDEX command so whether or not it would have worked at the table/index level is an unsolved question.  Given that it worked at the whole database level what I suspect happens is the whole DB command succeeded and when it got to the problem index it fixed the problem.  Had the original table/index commands been written correctly they would have also resulted in a fixed setup and the need to reindex the whole database would have been mitigated (though given the possibility of other indexes having problems a whole database reindex was probably a worthwhile exercise anyway).

 

Appreciate your help in letting us know the following.
1. It looks like ONLY indexes are corrupted in our case and reindexing on
whole database solved our issue. Just eager to know is there any possibility
that similar corruption can happen for other database objects like tables,
sequences apart from indexes?

 

Corruption is always a possibility though indexes, because they are ordered, are the most vulnerable.

 

/* However REINDEX is not successful neither against the table nor the index
*/

broken_db=# REINDEX TABLE xyz.job_attr;
ERROR:  syntax error at or near "QUERY"
LINE 1: QUERY PLANREINDEX TABLE xyz.job_attr;

 

You got a syntax error "at or near QUERY"; something you did entering the command (LINE 1: is the whole command the server saw) was problematic and the error has no bearing on whether "REINDEX TABLE xyz.job_attr" would have worked without the typo (it likely would have)

 

broken_db=# REINDEX INDEX job_attr_idx;
ERROR:  relation "job_attr_idx" does not exist

 

The index was never found, in the search_path, and so no reindexing was attempted.  Whether it would have worked had you correctly added the necessary schema to the identifier (or search_path) remains unknown (it likely would have).

 

Its not enough to say/know that something "didn't work" - understanding "why" it failed is necessary before drawing conclusions.

 

David J.

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15755: After upgrading to 9.6.12 using pg_upgrade, select query does not return the expected results.

Bruce Momjian
On Wed, Apr 17, 2019 at 12:32:16PM +0000, Suresh Thelkar wrote:

> Summary: It looks like only one index got corrupted and reindex on the
> particular index indeed solved our issue.  We tried to reproduce this issue 4-5
> times and all the time only one index i.e. pbs.job_attr_idx: got corrupted.  So
> given this circumstance can we rely on reindex of corrupted index or at whole
> database level as a SOLUTION to the pg_upgrade that we are seeing.
>
>  
>
> It appears that pg_upgrade corrupted an index of an otherwise healthy source
> database during the upgrade process. Seems like pg_upgrade bug to me and could
> we know if it corrupts anything else ? or in other words what should be fixed
> in pg_upgrade.

My wild guess is that the index is corrupted on 9.3.12 but only detected
as corrupted on 9.6.12.  Can you try running reindex before doing the
upgrade to see if it works after the upgrade?  If that is wrong, it must
be something else since pg_upgrade never modifies files when
copying/linking them between versions.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +