Obvious data mismatch in View2 which basically SELECT * from View1

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Ben
Reply | Threaded
Open this post in threaded view
|

Obvious data mismatch in View2 which basically SELECT * from View1

Ben
Dear list,

Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.

In a over -simplified version:

CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;

Now col1 from both views looks different. I don't know where to start to
solve this problem.

The actual situation is a bit more than that, the following is the
actual query:


     -- trying to audit utlog weighed stat
     with t as (
     select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
     )
     --select * from t;
     select *
     -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
     -- from utlog.stats_per_shift_filtered_per_reason
     from utlog.stats_per_shift_filtered                     (let's call
it #View2 for short)
     -- from utlog.stats_per_shift_filtered_b0206      (let's call it
#View1 for short)
     -- from utlog.stats_per_shift
     cross join t
     where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
     limit 100
     ;

The Result for #View2

     wts                 | wsft | wspan  | wstate | wline | rcodes
     --------------------+------+--------+--------+-------+-------
     2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
     2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
     2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
     2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
     2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}

     INDEX | COLUMN_NAME | DATA_TYPE
     ------+-------------+------------
     1     | wts         | timestamptz
     3     | wsft        | varchar
     4     | wspan       | float8
     5     | wstate      | varchar
     6     | wline       | varchar
     7     | rcodes      | text[]


Same query, the Result for #View1

     wts                 | wsft | wspan | wstate | wline | rcodes
     --------------------+------+-------+--------+-------+-------
     2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
     2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
     2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
     2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
     2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}

     INDEX | COLUMN_NAME | DATA_TYPE
     ------+-------------+------------
     1     | wts         | timestamptz
     3     | wsft        | varchar
     4     | wspan       | float8
     5     | wstate      | varchar
     6     | wline       | varchar
     7     | rcodes      | varchar[]

Reuslts in `wspan` column is inaccurate while both type are float8. Most
weird thing is the 5 to 0 change. for Row 1.

The `_b0206`(#View1) is just a version of
`stats_per_shift_filtered`(#View2) from past revisions.
I am sure the original CREATE statement for (#View2) is `CREATE VIEW ...
AS SELECT * FROM ...._b0206`

Definition of View2 in SQLWorkbench/J generated schema:


     CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
     (
     wts,
     wdate,
     wsft,
     wspan,
     wstate,
     wline,
     rcodes
     )
     AS
     SELECT stats_per_shift_filtered_u0206.wts,
         stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
         stats_per_shift_filtered_u0206.wspan,
         stats_per_shift_filtered_u0206.wstate,
         stats_per_shift_filtered_u0206.wline,
         stats_per_shift_filtered_u0206.rcodes
     FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)


It feels like the utlog.stats_per_shift_filtered_u0206 in
utlog.stats_per_shift_filtered definition is a different object from
utlog.stats_per_shift_filtered_u0206?

I am totally out of clues. Any help would be appreciated. Thanks.


Regards,

Ben




Reply | Threaded
Open this post in threaded view
|

Re: Obvious data mismatch in View2 which basically SELECT * from View1

Magnus Hagander-2


On Wed, Sep 16, 2020 at 9:26 AM Ben <[hidden email]> wrote:
Dear list,

Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.

In a over -simplified version:

CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;

Now col1 from both views looks different. I don't know where to start to
solve this problem.

The actual situation is a bit more than that, the following is the
actual query:


     -- trying to audit utlog weighed stat
     with t as (
     select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
     )
     --select * from t;
     select *
     -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
     -- from utlog.stats_per_shift_filtered_per_reason
     from utlog.stats_per_shift_filtered                     (let's call
it #View2 for short)
     -- from utlog.stats_per_shift_filtered_b0206      (let's call it
#View1 for short)
     -- from utlog.stats_per_shift
     cross join t
     where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
     limit 100
     ;


Not sure if it might be something lost in your simplification here, but you have a LIMIT with no ORDER BY there. That basically means "give me 100 random rows" (but not with a very good random level). It does not return rows in a consistent/predictable order. So as long as that query is part of what you're doing, you should not be surprised if you get the rows in an inconsistent/unpredictable order, with whatever follow-on effects that might have. (And it can lead to weird follow-on effects like the ones you're talking about when used in larger query structures)

--
Reply | Threaded
Open this post in threaded view
|

Re: Obvious data mismatch in View2 which basically SELECT * from View1

Ron-2
In reply to this post by Ben
On 9/15/20 10:40 PM, Ben wrote:

> Dear list,
>
> Recently I am getting feedback, data in my analytic report is not
> repeatable. From time to time they get different data for the same time span.
> (but IIRC previously it was OK). Therefore I started debuging the View
> chain for that report, during which I bumped into this issue/phenomenon.
>
> In a over -simplified version:
>
> CREATE VIEW2 AS SELECT * FROM VIEW1;
> SELECT  col1 FROM VIEW2 WHERE cond1=True;
> SELECT  col1 FROM VIEW1 WHERE cond1=True;
>
> Now col1 from both views looks different. I don't know where to start to
> solve this problem.
>
> The actual situation is a bit more than that, the following is the actual
> query:
>
>
>     -- trying to audit utlog weighed stat
>     with t as (
>     select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
> 0:0:0'::timestamp t1
>     )
>     --select * from t;
>     select *
>     -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
>     -- from utlog.stats_per_shift_filtered_per_reason
>     from utlog.stats_per_shift_filtered                     (let's call it
> #View2 for short)
>     -- from utlog.stats_per_shift_filtered_b0206      (let's call it
> #View1 for short)
>     -- from utlog.stats_per_shift
>     cross join t
>     where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
>     limit 100
>     ;
>
> The Result for #View2
>
>     wts                 | wsft | wspan  | wstate | wline | rcodes
>     --------------------+------+--------+--------+-------+-------
>     2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
>     2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
>     2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
>     2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
>     2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}
>
>     INDEX | COLUMN_NAME | DATA_TYPE
>     ------+-------------+------------
>     1     | wts         | timestamptz
>     3     | wsft        | varchar
>     4     | wspan       | float8
>     5     | wstate      | varchar
>     6     | wline       | varchar
>     7     | rcodes      | text[]
>
>
> Same query, the Result for #View1
>
>     wts                 | wsft | wspan | wstate | wline | rcodes
>     --------------------+------+-------+--------+-------+-------
>     2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
>     2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
>     2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
>     2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
>     2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}
>
>     INDEX | COLUMN_NAME | DATA_TYPE
>     ------+-------------+------------
>     1     | wts         | timestamptz
>     3     | wsft        | varchar
>     4     | wspan       | float8
>     5     | wstate      | varchar
>     6     | wline       | varchar
>     7     | rcodes      | varchar[]
>
> Reuslts in `wspan` column is inaccurate while both type are float8. Most
> weird thing is the 5 to 0 change. for Row 1.
>
> The `_b0206`(#View1) is just a version of
> `stats_per_shift_filtered`(#View2) from past revisions.
> I am sure the original CREATE statement for (#View2) is `CREATE VIEW ...
> AS SELECT * FROM ...._b0206`
>
> Definition of View2 in SQLWorkbench/J generated schema:
>
>
>     CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
>     (
>     wts,
>     wdate,
>     wsft,
>     wspan,
>     wstate,
>     wline,
>     rcodes
>     )
>     AS
>     SELECT stats_per_shift_filtered_u0206.wts,
>         stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
>         stats_per_shift_filtered_u0206.wspan,
>         stats_per_shift_filtered_u0206.wstate,
>         stats_per_shift_filtered_u0206.wline,
>         stats_per_shift_filtered_u0206.rcodes
>     FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)
>
>
> It feels like the utlog.stats_per_shift_filtered_u0206 in
> utlog.stats_per_shift_filtered definition is a different object from
> utlog.stats_per_shift_filtered_u0206?
>
> I am totally out of clues. Any help would be appreciated. Thanks.

Try the queries in a serializable read only transaction.  That should any
possible changes in the underlying data.

START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;
COMMIT;

--
Angular momentum makes the world go 'round.


Ben
Reply | Threaded
Open this post in threaded view
|

Re: Obvious data mismatch in View2 which basically SELECT * from View1

Ben
In reply to this post by Magnus Hagander-2

Hi Magnus,

Thanks for the heads up.

For the data posted in this email, the listed result is the full result set for that query.
I have checked the data in view2 and view1 with various conditions.
Both view has many difference in column wspan::float8.
The condition in the shown query is intended to show just a small set of them.
But you are right, I should be more cautious. Thanks for the headsup.

Regards,
Ben


On 9/16/20 3:35 PM, Magnus Hagander wrote:


On Wed, Sep 16, 2020 at 9:26 AM Ben <[hidden email]> wrote:
Dear list,

Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.

In a over -simplified version:

CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;

Now col1 from both views looks different. I don't know where to start to
solve this problem.

The actual situation is a bit more than that, the following is the
actual query:


     -- trying to audit utlog weighed stat
     with t as (
     select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
     )
     --select * from t;
     select *
     -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
     -- from utlog.stats_per_shift_filtered_per_reason
     from utlog.stats_per_shift_filtered                     (let's call
it #View2 for short)
     -- from utlog.stats_per_shift_filtered_b0206      (let's call it
#View1 for short)
     -- from utlog.stats_per_shift
     cross join t
     where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
     limit 100
     ;


Not sure if it might be something lost in your simplification here, but you have a LIMIT with no ORDER BY there. That basically means "give me 100 random rows" (but not with a very good random level). It does not return rows in a consistent/predictable order. So as long as that query is part of what you're doing, you should not be surprised if you get the rows in an inconsistent/unpredictable order, with whatever follow-on effects that might have. (And it can lead to weird follow-on effects like the ones you're talking about when used in larger query structures)

--
Ben
Reply | Threaded
Open this post in threaded view
|

Re: Obvious data mismatch in View2 which basically SELECT * from View1

Ben
In reply to this post by Ben
Dear List,

Some further investigation.

Creating a fresh View3 on View1 gives exactly the same result as View1.

The View1 View2 are both years old in a production database, in use for quite some time. (The database is production duty but not hosted in server room with UPS. It's like a edge PC in industry monitoring. Now am more concerned with its data integrity)

The problem with the final report is reported recently. I am not sure what's broken in the database.

I haven't replaced the broken View2 yet. Hope someone can point me to some further investigation.

My concern is that if there are other views inside that database having similar integrity issue, how can I find them all (if any).

It's beyond my regular SQL ability. I guess I really need help from people with maintenance experience.

Any help will be appreciated, thanks in advance.

Ben



On September 16, 2020 3:40:34 AM UTC, Ben <[hidden email]> wrote:
Dear list,

Recently I am getting feedback, data in my analytic report is not
repeatable. From time to time they get different data for the same time
span.
(but IIRC previously it was OK). Therefore I started debuging the View
chain for that report, during which I bumped into this issue/phenomenon.

In a over -simplified version:

CREATE VIEW2 AS SELECT * FROM VIEW1;
SELECT  col1 FROM VIEW2 WHERE cond1=True;
SELECT  col1 FROM VIEW1 WHERE cond1=True;

Now col1 from both views looks different. I don't know where to start to
solve this problem.

The actual situation is a bit more than that, the following is the
actual query:


    -- trying to audit utlog weighed stat
    with t as (
    select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
0:0:0'::timestamp t1
    )
    --select * from t;
    select *
    -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
    -- from utlog.stats_per_shift_filtered_per_reason
    from utlog.stats_per_shift_filtered                     (let's call
it #View2 for short)
    -- from utlog.stats_per_shift_filtered_b0206      (let's call it
#View1 for short)
    -- from utlog.stats_per_shift
    cross join t
    where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
    limit 100
    ;

The Result for #View2

    wts                 | wsft | wspan  | wstate | wline | rcodes
    --------------------+------+--------+--------+-------+-------
    2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
    2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
    2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
    2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
    2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}

    INDEX | COLUMN_NAME | DATA_TYPE
    ------+-------------+------------
    1     | wts         | timestamptz
    3     | wsft        | varchar
    4     | wspan       | float8
    5     | wstate      | varchar
    6     | wline       | varchar
    7     | rcodes      | text[]


Same query, the Result for #View1

    wts                 | wsft | wspan | wstate | wline | rcodes
    --------------------+------+-------+--------+-------+-------
    2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
    2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
    2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
    2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
    2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}

    INDEX | COLUMN_NAME | DATA_TYPE
    ------+-------------+------------
    1     | wts         | timestamptz
    3     | wsft        | varchar
    4     | wspan       | float8
    5     | wstate      | varchar
    6     | wline       | varchar
    7     | rcodes      | varchar[]

Reuslts in `wspan` column is inaccurate while both type are float8. Most
weird thing is the 5 to 0 change. for Row 1.

The `_b0206`(#View1) is just a version of
`stats_per_shift_filtered`(#View2) from past revisions.
I am sure the original CREATE statement for (#View2) is `CREATE VIEW ...
AS SELECT * FROM ...._b0206`

Definition of View2 in SQLWorkbench/J generated schema:


    CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
    (
    wts,
    wdate,
    wsft,
    wspan,
    wstate,
    wline,
    rcodes
    )
    AS
    SELECT stats_per_shift_filtered_u0206.wts,
        stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
        stats_per_shift_filtered_u0206.wspan,
        stats_per_shift_filtered_u0206.wstate,
        stats_per_shift_filtered_u0206.wline,
        stats_per_shift_filtered_u0206.rcodes
    FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)


It feels like the utlog.stats_per_shift_filtered_u0206 in
utlog.stats_per_shift_filtered definition is a different object from
utlog.stats_per_shift_filtered_u0206?

I am totally out of clues. Any help would be appreciated. Thanks.


Regards,

Ben



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Reply | Threaded
Open this post in threaded view
|

Re: Obvious data mismatch in View2 which basically SELECT * from View1

Ravi Krishna-6
In reply to this post by Ben


>I haven't replaced the broken View2 yet. >Hope someone can point me to some >further investigation.

Did you look at the actual definition of view2. Like all RDBMS PG materializes the Ddl as it existed at the time of creation and converts it into an in line sql.
There is a possibility it may be different than what you think.

Reply | Threaded
Open this post in threaded view
|

Re: Obvious data mismatch in View2 which basically SELECT * from View1

Jerry Sievers-3
In reply to this post by Ben
Ben <[hidden email]> writes:

> Dear List,
>
> Some further investigation.
>
> Creating a fresh View3 on View1 gives exactly the same result as
> View1.
>
> The View1 View2 are both years old in a production database, in use
> for quite some time. (The database is production duty but not hosted
> in server room with UPS. It's like a edge PC in industry monitoring.
> Now am more concerned with its data integrity)
>
> The problem with the final report is reported recently. I am not sure
> what's broken in the database.
>
> I haven't replaced the broken View2 yet. Hope someone can point me to
> some further investigation.

Already mentioned downthread, but have a look at the view definitions
by...

select pg_get_viewdef('$your-view');

Or...

pg_dump --table $your-view

Expect to see something different if you repeat the above for the old
view giving undesired results and the new correct version.

HTH



> My concern is that if there are other views inside that database
> having similar integrity issue, how can I find them all (if any).
>
> It's beyond my regular SQL ability. I guess I really need help from
> people with maintenance experience.
>
> Any help will be appreciated, thanks in advance.
>
> Ben
>
>
>
> On September 16, 2020 3:40:34 AM UTC, Ben <[hidden email]>
> wrote:
>
>     Dear list,
>    
>     Recently I am getting feedback, data in my analytic report is not
>     repeatable. From time to time they get different data for the same time
>     span.
>     (but IIRC previously it was OK). Therefore I started debuging the View
>     chain for that report, during which I bumped into this issue/phenomenon.
>    
>     In a over -simplified version:
>    
>     CREATE VIEW2 AS SELECT * FROM VIEW1;
>     SELECT  col1 FROM VIEW2 WHERE cond1=True;
>     SELECT  col1 FROM VIEW1 WHERE cond1=True;
>    
>     Now col1 from both views looks different. I don't know where to start to
>     solve this problem.
>    
>     The actual situation is a bit more than that, the following is the
>     actual query:
>    
>    
>          -- trying to audit utlog weighed stat
>          with t as (
>          select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
>     0:0:0'::timestamp t1
>          )
>          --select * from t;
>          select *
>          -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
>          -- from utlog.stats_per_shift_filtered_per_reason
>          from utlog.stats_per_shift_filtered                     (let's call
>     it #View2 for short)
>          -- from utlog.stats_per_shift_filtered_b0206      (let's call it
>     #View1 for short)
>          -- from utlog.stats_per_shift
>          cross join t
>          where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
>          limit 100
>          ;
>    
>     The Result for #View2
>    
>          wts                 | wsft | wspan  | wstate | wline | rcodes
>          --------------------+------+--------+--------+-------+-------
>          2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
>          2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
>          2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
>          2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
>          2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}
>    
>          INDEX | COLUMN_NAME | DATA_TYPE
>          ------+-------------+------------
>          1     | wts         | timestamptz
>          3     | wsft        | varchar
>          4     | wspan       | float8
>          5     | wstate      | varchar
>          6     | wline       | varchar
>          7     | rcodes      | text[]
>    
>    
>     Same query, the Result for #View1
>    
>          wts                 | wsft | wspan | wstate | wline | rcodes
>          --------------------+------+-------+--------+-------+-------
>          2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
>          2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
>          2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
>          2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
>          2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}
>    
>          INDEX | COLUMN_NAME | DATA_TYPE
>          ------+-------------+------------
>          1     | wts         | timestamptz
>          3     | wsft        | varchar
>          4     | wspan       | float8
>          5     | wstate      | varchar
>          6     | wline       | varchar
>          7     | rcodes      | varchar[]
>    
>     Reuslts in `wspan` column is inaccurate while both type are float8. Most
>     weird thing is the 5 to 0 change. for Row 1.
>    
>     The `_b0206`(#View1) is just a version of
>     `stats_per_shift_filtered`(#View2) from past revisions.
>     I am sure the original CREATE statement for (#View2) is `CREATE VIEW ...
>     AS SELECT * FROM ...._b0206`
>    
>     Definition of View2 in SQLWorkbench/J generated schema:
>    
>    
>          CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
>          (
>          wts,
>          wdate,
>          wsft,
>          wspan,
>          wstate,
>          wline,
>          rcodes
>          )
>          AS
>          SELECT stats_per_shift_filtered_u0206.wts,
>              stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
>              stats_per_shift_filtered_u0206.wspan,
>              stats_per_shift_filtered_u0206.wstate,
>              stats_per_shift_filtered_u0206.wline,
>              stats_per_shift_filtered_u0206.rcodes
>          FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)
>    
>    
>     It feels like the utlog.stats_per_shift_filtered_u0206 in
>     utlog.stats_per_shift_filtered definition is a different object from
>     utlog.stats_per_shift_filtered_u0206?
>    
>     I am totally out of clues. Any help would be appreciated. Thanks.
>    
>    
>     Regards,
>    
>     Ben
>    
>    
>    
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: [hidden email]


Ben
Reply | Threaded
Open this post in threaded view
|

Re: Obvious data mismatch in View2 which basically SELECT * from View1

Ben
Hi Sievers,  Krishna


You are right none of them is what I originally used to create them
which should be
`select * from utlog.stats_per_shift_filtered_b0206`, but they do look
different,
maybe the prev version is created before a pg update ? b0206 could be
20190206 or 20180206...

the output:


lets_db=#     select pg_get_viewdef('utlog.stats_per_shift_filtered');
                             pg_get_viewdef
-----------------------------------------------------------------------
   SELECT stats_per_shift_filtered_u0206.wts,                          +
stats_per_shift_filtered_u0206.wdate,                            +
      (stats_per_shift_filtered_u0206.wsft)::character varying AS wsft,+
stats_per_shift_filtered_u0206.wspan,                            +
stats_per_shift_filtered_u0206.wstate,                           +
stats_per_shift_filtered_u0206.wline,                            +
stats_per_shift_filtered_u0206.rcodes                            +
     FROM utlog.stats_per_shift_filtered_u0206;
(1 row)

(result for the same query, wspan is the column in question:

  wts           |   wdate    | wsft |      wspan       | wstate | wlin
------------------------+------------+------+------------------+--------+-----
2020-07-01 14:57:38+08 | 2020-07-01 | D    |              6.8 | S00    | F02
  2020-07-01 13:22:58+08 | 2020-07-01 | D    |             5.15 | S00   
| F02
2020-07-01 11:35:46+08 | 2020-07-01 | D    | 12.2166666666667 | S00    | F02
2020-07-01 09:50:01+08 | 2020-07-01 | D    | 12.5333333333333 | S00    | F02
2020-07-01 08:00:00+08 | 2020-07-01 | D    |                0 | S00    | F02
(5 rows) )

lets_db=#     select pg_get_viewdef('utlog.view_test1');
                 pg_get_viewdef
-----------------------------------------------
   SELECT stats_per_shift_filtered_b0206.wts,  +
      stats_per_shift_filtered_b0206.wdate,    +
      stats_per_shift_filtered_b0206.wsft,     +
      stats_per_shift_filtered_b0206.wspan,    +
      stats_per_shift_filtered_b0206.wstate,   +
      stats_per_shift_filtered_b0206.wline,    +
      stats_per_shift_filtered_b0206.rcodes    +
     FROM utlog.stats_per_shift_filtered_b0206;
(1 row)

(the result for the same query:

wts |   wdate    | wsft | wspan | wstate | wline | rcodes
------------------------+------------+------+-------+--------+-------+----------------+---------------------
  2020-07-01 09:50:01+08 | 2020-07-01 | D    |    13 | S00    | F02   |
{PDCB}
2020-07-01 11:35:46+08 | 2020-07-01 | D    |    12 | S00    | F02   | {CDSO}
2020-07-01 14:57:38+08 | 2020-07-01 | D    |     7 | S00    | F02   | {PDCB}
2020-07-01 08:00:00+08 | 2020-07-01 | D    |     5 | S00    | F02   | {PDCB}
2020-07-01 13:22:58+08 | 2020-07-01 | D    |     5 | S00    | F02   | {PDCB}
(5 rows)
)

The result in returned column looks different but

definition of the column in question (wspan::float8) looks identical in
both case.


Regards,

Ben


On 9/17/20 10:41 PM, Jerry Sievers wrote:

> Ben <[hidden email]> writes:
>
>> Dear List,
>>
>> Some further investigation.
>>
>> Creating a fresh View3 on View1 gives exactly the same result as
>> View1.
>>
>> The View1 View2 are both years old in a production database, in use
>> for quite some time. (The database is production duty but not hosted
>> in server room with UPS. It's like a edge PC in industry monitoring.
>> Now am more concerned with its data integrity)
>>
>> The problem with the final report is reported recently. I am not sure
>> what's broken in the database.
>>
>> I haven't replaced the broken View2 yet. Hope someone can point me to
>> some further investigation.
> Already mentioned downthread, but have a look at the view definitions
> by...
>
> select pg_get_viewdef('$your-view');
>
> Or...
>
> pg_dump --table $your-view
>
> Expect to see something different if you repeat the above for the old
> view giving undesired results and the new correct version.
>
> HTH
>
>
>
>> My concern is that if there are other views inside that database
>> having similar integrity issue, how can I find them all (if any).
>>
>> It's beyond my regular SQL ability. I guess I really need help from
>> people with maintenance experience.
>>
>> Any help will be appreciated, thanks in advance.
>>
>> Ben
>>
>>
>>
>> On September 16, 2020 3:40:34 AM UTC, Ben <[hidden email]>
>> wrote:
>>
>>      Dear list,
>>      
>>      Recently I am getting feedback, data in my analytic report is not
>>      repeatable. From time to time they get different data for the same time
>>      span.
>>      (but IIRC previously it was OK). Therefore I started debuging the View
>>      chain for that report, during which I bumped into this issue/phenomenon.
>>      
>>      In a over -simplified version:
>>      
>>      CREATE VIEW2 AS SELECT * FROM VIEW1;
>>      SELECT  col1 FROM VIEW2 WHERE cond1=True;
>>      SELECT  col1 FROM VIEW1 WHERE cond1=True;
>>      
>>      Now col1 from both views looks different. I don't know where to start to
>>      solve this problem.
>>      
>>      The actual situation is a bit more than that, the following is the
>>      actual query:
>>      
>>      
>>           -- trying to audit utlog weighed stat
>>           with t as (
>>           select '2020-07-01 00:00:00'::timestamp t0, '2020--07-02
>>      0:0:0'::timestamp t1
>>           )
>>           --select * from t;
>>           select *
>>           -- from utlog.cache_stats_per_shift_per_reason_weighed_stats
>>           -- from utlog.stats_per_shift_filtered_per_reason
>>           from utlog.stats_per_shift_filtered                     (let's call
>>      it #View2 for short)
>>           -- from utlog.stats_per_shift_filtered_b0206      (let's call it
>>      #View1 for short)
>>           -- from utlog.stats_per_shift
>>           cross join t
>>           where wline = 'F02'  and wts >= t.t0 and wts < t.t1 and wsft ='D'
>>           limit 100
>>           ;
>>      
>>      The Result for #View2
>>      
>>           wts                 | wsft | wspan  | wstate | wline | rcodes
>>           --------------------+------+--------+--------+-------+-------
>>           2020-07-01 08:00:00 | D    |      0 | S00    | F02   | {PDCB}
>>           2020-07-01 09:50:01 | D    | 12.533 | S00    | F02   | {PDCB}
>>           2020-07-01 11:35:46 | D    | 12.217 | S00    | F02   | {CDSO}
>>           2020-07-01 13:22:58 | D    |   5.15 | S00    | F02   | {PDCB}
>>           2020-07-01 14:57:38 | D    |    6.8 | S00    | F02   | {PDCB}
>>      
>>           INDEX | COLUMN_NAME | DATA_TYPE
>>           ------+-------------+------------
>>           1     | wts         | timestamptz
>>           3     | wsft        | varchar
>>           4     | wspan       | float8
>>           5     | wstate      | varchar
>>           6     | wline       | varchar
>>           7     | rcodes      | text[]
>>      
>>      
>>      Same query, the Result for #View1
>>      
>>           wts                 | wsft | wspan | wstate | wline | rcodes
>>           --------------------+------+-------+--------+-------+-------
>>           2020-07-01 08:00:00 | D    |     5 | S00    | F02   | {PDCB}
>>           2020-07-01 09:50:01 | D    |    13 | S00    | F02   | {PDCB}
>>           2020-07-01 11:35:46 | D    |    12 | S00    | F02   | {CDSO}
>>           2020-07-01 13:22:58 | D    |     5 | S00    | F02   | {PDCB}
>>           2020-07-01 14:57:38 | D    |     7 | S00    | F02   | {PDCB}
>>      
>>           INDEX | COLUMN_NAME | DATA_TYPE
>>           ------+-------------+------------
>>           1     | wts         | timestamptz
>>           3     | wsft        | varchar
>>           4     | wspan       | float8
>>           5     | wstate      | varchar
>>           6     | wline       | varchar
>>           7     | rcodes      | varchar[]
>>      
>>      Reuslts in `wspan` column is inaccurate while both type are float8. Most
>>      weird thing is the 5 to 0 change. for Row 1.
>>      
>>      The `_b0206`(#View1) is just a version of
>>      `stats_per_shift_filtered`(#View2) from past revisions.
>>      I am sure the original CREATE statement for (#View2) is `CREATE VIEW ...
>>      AS SELECT * FROM ...._b0206`
>>      
>>      Definition of View2 in SQLWorkbench/J generated schema:
>>      
>>      
>>           CREATE OR REPLACE VIEW utlog.stats_per_shift_filtered (#View2)
>>           (
>>           wts,
>>           wdate,
>>           wsft,
>>           wspan,
>>           wstate,
>>           wline,
>>           rcodes
>>           )
>>           AS
>>           SELECT stats_per_shift_filtered_u0206.wts,
>>               stats_per_shift_filtered_u0206.wsft::character varying AS wsft,
>>               stats_per_shift_filtered_u0206.wspan,
>>               stats_per_shift_filtered_u0206.wstate,
>>               stats_per_shift_filtered_u0206.wline,
>>               stats_per_shift_filtered_u0206.rcodes
>>           FROM utlog.stats_per_shift_filtered_u0206;  (as #View1 in this post)
>>      
>>      
>>      It feels like the utlog.stats_per_shift_filtered_u0206 in
>>      utlog.stats_per_shift_filtered definition is a different object from
>>      utlog.stats_per_shift_filtered_u0206?
>>      
>>      I am totally out of clues. Any help would be appreciated. Thanks.
>>      
>>      
>>      Regards,
>>      
>>      Ben
>>      
>>      
>>      
>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>
>>


Reply | Threaded
Open this post in threaded view
|

Re: Obvious data mismatch in View2 which basically SELECT * from View1

Tom Lane-2
Ben <[hidden email]> <[hidden email]> writes:
> You are right none of them is what I originally used to create them
> which should be
> `select * from utlog.stats_per_shift_filtered_b0206`, but they do look
> different,

This discrepancy could explain a lot:

>       (stats_per_shift_filtered_u0206.wsft)::character varying AS wsft,

vs

>       stats_per_shift_filtered_b0206.wsft,

Your problem query has a WHERE clause testing wsft, so this difference
is subtly affecting what that WHERE clause means.  That could result in
a plan change, thereby changing the row output order, which as already
noted would be sufficient to explain the inconsistency.

If I had to bet I'd suspect that the extra cast is blocking use of an
index on wsft.

                        regards, tom lane