BUG #15577: Query returns different results when executed multiple times

BUG #15577: Query returns different results when executed multiple times

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15577
Logged by:          Bartosz Polnik
Email address:      [hidden email]
PostgreSQL version: 11.1
Operating system:   Windows 10
Description:        

Hello,

A couple of days ago I stumbled upon a query that doesn't return all rows it
should. When we execute it multiple times, we get only a subset of data. The
query is:

SELECT ta.id AS table_a_id,
       tc.id as table_c_id,
       tba.id AS table_b_id
FROM test.table_b_active tba
       INNER JOIN test.table_c tc ON tba.target_id = tc.id
       INNER JOIN test.table_d td ON tc.table_d_id = td.id
       LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
  AND td.group = 'A'
  AND tc.table_e_id = 4
  AND (
    (tba.target_id = tc.id AND tba.group_type = 'A')
    OR tba.source_id = tc.id
  );

We expect it to always return these 31 rows:

table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16115788 |     348539 |    2913754
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16116256 |     293541 |    2997160
   16116256 |     293541 |    2901938
   16116256 |     293541 |    2901933
   16114789 |     292051 |    3038539
   16114813 |     342353 |    3052371
   16116069 |     351585 |    3025941
   16114814 |     331329 |    2946332
   16115861 |     350487 |    2933633
   16115106 |     350047 |    2902075
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2998909
   16114811 |     298605 |    2893809
   16114811 |     298605 |    2987038
   16114811 |     298605 |    3038877
   16114811 |     298605 |    3010694
   16114811 |     298605 |    2893188
   16114811 |     298605 |    2893391
   16114811 |     298605 |    2983360
   16114811 |     298605 |    3038221
   16114811 |     298605 |    3026078
   16114811 |     298605 |    2998966
   16116260 |     290583 |    2955483
   16116260 |     290583 |    2921135
   16116260 |     290583 |    2947914
   16116260 |     290583 |    2901669
   16116241 |     295971 |    3038921
   16116249 |     296708 |    3038888

But when we run it using psql, we get different results (not all rows are
returned). Here's the query I used to get the execution plan and example
output from two consecutive plan executions:

explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );


Nested Loop Left Join  (actual time=157.436..329.538 rows=21 loops=1)
  ->  Gather  (actual time=157.407..349.743 rows=21 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Nested Loop (actual time=167.978..281.540 rows=11 loops=2)
              ->  Hash Join  (actual time=0.300..18.206 rows=945 loops=2)
                    Hash Cond: (tc.table_d_id = td.id)
                    ->  Parallel Seq Scan on table_c tc (actual time=0.107..17.326 rows=1751 loops=2)
                          Filter: (table_e_id = 4)
                          Rows Removed by Filter: 49117
                    ->  Hash (actual time=0.065..0.065 rows=8 loops=2)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on table_d td  (actual time=0.043..0.049 rows=8 loops=2)
                                Filter: (group = 'A')
                                Rows Removed by Filter: 55
              ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.273..0.278 rows=0 loops=1890)
                    Index Cond: (target_id = tc.id)
                    Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR (source_id = tc.id)))
                    Rows Removed by Filter: 26
  ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.019..0.019 rows=1 loops=21)
        Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 1.218 ms
Execution Time: 350.283 ms

Above we got only 21 rows instead of 31.

Nested Loop Left Join (actual time=164.189..358.068 rows=28 loops=1)
  ->  Gather  (actual time=164.140..370.034 rows=28 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Nested Loop  (actual time=188.718..306.289 rows=14 loops=2)
              ->  Hash Join (actual time=0.321..21.860 rows=945 loops=2)
                    Hash Cond: (tc.table_d_id = td.id)
                    ->  Parallel Seq Scan on table_c tc (actual time=0.113..20.784 rows=1751 loops=2)
                          Filter: (table_e_id = 4)
                          Rows Removed by Filter: 49117
                    ->  Hash (actual time=0.063..0.063 rows=8 loops=2)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on table_d td (actual time=0.040..0.049 rows=8 loops=2)
                                Filter: (group = 'A')
                                Rows Removed by Filter: 55
              ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.294..0.300 rows=0 loops=1890)
                    Index Cond: (target_id = tc.id)
                    Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR (source_id = tc.id)))
                    Rows Removed by Filter: 26
  ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.031..0.031 rows=1 loops=28)
        Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 0.971 ms
Execution Time: 371.202 ms

Here we got only 28 rows instead of 31.

Indexes used:
create index table_b_idx_target_id
   on test.table_b (target_id)
   where (t_active IS TRUE);

create unique index table_a_uq_001
  on table_a (table_c_id, date);

And the view:
create view test.table_b_active as
SELECT *
FROM test.table_b
WHERE (table_b.t_active IS TRUE);

I ran a couple of tests and couldn't reproduce this problem with
max_parallel_workers=0. With this parameter set, we get the following
execution plan:

Nested Loop Left Join (actual time=212.010..509.274 rows=31 loops=1)
  ->  Gather (actual time=211.976..508.855 rows=31 loops=1)
        Workers Planned: 1
        Workers Launched: 0
        ->  Nested Loop (actual time=211.593..508.425 rows=31 loops=1)
              ->  Hash Join (actual time=0.063..31.628 rows=1890 loops=1)
                    Hash Cond: (tc.table_d_id = td.id)
                    ->  Parallel Seq Scan on table_c tc (actual time=0.042..30.000 rows=3501 loops=1)
                          Filter: (table_e_id = 4)
                          Rows Removed by Filter: 98233
                    ->  Hash (actual time=0.016..0.016 rows=8 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on table_d td (actual time=0.008..0.012 rows=8 loops=1)
                                Filter: (group = 'A')
                                Rows Removed by Filter: 55
              ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.248..0.251 rows=0 loops=1890)
                    Index Cond: (target_id = tc.id)
                    Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A')) OR (source_id = tc.id)))
                    Rows Removed by Filter: 26
  ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.014..0.014 rows=1 loops=31)
        Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 1.135 ms
Execution Time: 509.431 ms

The number of rows returned is correct: 31.

I tried to create a clean schema with test data, but couldn't get the same
execution plan, so I can't include that.
The data above comes from a db snapshot with table, index, and view names
changed by hand, so there might be some typos. Sorry about those.

I'm using PostgreSQL 11.1, compiled by Visual C++ build 1914, 64-bit.

Best regards,
Bartosz Polnik


Re: BUG #15577: Query returns different results when executed multiple times

David Rowley
On Mon, 7 Jan 2019 at 05:46, PG Bug reporting form
<[hidden email]> wrote:
> A couple of days ago I stumbled upon a query that doesn't return all rows it
> should.

>         ->  Nested Loop (actual time=167.978..281.540 rows=11 loops=2)

>         ->  Nested Loop  (actual time=188.718..306.289 rows=14 loops=2)

Above is the first place where the actual row counts diverge. This
makes it appear that it's the join to table_b that's causing the
variation. The parallel worker is meant to be running with the same
snapshot as the main process so that it properly sees the same tuples
as visible. If for some reason that was not working correctly then
that might explain why you get differing results.

Can you confirm if there are concurrent changes being made to table_b
perhaps while the query is running?

Do you still get the variation with an otherwise idle database with no
open transactions that's just received a complete VACUUM?
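
Something along these lines could be used to check both points (a sketch,
not part of the original report; the catalog view is standard, the table
names are the ones used above):

SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE datname = current_database()
  AND pid <> pg_backend_pid();   -- any other sessions / open transactions?

VACUUM (ANALYZE) test.table_a;
VACUUM (ANALYZE) test.table_b;
VACUUM (ANALYZE) test.table_c;
VACUUM (ANALYZE) test.table_d;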

I'm able to produce the same plan as you're getting with the attached
script, but unable to see any sort of row variation. My row counts
don't match yours exactly. If I adjust the rows in table_b too much I
get a completely different plan.

I tested with PostgreSQL 11.1, compiled by Visual C++ build 1916, 64-bit.

> I tried to create a clean schema with test data, but couldn't get the same
> execution plan, so I can't include that.

If you're removing columns to simplify the test script then you may
need to add some columns back to pad the tuples out a bit in order to
keep the tables around the same size. Variations in the table size
will have an effect on the join costs and could change the join order.
Also, a smaller table_c may not receive a parallel seq scan. In my
mockup, I made table_b and table_a fairly large so as to keep the
joins as parameterized nested loops. I also left out indexes on the
"id" columns to reduce the chances of a Merge Join.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: mockup_bug15577.sql (4K)

Re: BUG #15577: Query returns different results when executed multiple times

Bartosz Polnik
> Can you confirm if there are concurrent changes being made to table_b
> perhaps while the query is running?  

Confirmed: there are no concurrent changes being made, and I'm the only client connecting to that db.

> Do you still get the variation with an otherwise idle database with no
> open transactions that's just received a complete VACUUM?  

I executed VACUUM (FULL, ANALYZE) on all tables from the query, but I'm still getting different results.

With your script, I'm always getting the same output, so it must be something else.

I thought about sharing a db dump, but I can't do that due to company policy. What I could share, though, is an anonymized db dump. Do you know of any good tool for performing anonymization?


On Mon, Jan 7, 2019 at 2:57 AM David Rowley <[hidden email]> wrote:
On Mon, 7 Jan 2019 at 05:46, PG Bug reporting form
<[hidden email]> wrote:
> A couple of days ago I stumbled upon a query that doesn't return all rows it
> should.

>         ->  Nested Loop (actual time=167.978..281.540 rows=11 loops=2)

>         ->  Nested Loop  (actual time=188.718..306.289 rows=14 loops=2)

Above is the first place where the actual row counts diverge. This
makes it appear that it's the join to table_b that's causing the
variation. The parallel worker is meant to be running with the same
snapshot as the main process so that it properly sees the same tuples
as visible. If for some reason that was not working correctly then
that might explain why you get differing results.

Can you confirm if there are concurrent changes being made to table_b
perhaps while the query is running?

Do you still get the variation with an otherwise idle database with no
open transactions that's just received a complete VACUUM?

I'm able to produce the same plan as you're getting with the attached
script, but unable to see any sort of row variation. My row counts
don't match yours exactly. If I adjust the rows in table_b too much I
get a completely different plan.

I tested with PostgreSQL 11.1, compiled by Visual C++ build 1916, 64-bit.

> I tried to create a clean schema with test data, but couldn't get the same
> execution plan, so I can't include that.

If you're removing columns to simplify the test script then you may
need to add some columns back to pad the tuples out a bit in order to
keep the tables around the same size. Variations in the table size
will have an effect on the join costs and could change the join order.
Also, a smaller table_c may not receive a parallel seq scan. In my
mockup, I made table_b and table_a fairly large so as to keep the
joins as parameterized nested loops. I also left out indexes on the
"id" columns to reduce the chances of a Merge Join.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #15577: Query returns different results when executed multiple times

Thomas Munro
On Mon, Jan 7, 2019 at 10:29 PM Bartosz Polnik <[hidden email]> wrote:

>
> > Can you confirm if there are concurrent changes being made to table_b
> > perhaps while the query is running?
>
> I confirm. There are no concurrent changes being made and I'm the only client connecting to that db.
>
> > Do you still get the variation with an otherwise idle database with no
> > open transactions that's just received a complete VACUUM?
>
> I executed VACUUM (FULL, ANALYZE) on all tables from the query, but I'm still getting different results.
>
> With your script, I'm always getting the same output, so it must be something else.

What do you get if you run it with SET force_parallel_mode = on?

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

Bartosz Polnik
I've modified the script:
1. I added SET force_parallel_mode = on:

SET force_parallel_mode = on;
explain (costs off, analyze)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id

2. I've changed group and group_type to enums (we have them as enums in the db):
drop table if exists table_c;
drop table if exists table_d;
drop type if exists "group";
drop type if exists "group_type";

create type group_type AS ENUM ('A', 'B');
create type "group" AS ENUM ('A', 'B');

Updated version of the script is in attachment.

When I execute it, I get:

c:\Program Files\PostgreSQL\11\bin>psql -U postgres -f "C:\mockup_bug15577.sql" test
DROP VIEW
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TYPE
DROP TYPE
CREATE TABLE
INSERT 0 20000
CREATE TYPE
CREATE TYPE
CREATE TABLE
INSERT 0 200000
CREATE TABLE
INSERT 0 101736
CREATE TABLE
INSERT 0 8
INSERT 0 55
CREATE INDEX
CREATE INDEX
CREATE VIEW
ANALYZE
SET
SET
ALTER TABLE
DROP VIEW
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TYPE
DROP TYPE
CREATE TYPE
CREATE TYPE
CREATE TABLE
INSERT 0 20000
CREATE TABLE
INSERT 0 200000
CREATE TABLE
INSERT 0 101736
CREATE TABLE
INSERT 0 8
INSERT 0 55
CREATE INDEX
CREATE INDEX
CREATE VIEW
ANALYZE
SET
                                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=1.185..81.170 rows=112 loops=1)
   ->  Gather (actual time=1.179..95.573 rows=112 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=0.076..7.182 rows=56 loops=2)
               ->  Hash Join (actual time=0.050..4.592 rows=936 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.004..4.358 rows=1755 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49114
                     ->  Hash (actual time=0.019..0.020 rows=8 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.011..0.014 rows=8 loops=1)
                                 Filter: ("group" = 'A'::"group")
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.003..0.003 rows=0 loops=1871)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 1
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.003..0.003 rows=0 loops=112)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 0.773 ms
 Execution Time: 96.037 ms
(23 rows)

The number of rows doesn't change from execution to execution.



On Mon, Jan 7, 2019 at 1:20 PM Thomas Munro <[hidden email]> wrote:
On Mon, Jan 7, 2019 at 10:29 PM Bartosz Polnik <[hidden email]> wrote:
>
> > Can you confirm if there are concurrent changes being made to table_b
> > perhaps while the query is running?
>
> I confirm. There are no concurrent changes being made and I'm the only client connecting to that db.
>
> > Do you still get the variation with an otherwise idle database with no
> > open transactions that's just received a complete VACUUM?
>
> I executed VACUUM (FULL, ANALYZE) on all tables from the query, but I'm still getting different results.
>
> With your script, I'm always getting the same output, so it must be something else.

What do you get if you run it with SET force_parallel_mode = on?

--
Thomas Munro
http://www.enterprisedb.com

Attachment: mockup_bug15577.sql (5K)

Re: BUG #15577: Query returns different results when executed multiple times

Thomas Munro
On Tue, Jan 8, 2019 at 1:56 AM Bartosz Polnik <[hidden email]> wrote:
> I've modified the script:
> 1. I Added SET force_parallel_mode = on;:

What if you try that with your original query in the database that
shows the problem?  Or alternatively SET parallel_leader_participation
= on.  What I'm wondering is whether the parallel worker consistently
gets the same result, but one that differs from the leader's, so that
the variation comes from the race between leader and worker, or
whether the instability comes from somewhere else.
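
For reference, the session settings involved look like this (a sketch; they
are meant to be tried one at a time rather than all together):

SET force_parallel_mode = on;             -- run the whole plan under a Gather, i.e. in a worker, where possible
SET parallel_leader_participation = off;  -- the leader only coordinates; the parallel part runs in workers
SET max_parallel_workers = 0;             -- no workers can be launched, so the leader executes everything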

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

Bartosz Polnik
For the following SQL:

SET force_parallel_mode = on;
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );


I got:

                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=145.151..284.612 rows=26 loops=1)
   ->  Gather (actual time=145.092..304.166 rows=26 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=153.289..247.768 rows=13 loops=2)
               ->  Hash Join (actual time=0.221..15.028 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.085..14.442 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.044..0.045 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.033 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.242..0.246 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.016..0.016 rows=1 loops=26)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.024 ms
 Execution Time: 304.858 ms


On Second execution:
                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=228.405..279.383 rows=19 loops=1)
   ->  Gather (actual time=228.376..288.807 rows=19 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=150.785..242.440 rows=10 loops=2)
               ->  Hash Join (actual time=0.212..14.353 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.075..13.794 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.050..0.050 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                            ->  Seq Scan on table_d td (actual time=0.031..0.037 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.237..0.241 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=19)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 4.816 ms
 Execution Time: 289.503 ms


I executed it a couple of times and saw in the output 26 rows (as above), 21 rows (also above), 28 rows, 28 rows, 29 rows, etc.
The parts that changed (the number of rows) were the Nested Loop Left Join and the Nested Loop.


For the following SQL:

SET parallel_leader_participation = on;
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );

I received:

                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=143.805..279.354 rows=29 loops=1)
   ->  Gather (actual time=143.735..299.066 rows=29 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=149.334..242.108 rows=15 loops=2)
               ->  Hash Join (actual time=0.203..14.536 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.072..13.971 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.043..0.043 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.034 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.237..0.240 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.018..0.018 rows=1 loops=29)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.070 ms
 Execution Time: 299.849 ms



                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=231.714..276.770 rows=26 loops=1)
   ->  Gather (actual time=231.654..295.971 rows=26 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=151.209..239.965 rows=13 loops=2)
               ->  Hash Join (actual time=0.214..14.467 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.070..13.923 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.045..0.045 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.033 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.235..0.238 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.017..0.017 rows=1 loops=26)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 4.976 ms
 Execution Time: 296.729 ms


I also ran a test with:

SET parallel_leader_participation = off;
SET force_parallel_mode = on;
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );

The output is correct (31 rows), but the plan is different:
                                                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather (actual time=263.880..507.668 rows=31 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
         ->  Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
               ->  Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 98233
                     ->  Hash (actual time=0.064..0.069 rows=8 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
         ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
               Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.051 ms
 Execution Time: 507.858 ms


On Mon, Jan 7, 2019 at 8:59 PM Thomas Munro <[hidden email]> wrote:
On Tue, Jan 8, 2019 at 1:56 AM Bartosz Polnik <[hidden email]> wrote:
> I've modified the script:
> 1. I Added SET force_parallel_mode = on;:

What if you try that with your original query in the database that
shows the problem?  Or alternatively SET parallel_leader_participation
= on.  What I'm wondering is whether the parallel worker is
consistently getting the same result, but it's a different result from
the leader, and the variation comes from the race between leader and
worker, or whether the instability comes from somewhere else.

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #15577: Query returns different results when executed multiple times

Thomas Munro
On Tue, Jan 8, 2019 at 10:39 AM Bartosz Polnik <[hidden email]> wrote:

>  Gather (actual time=263.880..507.668 rows=31 loops=1)
>    Workers Planned: 1
>    Workers Launched: 1
>    Single Copy: true
>    ->  Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
>          ->  Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
>                ->  Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
>                      Hash Cond: (tc.table_d_id = td.id)
>                      ->  Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
>                            Filter: (table_e_id = 4)
>                            Rows Removed by Filter: 98233
>                      ->  Hash (actual time=0.064..0.069 rows=8 loops=1)
>                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                            ->  Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
>                                  Filter: (group = 'A'::test.group)
>                                  Rows Removed by Filter: 55
>                ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
>                      Index Cond: (target_id = tc.id)
>                      Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
>                      Rows Removed by Filter: 26
>          ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
>                Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))

Hmm.  So if the plan runs entirely in a worker, we get 31 rows.  If
the plan runs entirely in the leader, we get 31 rows.  But if it runs
in both, we get fewer.  In all cases the Hash Join produces 1890 rows,
and in all cases the Nested Loop probes table_b_idx_target_id for all
of them: "loops=1890".  But somehow the output of the Nested Loop is
not 31 rows (or 31 / 2) when divided over two processes.

As a sanity check, can you please see if the run-only-in-leader case
(max_parallel_workers = 0, so that we see "Workers Launched: 0")
produces the *same* 31 rows as the run-only-in-worker case
(force_parallel_mode = on, so that we see "Single Copy: true")?  That
is, the actual values of those 31 rows, in particular the columns
coming from table_b.

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

Bartosz Polnik
For:

SET max_parallel_workers = 0;
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
       tc.id as table_c_id,
       tba.id AS table_b_id
FROM test.table_b_active tba
       INNER JOIN test.table_c tc ON tba.target_id = tc.id
       INNER JOIN test.table_d td ON tc.table_d_id = td.id
       LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
  AND td.group = 'A'
  AND tc.table_e_id = 4
  AND (
    (tba.target_id = tc.id AND tba.group_type = 'A')
    OR tba.source_id = tc.id
  );

SELECT ta.id AS table_a_id,
       tc.id as table_c_id,
       tba.id AS table_b_id
FROM test.table_b_active tba
       INNER JOIN test.table_c tc ON tba.target_id = tc.id
       INNER JOIN test.table_d td ON tc.table_d_id = td.id
       LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
  AND td.group = 'A'
  AND tc.table_e_id = 4
  AND (
    (tba.target_id = tc.id AND tba.group_type = 'A')
    OR tba.source_id = tc.id
  );

I got:

                                                                                                              QUERY PLAN                                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=196.969..433.340 rows=31 loops=1)
   ->  Gather (actual time=196.897..432.757 rows=31 loops=1)
         Workers Planned: 1
         Workers Launched: 0
         ->  Nested Loop (actual time=196.384..432.197 rows=31 loops=1)
               ->  Hash Join (actual time=0.104..24.477 rows=1890 loops=1)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.060..23.639 rows=3501 loops=1)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 98233
                     ->  Hash (actual time=0.022..0.023 rows=8 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.009..0.013 rows=8 loops=1)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.212..0.215 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.019..0.019 rows=1 loops=31)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.084 ms
 Execution Time: 433.630 ms


 table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16115788 |     348539 |    2913754
   16116256 |     293541 |    2901938
   16116256 |     293541 |    2901933
   16116256 |     293541 |    2997160
   16114789 |     292051 |    3038539
   16114813 |     342353 |    3052371
   16116069 |     351585 |    3025941
   16114814 |     331329 |    2946332
   16115861 |     350487 |    2933633
   16115106 |     350047 |    2902075
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2893809
   16114811 |     298605 |    2987038
   16114811 |     298605 |    3038877
   16114811 |     298605 |    3010694
   16114811 |     298605 |    2893188
   16114811 |     298605 |    2893391
   16114811 |     298605 |    2983360
   16114811 |     298605 |    3038221
   16114811 |     298605 |    3026078
   16114811 |     298605 |    2998966
   16114811 |     298605 |    2998909
   16116260 |     290583 |    2955483
   16116260 |     290583 |    2921135
   16116260 |     290583 |    2947914
   16116260 |     290583 |    2901669
   16116241 |     295971 |    3038921
   16116249 |     296708 |    3038888
(31 rows)

When I used force_parallel_mode = on and parallel_leader_participation = off, and removed the max_parallel_workers line from the script (so that the default applies), I got:

                                                                                                              QUERY PLAN                                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather (actual time=271.255..530.713 rows=31 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Nested Loop Left Join (actual time=200.434..437.321 rows=31 loops=1)
         ->  Nested Loop (actual time=200.126..436.547 rows=31 loops=1)
               ->  Hash Join (actual time=0.322..25.510 rows=1890 loops=1)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Seq Scan on table_c tc (actual time=0.141..24.501 rows=3501 loops=1)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 98233
                     ->  Hash (actual time=0.082..0.084 rows=8 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.049..0.066 rows=8 loops=1)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.214..0.217 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
         ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.024..0.024 rows=1 loops=31)
               Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.641 ms
 Execution Time: 530.926 ms

 table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16115788 |     348539 |    2913754
   16116256 |     293541 |    2901938
   16116256 |     293541 |    2901933
   16116256 |     293541 |    2997160
   16114789 |     292051 |    3038539
   16114813 |     342353 |    3052371
   16116069 |     351585 |    3025941
   16114814 |     331329 |    2946332
   16115861 |     350487 |    2933633
   16115106 |     350047 |    2902075
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2893809
   16114811 |     298605 |    2987038
   16114811 |     298605 |    3038877
   16114811 |     298605 |    3010694
   16114811 |     298605 |    2893188
   16114811 |     298605 |    2893391
   16114811 |     298605 |    2983360
   16114811 |     298605 |    3038221
   16114811 |     298605 |    3026078
   16114811 |     298605 |    2998966
   16114811 |     298605 |    2998909
   16116260 |     290583 |    2955483
   16116260 |     290583 |    2921135
   16116260 |     290583 |    2947914
   16116260 |     290583 |    2901669
   16116241 |     295971 |    3038921
   16116249 |     296708 |    3038888
(31 rows)

The rows returned by the two queries are the same.

On Mon, Jan 7, 2019 at 11:23 PM Thomas Munro <[hidden email]> wrote:
On Tue, Jan 8, 2019 at 10:39 AM Bartosz Polnik <[hidden email]> wrote:
>  Gather (actual time=263.880..507.668 rows=31 loops=1)
>    Workers Planned: 1
>    Workers Launched: 1
>    Single Copy: true
>    ->  Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
>          ->  Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
>                ->  Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
>                      Hash Cond: (tc.table_d_id = td.id)
>                      ->  Seq Scan on table_c tc (actual time=0.128..23.044 rows=3501 loops=1)
>                            Filter: (table_e_id = 4)
>                            Rows Removed by Filter: 98233
>                      ->  Hash (actual time=0.064..0.069 rows=8 loops=1)
>                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                            ->  Seq Scan on table_d td (actual time=0.045..0.052 rows=8 loops=1)
>                                  Filter: (group = 'A'::test.group)
>                                  Rows Removed by Filter: 55
>                ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.203..0.206 rows=0 loops=1890)
>                      Index Cond: (target_id = tc.id)
>                      Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
>                      Rows Removed by Filter: 26
>          ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.021..0.021 rows=1 loops=31)
>                Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))

Hmm.  So if the plan runs entirely in a worker, we get 31 rows.  If
the plan runs entirely in the leader, we get 31 rows.  But if it runs
in both, we get fewer.  In all cases the Hash Join produces 1890 rows,
and in all cases the Nested Loop probes table_b_idx_target_id for all
of them: "loops=1890".  But somehow the output of the Nested Loop is
not 31 rows (or 31 / 2) when divided over two processes.

As a sanity check, can you please see if the run-only-in-leader case
(max_parallel_workers = 0, so that we see "Workers Launched: 0")
produces the *same* 31 rows as the run-only-in-worker case
(force_parallel_mode = on, so that we see "Single Copy: true")?  That
is, the actual values of those 31 rows, in particular the columns
coming from table_b.

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #15577: Query returns different results when executed multiple times

Thomas Munro
On Tue, Jan 8, 2019 at 12:04 PM Bartosz Polnik <[hidden email]> wrote:
> On Mon, Jan 7, 2019 at 11:23 PM Thomas Munro <[hidden email]> wrote:
>> As a sanity check, can you please see if the run-only-in-leader case
>> (max_parallel_workers = 0, so that we see "Workers Launched: 0")
>> produces the *same* 31 rows as the run-only-in-worker case
>> (force_parallel_mode = on, so that we see "Single Copy: true")?  That
>> is, the actual values of those 31 rows, in particular the columns
>> coming from table_b.
>
> Rows returned by two queries are the same.

Can we please also see the actual output in the broken case, where it
runs in two processes and produces fewer than 31 rows?

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

Bartosz Polnik
Query:

explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
     tc.id as table_c_id,
     tba.id AS table_b_id
FROM test.table_b_active tba
     INNER JOIN test.table_c tc ON tba.target_id = tc.id
     INNER JOIN test.table_d td ON tc.table_d_id = td.id
     LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
  (tba.target_id = tc.id AND tba.group_type = 'A')
  OR tba.source_id = tc.id
);

SELECT ta.id AS table_a_id,
     tc.id as table_c_id,
     tba.id AS table_b_id
FROM test.table_b_active tba
     INNER JOIN test.table_c tc ON tba.target_id = tc.id
     INNER JOIN test.table_d td ON tc.table_d_id = td.id
     LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
  (tba.target_id = tc.id AND tba.group_type = 'A')
  OR tba.source_id = tc.id
);


Output:
                                                                                                             QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=236.640..281.711 rows=26 loops=1)
   ->  Gather (actual time=236.578..304.194 rows=26 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=154.947..244.097 rows=13 loops=2)
               ->  Hash Join (actual time=0.223..14.649 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.083..14.102 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.044..0.044 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.028..0.033 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.239..0.242 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.020..0.020 rows=1 loops=26)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 5.047 ms
 Execution Time: 304.960 ms


 table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16116256 |     293541 |    2901938
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16115788 |     348539 |    2913754
   16114813 |     342353 |    3052371
   16114789 |     292051 |    3038539
   16116069 |     351585 |    3025941
   16115861 |     350487 |    2933633
   16114814 |     331329 |    2946332
   16115106 |     350047 |    2902075
   16116260 |     290583 |    2955483
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2893809
   16114811 |     298605 |    2987038
   16114811 |     298605 |    3038877
   16114811 |     298605 |    3010694
   16114811 |     298605 |    2893188
   16114811 |     298605 |    2893391
   16114811 |     298605 |    2983360
   16114811 |     298605 |    3038221
   16114811 |     298605 |    3026078
   16114811 |     298605 |    2998966
   16116249 |     296708 |    3038888
   16114811 |     298605 |    2998909
   16116241 |     295971 |    3038921
(26 rows)

Here's an example with only 21 rows:

 table_a_id | table_c_id | table_b_id
------------+------------+------------
   16116185 |     328860 |    2936924
   16115788 |     348539 |    3039173
   16115788 |     348539 |    2913874
   16115788 |     348539 |    2913754
   16116256 |     293541 |    2901938
   16116256 |     293541 |    2901933
   16116256 |     293541 |    2997160
   16116069 |     351585 |    3025941
   16114789 |     292051 |    3038539
   16114813 |     342353 |    3052371
   16114814 |     331329 |    2946332
   16115861 |     350487 |    2933633
   16116066 |     351434 |    3010909
   16114811 |     298605 |    2893809
   16115106 |     350047 |    2902075
   16116241 |     295971 |    3038921
   16116260 |     290583 |    2955483
   16116249 |     296708 |    3038888
   16116260 |     290583 |    2921135
   16116260 |     290583 |    2947914
   16116260 |     290583 |    2901669
(21 rows)

On Tue, Jan 8, 2019 at 12:39 AM Thomas Munro <[hidden email]> wrote:
On Tue, Jan 8, 2019 at 12:04 PM Bartosz Polnik <[hidden email]> wrote:
> On Mon, Jan 7, 2019 at 11:23 PM Thomas Munro <[hidden email]> wrote:
>> As a sanity check, can you please see if the run-only-in-leader case
>> (max_parallel_workers = 0, so that we see "Workers Launched: 0")
>> produces the *same* 31 rows as the run-only-in-worker case
>> (force_parallel_mode = on, so that we see "Single Copy: true")?  That
>> is, the actual values of those 31 rows, in particular the columns
>> coming from table_b.
>
> Rows returned by two queries are the same.

Can we please also see the actual output in the broken case, where it
runs in two processes and produces fewer than 31 rows?

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #15577: Query returns different results when executed multiple times

David Rowley
On Tue, 8 Jan 2019 at 13:09, Bartosz Polnik <[hidden email]> wrote:
> (26 rows)
>
> Here's an example with only 21 rows:

Interestingly, between those two results all 31 of your distinct rows
exist. I threw them into tables and did:

# select * from t1 full join t2 on t1.a=t2.a and t1.b=t2.b and t1.c=t2.c;
    a     |   b    |    c    |    a     |   b    |    c
----------+--------+---------+----------+--------+---------
 16114789 | 292051 | 3038539 | 16114789 | 292051 | 3038539
 16114811 | 298605 | 2893188 |          |        |
 16114811 | 298605 | 2893391 |          |        |
 16114811 | 298605 | 2893809 | 16114811 | 298605 | 2893809
 16114811 | 298605 | 2983360 |          |        |
 16114811 | 298605 | 2987038 |          |        |
 16114811 | 298605 | 2998909 |          |        |
 16114811 | 298605 | 2998966 |          |        |
 16114811 | 298605 | 3010694 |          |        |
 16114811 | 298605 | 3026078 |          |        |
 16114811 | 298605 | 3038221 |          |        |
 16114811 | 298605 | 3038877 |          |        |
 16114813 | 342353 | 3052371 | 16114813 | 342353 | 3052371
 16114814 | 331329 | 2946332 | 16114814 | 331329 | 2946332
 16115106 | 350047 | 2902075 | 16115106 | 350047 | 2902075
 16115788 | 348539 | 2913754 | 16115788 | 348539 | 2913754
 16115788 | 348539 | 2913874 | 16115788 | 348539 | 2913874
 16115788 | 348539 | 3039173 | 16115788 | 348539 | 3039173
 16115861 | 350487 | 2933633 | 16115861 | 350487 | 2933633
 16116066 | 351434 | 3010909 | 16116066 | 351434 | 3010909
 16116069 | 351585 | 3025941 | 16116069 | 351585 | 3025941
 16116185 | 328860 | 2936924 | 16116185 | 328860 | 2936924
 16116241 | 295971 | 3038921 | 16116241 | 295971 | 3038921
 16116249 | 296708 | 3038888 | 16116249 | 296708 | 3038888
          |        |         | 16116256 | 293541 | 2901933
 16116256 | 293541 | 2901938 | 16116256 | 293541 | 2901938
          |        |         | 16116256 | 293541 | 2997160
          |        |         | 16116260 | 290583 | 2901669
          |        |         | 16116260 | 290583 | 2921135
          |        |         | 16116260 | 290583 | 2947914
 16116260 | 290583 | 2955483 | 16116260 | 290583 | 2955483
(31 rows)

IOW, it does not seem like there's a particular pattern to which rows are missing.
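
For completeness, a sketch of how that comparison can be set up (the table
definitions and the load step are assumptions; they are not shown above):

create temp table t1 (a bigint, b bigint, c bigint);  -- the 26-row output
create temp table t2 (a bigint, b bigint, c bigint);  -- the 21-row output
-- load each captured result set, e.g. with psql's \copy:
-- \copy t1 from '26_rows.txt'
-- \copy t2 from '21_rows.txt'
select * from t1 full join t2 on t1.a=t2.a and t1.b=t2.b and t1.c=t2.c;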

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BUG #15577: Query returns different results when executed multiple times

Thomas Munro
On Tue, Jan 8, 2019 at 1:31 PM David Rowley
<[hidden email]> wrote:

> On Tue, 8 Jan 2019 at 13:09, Bartosz Polnik <[hidden email]> wrote:
> > (26 rows)
> >
> > Here's an example with only 21 rows:
>
> Interestingly between those two results, all your 31 distinct rows
> exist. I threw them into a table and did:
>
> # select * from t1 full join t2 on t1.a=t2.a and t1.b=t2.b and t1.c=t2.c;
>     a     |   b    |    c    |    a     |   b    |    c
> ----------+--------+---------+----------+--------+---------
>  16114789 | 292051 | 3038539 | 16114789 | 292051 | 3038539
>  16114811 | 298605 | 2893188 |          |        |
>  16114811 | 298605 | 2893391 |          |        |
>  16114811 | 298605 | 2893809 | 16114811 | 298605 | 2893809
>  16114811 | 298605 | 2983360 |          |        |
>  16114811 | 298605 | 2987038 |          |        |
>  16114811 | 298605 | 2998909 |          |        |
>  16114811 | 298605 | 2998966 |          |        |
>  16114811 | 298605 | 3010694 |          |        |
>  16114811 | 298605 | 3026078 |          |        |
>  16114811 | 298605 | 3038221 |          |        |
>  16114811 | 298605 | 3038877 |          |        |
>  16114813 | 342353 | 3052371 | 16114813 | 342353 | 3052371
>  16114814 | 331329 | 2946332 | 16114814 | 331329 | 2946332
>  16115106 | 350047 | 2902075 | 16115106 | 350047 | 2902075
>  16115788 | 348539 | 2913754 | 16115788 | 348539 | 2913754
>  16115788 | 348539 | 2913874 | 16115788 | 348539 | 2913874
>  16115788 | 348539 | 3039173 | 16115788 | 348539 | 3039173
>  16115861 | 350487 | 2933633 | 16115861 | 350487 | 2933633
>  16116066 | 351434 | 3010909 | 16116066 | 351434 | 3010909
>  16116069 | 351585 | 3025941 | 16116069 | 351585 | 3025941
>  16116185 | 328860 | 2936924 | 16116185 | 328860 | 2936924
>  16116241 | 295971 | 3038921 | 16116241 | 295971 | 3038921
>  16116249 | 296708 | 3038888 | 16116249 | 296708 | 3038888
>           |        |         | 16116256 | 293541 | 2901933
>  16116256 | 293541 | 2901938 | 16116256 | 293541 | 2901938
>           |        |         | 16116256 | 293541 | 2997160
>           |        |         | 16116260 | 290583 | 2901669
>           |        |         | 16116260 | 290583 | 2921135
>           |        |         | 16116260 | 290583 | 2947914
>  16116260 | 290583 | 2955483 | 16116260 | 290583 | 2955483
> (31 rows)
>
> IOW, it does not seem like there's a particular pattern to what rows are missing.

Is the pattern something like this?  When the index probe of b should
find multiple matches, it's returning only one and then we move onto
the next outer row?  For example we see this for table_c_id = 298605,
of which we see 11 examples in the 31-row output (with different
table_b_id values), but only one in the 21-row output, with the first
table_b_id value.
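
(One way to check that against the two captured result sets, reusing the
t1/t2 staging tables from the comparison above; in those tables column b is
table_c_id:)

-- per table_c_id, how many rows each run returned; if only the first match
-- survives, a short run should show 1 where the other run shows several
select coalesce(t1.b, t2.b) as table_c_id,
       count(t1.c) as rows_in_26_row_run,
       count(t2.c) as rows_in_21_row_run
from t1 full join t2 on t1.b = t2.b and t1.c = t2.c
group by 1
order by 1;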

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

Thomas Munro-3
On Tue, Jan 8, 2019 at 1:40 PM Thomas Munro
<[hidden email]> wrote:

> On Tue, Jan 8, 2019 at 1:31 PM David Rowley
> <[hidden email]> wrote:
> > On Tue, 8 Jan 2019 at 13:09, Bartosz Polnik <[hidden email]> wrote:
> > > (26 rows)
> > >
> > > Here's an example with only 21 rows:
> >
> > Interestingly between those two results, all your 31 distinct rows
> > exist. I threw them into a table and did:
> >
> > # select * from t1 full join t2 on t1.a=t2.a and t1.b=t2.b and t1.c=t2.c;
> >     a     |   b    |    c    |    a     |   b    |    c
> > ----------+--------+---------+----------+--------+---------
> >  16114789 | 292051 | 3038539 | 16114789 | 292051 | 3038539
> >  16114811 | 298605 | 2893188 |          |        |
> >  16114811 | 298605 | 2893391 |          |        |
> >  16114811 | 298605 | 2893809 | 16114811 | 298605 | 2893809
> >  16114811 | 298605 | 2983360 |          |        |
> >  16114811 | 298605 | 2987038 |          |        |
> >  16114811 | 298605 | 2998909 |          |        |
> >  16114811 | 298605 | 2998966 |          |        |
> >  16114811 | 298605 | 3010694 |          |        |
> >  16114811 | 298605 | 3026078 |          |        |
> >  16114811 | 298605 | 3038221 |          |        |
> >  16114811 | 298605 | 3038877 |          |        |
> >  16114813 | 342353 | 3052371 | 16114813 | 342353 | 3052371
> >  16114814 | 331329 | 2946332 | 16114814 | 331329 | 2946332
> >  16115106 | 350047 | 2902075 | 16115106 | 350047 | 2902075
> >  16115788 | 348539 | 2913754 | 16115788 | 348539 | 2913754
> >  16115788 | 348539 | 2913874 | 16115788 | 348539 | 2913874
> >  16115788 | 348539 | 3039173 | 16115788 | 348539 | 3039173
> >  16115861 | 350487 | 2933633 | 16115861 | 350487 | 2933633
> >  16116066 | 351434 | 3010909 | 16116066 | 351434 | 3010909
> >  16116069 | 351585 | 3025941 | 16116069 | 351585 | 3025941
> >  16116185 | 328860 | 2936924 | 16116185 | 328860 | 2936924
> >  16116241 | 295971 | 3038921 | 16116241 | 295971 | 3038921
> >  16116249 | 296708 | 3038888 | 16116249 | 296708 | 3038888
> >           |        |         | 16116256 | 293541 | 2901933
> >  16116256 | 293541 | 2901938 | 16116256 | 293541 | 2901938
> >           |        |         | 16116256 | 293541 | 2997160
> >           |        |         | 16116260 | 290583 | 2901669
> >           |        |         | 16116260 | 290583 | 2921135
> >           |        |         | 16116260 | 290583 | 2947914
> >  16116260 | 290583 | 2955483 | 16116260 | 290583 | 2955483
> > (31 rows)
> >
> > IOW, it does not seem like there's a particular pattern to what rows are missing.
>
> Is the pattern something like this?  When the index probe of b should
> find multiple matches, it's returning only one and then we move onto
> the next outer row?  For example we see this for table_c_id = 298605,
> of which we see 11 examples in the 31-row output (with different
> table_b_id values), but only one in the 21-row output, with the first
> table_b_id value.

It's as if it thought it was executing a semi-join.
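
(For illustration only, not a query anyone ran: the same join written as an
actual semi-join, which is the behaviour being described: at most one output
row per table_c row, no matter how many table_b_active rows match:)

select tc.id as table_c_id
from test.table_c tc
       inner join test.table_d td on tc.table_d_id = td.id
where td.group = 'A'
  and tc.table_e_id = 4
  and exists (select 1
              from test.table_b_active tba
              where tba.target_id = tc.id
                and tba.date between '2018-08-10' :: date and '2018-09-01' :: date
                and ((tba.target_id = tc.id and tba.group_type = 'A')
                     or tba.source_id = tc.id));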

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

David Rowley-3
On Tue, 8 Jan 2019 at 13:43, Thomas Munro <[hidden email]> wrote:
> It's as if it thought it was executing a semi-join.

EXPLAIN VERBOSE would show the inner_unique flag. If that's set it
would act as a semi-join.
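
(In text output that flag is only printed under VERBOSE, and only when it is
set, so the thing to look for would be something like the following; the plan
excerpt is just the expected shape, not output from this database:)

explain (verbose, costs off)
  SELECT ... ;              -- the full query from the report

 Nested Loop
   Output: ...
   Inner Unique: true       <-- the inner_unique flag, if the planner set it
   ...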


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BUG #15577: Query returns different results when executed multiple times

Thomas Munro-3
On Tue, Jan 8, 2019 at 1:48 PM David Rowley
<[hidden email]> wrote:
> On Tue, 8 Jan 2019 at 13:43, Thomas Munro <[hidden email]> wrote:
> > It's as if it thought it was executing a semi-join.
>
> EXPLAIN VERBOSE would show the inner_unique flag. If that's set it
> would act as a semi-join.

Hmm, yes, node->js.single_match == true in one (but not both!) process
would fit these results.  But I don't see a mechanism for that.  I
*guess* it's the worker that is skipping duplicates, because the
leader usually has time to emit a few tuples while the worker's
warming up, and we see some duplicates (348539) in the first few results
(at least it usually does on my machines for eager plans, though maybe
on Windows it's different?)

Bartosz, can we please try with force_parallel_mode = off, but also
parallel_leader_participation = off?  (I meant to write that in an
earlier email but accidentally wrote "on".  Gah.)  Then we should get
the same plan (the forced parallel plan is a bit different as you
noted; maybe whatever is broken isn't triggered that way).  You might
need to set parallel_workers to 2 on the table for it to pick a
parallel plan without leader participation.

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

Bartosz Polnik
Sure. 

SET force_parallel_mode = off;
SET parallel_leader_participation = off;

alter table table_c set (parallel_workers = 2);
explain (costs false, analyze true)
  SELECT ta.id AS table_a_id,
         tc.id as table_c_id,
         tba.id AS table_b_id
  FROM test.table_b_active tba
         INNER JOIN test.table_c tc ON tba.target_id = tc.id
         INNER JOIN test.table_d td ON tc.table_d_id = td.id
         LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date = '2018-08-31' :: DATE
  WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
    AND td.group = 'A'
    AND tc.table_e_id = 4
    AND (
      (tba.target_id = tc.id AND tba.group_type = 'A')
      OR tba.source_id = tc.id
    );

Output:
                                                                                                              QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=203.910..346.758 rows=31 loops=1)
   ->  Gather (actual time=203.802..373.228 rows=31 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Nested Loop (actual time=157.358..253.888 rows=16 loops=2)
               ->  Hash Join (actual time=0.365..15.749 rows=945 loops=2)
                     Hash Cond: (tc.table_d_id = td.id)
                     ->  Parallel Seq Scan on table_c tc (actual time=0.113..15.043 rows=1751 loops=2)
                           Filter: (table_e_id = 4)
                           Rows Removed by Filter: 49117
                     ->  Hash (actual time=0.078..0.078 rows=8 loops=2)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on table_d td (actual time=0.058..0.065 rows=8 loops=2)
                                 Filter: (group = 'A'::test.group)
                                 Rows Removed by Filter: 55
               ->  Index Scan using table_b_idx_target_id on table_b (actual time=0.248..0.252 rows=0 loops=1890)
                     Index Cond: (target_id = tc.id)
                     Filter: ((date >= '2018-08-10'::date) AND (date <= '2018-09-01'::date) AND (((target_id = tc.id) AND (group_type = 'A'::test.group_type)) OR (source_id = tc.id)))
                     Rows Removed by Filter: 26
   ->  Index Scan using table_a_uq_001 on table_a ta (actual time=0.030..0.030 rows=1 loops=31)
         Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
 Planning Time: 4.923 ms
 Execution Time: 374.408 ms

31 rows - correct result.


On Tue, Jan 8, 2019 at 2:30 AM Thomas Munro <[hidden email]> wrote:
On Tue, Jan 8, 2019 at 1:48 PM David Rowley
<[hidden email]> wrote:
> On Tue, 8 Jan 2019 at 13:43, Thomas Munro <[hidden email]> wrote:
> > It's as if it thought it was executing a semi-join.
>
> EXPLAIN VERBOSE would show the inner_unique flag. If that's set it
> would act as a semi-join.

Hmm, yes, node->js.single_match == true in one (but not both!) process
would fit these results.  But I don't see a mechanism for that.  I
*guess* it's the worker that is skipping duplicates, because the
leader usually has time to emit a few tuples while the worker's
warming up, and we see some duplicates (348539) in the first few results
(at least it usually does on my machines for eager plans, though maybe
on Windows it's different?)

Bartosz, can we please try with force_parallel_mode = off, but also
parallel_leader_participation = off?  (I meant to write that in an
earlier email but accidentally wrote "on".  Gah.)  Then we should get
the same plan (the forced parallel plan is a bit different as you
noted; maybe whatever is broken isn't triggered that way).  You might
need to set parallel_workers to 2 on the table for it to pick a
parallel plan without leader participation.

--
Thomas Munro
http://www.enterprisedb.com

Re: BUG #15577: Query returns different results when executed multiple times

Thomas Munro-3
On Tue, Jan 8, 2019 at 10:45 PM Bartosz Polnik <[hidden email]> wrote:
> SET force_parallel_mode = off;
> SET parallel_leader_participation = off;
>
> alter table table_c set (parallel_workers = 2);
> ...
>
> 31 rows - correct result.

Huh.  So now we have:

1.  Leader only (max_parallel_workers = 0): OK
2.  Worker only (parallel_leader_participation = off, parallel_workers = 2): OK
3.  Leader + worker: broken, short results
4.  Worker only with slightly different plan (force_parallel_mode = on): OK
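
(The same four configurations expressed as the settings already used in the
thread, one scenario at a time rather than run in sequence:)

SET max_parallel_workers = 0;                         -- 1. leader only: OK
SET parallel_leader_participation = off;              -- 2. worker only: OK
ALTER TABLE test.table_c SET (parallel_workers = 2);  --    (so a parallel plan is still chosen)
-- 3. leader + worker: stock settings, short results
SET force_parallel_mode = on;                         -- 4. forced single-worker plan: OK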

Ahh, I just reproduced this.  Take mockup_bug15577.sql (Bartosz's
version) and then do this:

insert into table_c select * from table_c;

insert into table_b
select generate_series(1000000, 1000010), 100112, 'A', 99888,
'2018-09-01', true,
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';

Now the number of results from the query in the script is not stable,
varying between 235 and 246 results for me.  Not sure what's going on
yet.
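
(A low-tech way to watch that from psql, as a sketch: re-run the unmodified
query and compare the "(N rows)" footer between runs, e.g.:)

\pset pager off
SELECT ... ;   -- the full query from mockup_bug15577.sql, unchanged
\watch 1       -- re-runs the previous query every second; watch the row count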

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

Thomas Munro-3
On Wed, Jan 9, 2019 at 2:52 PM Thomas Munro
<[hidden email]> wrote:
> Now the number of results from the query in the script is not stable,
> varying between 235 and 246 results for me.  Not sure what's going on
> yet.

When there are multiple matches found by the Index Scan, sometimes
only the first one makes it through to the Nested Loop, in the leader.
It's not because of single_match being inappropriately set as I
speculated earlier (though that'd behave similarly I guess) but
because the quals are somehow broken in between invocations so that
ExecScan() filters the rest of the matches out, and then fixed again
on the next rescan.  This happens whenever the leader takes a break to
read tuples from parallel workers in between executing the Nested Loop
node, which it often does after reading the first of several tuples
resulting from a single rescan.

What exactly is being corrupted and how, I don't yet know, and I need
to leave this here for today, but that's what I've got so far.
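
(For anyone poking at this: EXPLAIN (ANALYZE, VERBOSE) breaks the row counts
down per worker under the parallel nodes, which should make it easier to see
whether it's the leader or the worker dropping rows. Roughly this shape, with
the actual numbers elided:)

explain (analyze, verbose, costs off)
  SELECT ... ;   -- the query from the report

 Nested Loop Left Join (actual time=... rows=... loops=1)
   ->  Gather (actual time=... rows=... loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Nested Loop (actual time=... rows=... loops=2)
               Worker 0: actual time=... rows=... loops=1
               ...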

--
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15577: Query returns different results when executed multiple times

Bartosz Polnik
Thomas, I tried to reproduce this case with the changes you listed, but couldn't - the number of results was the same on each run. Could you post an updated script as an attachment?

On Wed, Jan 9, 2019 at 5:01 AM Thomas Munro <[hidden email]> wrote:
On Wed, Jan 9, 2019 at 2:52 PM Thomas Munro
<[hidden email]> wrote:
> Now the number of results from the query in the script is not stable,
> varying between 235 and 246 results for me.  Not sure what's going on
> yet.

When there are multiple matches found by the Index Scan, sometimes
only the first one makes it through to the Nested Loop, in the leader.
It's not because of single_match being inappropriately set as I
speculated earlier (though that'd behave similarly I guess) but
because the quals are somehow broken in between invocations so that
ExecScan() filters the rest of the matches out, and then fixed again
on the next rescan.  This happens whenever the leader takes a break to
read tuples from parallel workers in between executing the Nested Loop
node, which it often does after reading the first of several tuples
resulting from a single rescan.

What exactly is being corrupted and how, I don't yet know, and I need
to leave this here for today, but that's what I've got so far.

--
Thomas Munro
http://www.enterprisedb.com