Apparent missed query optimization with self-join and inner grouping

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Apparent missed query optimization with self-join and inner grouping

Zack Weinberg
I have a table recording the results of a web crawl.  (Table
definition at the end of this message.)  The relevant part of the data
stored in it looks like this:

  id  | url_id | full_url_id | experiment_id | redirect_num
------+--------+-------------+---------------+--------------
 2617 |   1312 |        1312 |            16 |            0
 2618 |   1312 |        2311 |            16 |            1
 2619 |   1312 |        2312 |            16 |            2
 2620 |   1312 |        2313 |            16 |            3
 2631 |   1320 |        1320 |            43 |            0
 2633 |   1320 |        2312 |            43 |            2
 2632 |   1320 |        2317 |            43 |            1
 2634 |   1320 |        2318 |            43 |            3

For each (experiment_id, url_id) pair for some small subset of the
experiment_ids, I need to query the full_url_id corresponding to the
*largest* value of redirect_num.  The query planner does something
reasonable with this SELECT:

=> explain (analyze, verbose)
   select b.experiment_id, b.url_id, b.full_url_id
     from blockpage b,
          (select experiment_id, url_id, max(redirect_num) as redirect_num
             from blockpage group by experiment_id, url_id) bm
    where b.experiment_id = bm.experiment_id
      and b.url_id = bm.url_id
      and b.redirect_num = bm.redirect_num
      and bm.experiment_id in (16, 43);

 Nested Loop  (cost=1.14..88505.96 rows=20 width=12) (actual
time=0.041..1.723 rows=501 loops=1)
   Output: b.experiment_id, b.url_id, b.full_url_id
   ->  GroupAggregate  (cost=0.57..15442.73 rows=8543 width=12)
(actual time=0.033..0.501 rows=501 loops=1)
         Output: blockpage.experiment_id, blockpage.url_id,
max(blockpage.redirect_num)
         Group Key: blockpage.experiment_id, blockpage.url_id
         ->  Index Only Scan using
blockpage_experiment_id_url_id_redirect_num_blockpage_reason__ on
iclab.blockpage  (cost=0.57..15293.19 rows=8547 width=12) (actual
time=0.026..0.283 rows=803 loops=1)
               Output: blockpage.experiment_id, blockpage.url_id,
blockpage.full_url_id, blockpage.redirect_num, blockpage.html_tag_id
               Index Cond: (blockpage.experiment_id = ANY
('{16,43}'::integer[]))
               Heap Fetches: 803
   ->  Index Only Scan using
blockpage_experiment_id_url_id_redirect_num_blockpage_reason__ on
iclab.blockpage b  (cost=0.57..8.53 rows=1 width=16) (actual
time=0.002..0.002 rows=1 loops=501)
         Output: b.experiment_id, b.url_id, b.full_url_id,
b.redirect_num, b.html_tag_id
         Index Cond: ((b.experiment_id = blockpage.experiment_id) AND
(b.url_id = blockpage.url_id) AND (b.redirect_num =
(max(blockpage.redirect_num))))
         Heap Fetches: 501
 Planning Time: 0.331 ms
 Execution Time: 1.784 ms


But if I change the final part of the WHERE to reference
b.experiment_id instead of bm.experiment_id, I get this much more
expensive query plan:

=> explain (analyze, verbose)
   select b.experiment_id, b.url_id, b.full_url_id
     from blockpage b,
          (select experiment_id, url_id, max(redirect_num) as redirect_num
             from blockpage group by experiment_id, url_id) bm
    where b.experiment_id = bm.experiment_id
      and b.url_id = bm.url_id
      and b.redirect_num = bm.redirect_num
      and b.experiment_id in (16, 43);

 Hash Join  (cost=2749504.19..2764864.13 rows=2 width=12) (actual
time=144028.343..144029.545 rows=501 loops=1)
   Output: b.experiment_id, b.url_id, b.full_url_id
   Inner Unique: true
   Hash Cond: ((b.experiment_id = blockpage.experiment_id) AND
(b.url_id = blockpage.url_id) AND (b.redirect_num =
(max(blockpage.redirect_num))))
   ->  Index Only Scan using
blockpage_experiment_id_url_id_redirect_num_blockpage_reason__ on
iclab.blockpage b  (cost=0.57..15293.19 rows=8547 width=16) (actual
time=0.039..0.387 rows=803 loops=1)
         Output: b.experiment_id, b.url_id, b.full_url_id,
b.redirect_num, b.html_tag_id
         Index Cond: (b.experiment_id = ANY ('{16,43}'::integer[]))
         Heap Fetches: 803
   ->  Hash  (cost=2595219.62..2595219.62 rows=8816229 width=12)
(actual time=143941.931..143941.931 rows=57061228 loops=1)
         Output: blockpage.experiment_id, blockpage.url_id,
(max(blockpage.redirect_num))
         Buckets: 67108864 (originally 16777216)  Batches: 1
(originally 1)  Memory Usage: 2976138kB
         ->  HashAggregate  (cost=2418895.04..2507057.33 rows=8816229
width=12) (actual time=90020.851..122656.924 rows=57061228 loops=1)
               Output: blockpage.experiment_id, blockpage.url_id,
max(blockpage.redirect_num)
               Group Key: blockpage.experiment_id, blockpage.url_id
               ->  Seq Scan on iclab.blockpage  (cost=0.00..1757677.88
rows=88162288 width=12) (actual time=0.020..32910.709 rows=88164599
loops=1)
                     Output: blockpage.id, blockpage.url_id,
blockpage.full_url_id, blockpage.experiment_id,
blockpage.blockpage_reason_id, blockpage.html_tag_id,
blockpage.body_len, blockpage.blockpage_diff,
blockpage.has_blockpage_regex, blockpage.http_status,
blockpage.redirect_num, blockpage.response_failure
 Planning Time: 0.409 ms
 JIT:
   Functions: 17
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.098 ms, Inlining 27.226 ms, Optimization
140.416 ms, Emission 79.840 ms, Total 249.580 ms
 Execution Time: 145108.471 ms

Since we have `b.experiment_id = bm.experiment_id` as a join
condition, it seems to me that the query planner ought to have
recognized that the `experiment_id in (16, 43)` condition could be
pushed down into the sub-select in both cases.

What is the best way to report this to the developers?  Should I file
a bug report?  I'm using Postgres 12.2.

Thanks,
zw

CREATE TABLE iclab.blockpage (
    id                          BIGSERIAL   PRIMARY KEY,
    url_id                      INTEGER     NOT NULL,
    full_url_id                 INTEGER     NOT NULL,
    experiment_id               INTEGER     NOT NULL,
    blockpage_reason_id         INTEGER,
    html_tag_id                 INTEGER     NOT NULL,
    body_len                    INTEGER,
    blockpage_diff              REAL,
    has_blockpage_regex         BOOLEAN,
    http_status                 INTEGER,
    redirect_num                INTEGER     NOT NULL,
    response_failure            INTEGER,
    FOREIGN KEY (experiment_id)         REFERENCES iclab.experiment_results(id),
    FOREIGN KEY(url_id)                 REFERENCES iclab.URL(id),
    FOREIGN KEY(url_id)                 REFERENCES iclab.URL(id),
    FOREIGN KEY(blockpage_reason_id)    REFERENCES iclab.blockpage_reason(id),
    FOREIGN KEY(html_tag_id)            REFERENCES iclab.html_tag(id)
);

create unique index
blockpage__experiment_id_url_id_redirect_num_blockpage_reason
    on iclab.blockpage(experiment_id , url_id, full_url_id,
redirect_num, html_tag_id);


Reply | Threaded
Open this post in threaded view
|

Re: Apparent missed query optimization with self-join and inner grouping

Michael Lewis
Subqueries are evaluated separately when they include an aggregate or window function as I understand it. I agree that it would be helpful in your case if that outside qual was pushed inside.

I assume this query is generated by an ORM and you don't just have the option to simply change it?
Reply | Threaded
Open this post in threaded view
|

Re: Apparent missed query optimization with self-join and inner grouping

Tom Lane-2
In reply to this post by Zack Weinberg
Zack Weinberg <[hidden email]> writes:
> For each (experiment_id, url_id) pair for some small subset of the
> experiment_ids, I need to query the full_url_id corresponding to the
> *largest* value of redirect_num.  The query planner does something
> reasonable with this SELECT:

> => explain (analyze, verbose)
>    select b.experiment_id, b.url_id, b.full_url_id
>      from blockpage b,
>           (select experiment_id, url_id, max(redirect_num) as redirect_num
>              from blockpage group by experiment_id, url_id) bm
>     where b.experiment_id = bm.experiment_id
>       and b.url_id = bm.url_id
>       and b.redirect_num = bm.redirect_num
>       and bm.experiment_id in (16, 43);

With that query, the "bm.experiment_id in (16, 43)" restriction is
pushed into the "bm" sub-select, greatly reducing the amount of work
the GROUP BY step has to do.

> But if I change the final part of the WHERE to reference
> b.experiment_id instead of bm.experiment_id, I get this much more
> expensive query plan:

> => explain (analyze, verbose)
>    select b.experiment_id, b.url_id, b.full_url_id
>      from blockpage b,
>           (select experiment_id, url_id, max(redirect_num) as redirect_num
>              from blockpage group by experiment_id, url_id) bm
>     where b.experiment_id = bm.experiment_id
>       and b.url_id = bm.url_id
>       and b.redirect_num = bm.redirect_num
>       and b.experiment_id in (16, 43);

With that query, the GROUP BY is evaluated in full, and it costs you.

In principle, given the nearby "where b.experiment_id = bm.experiment_id"
clause, we could derive "bm.experiment_id in (16, 43)" from the stated
clause.  But we don't.  The existing machinery around derivation of
implied equalities only works for simple equalities, not OR clauses.
Extending that would be a bit of a research project, and it's far from
clear that the benefits would be worth the additional planning costs.

> What is the best way to report this to the developers?  Should I file
> a bug report?  I'm using Postgres 12.2.

This is not a bug, and you should not hold your breath waiting
for it to change.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

How can you find out what point logical replication is at?

John Ashmead
I have logical replication setup from a factory in Zhuhai China to a data warehouse in New Jersey. We are using postgresql 10.13 on both sides, on Redhat Linux 7.6.

The logical replication has been in “catchup” mode for several days now, stuck at a specific LSN (9EF/89ADF7E0). The slave side seems to be chugging along, generating lots of WAL files — but not actually getting any new records in. 

The logical replication is being used to move some fairly large files:  averaging about 1 MB but with a max up to about 250 MB. (I think I want to fix that longer term, but that’s not a quick fix.)

My working hypothesis is that we are stuck on a long transaction: that we can’t get some abnormally large blob over before we drop the line.  In this case fixing the connection should fix the problem.

Is there a way I can see what is going on?  The slave is working hard, but what is it working on?

And are there any suggestions on how to handle this?

I could restart the logical replication with “copy_data = false”, then fill in the holes by hand.  But I would rather not!

Thanks in advance!

John

PS. Output of pg_stat_replication & pg_stat_subscription on master & slave respectively.  (Some proprietary information X’d out)

select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 42451
usesysid         | 10
usename          | postgres
application_name | china_to_nj_sub
client_addr      | XXX.XXX.XXX.XXX
client_hostname  | 
client_port      | 54300
backend_start    | 2020-08-03 09:07:07.257454-04
backend_xmin     | 16574498
state            | catchup
sent_lsn         | 9EF/89ADF7E0
write_lsn        | 9EF/89ADF7E0
flush_lsn        | 9EF/89ADF7E0
replay_lsn       | 9EF/89ADF7E0
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async

 select * from pg_stat_subscription;
-[ RECORD 1 ]---------+------------------------------
subid                 | 7222699
subname               | china_to_nj_sub
pid                   | 14764
relid                 | 
received_lsn          | 9EF/89ADF7E0
last_msg_send_time    | 2020-08-03 10:15:48.644575-04
last_msg_receipt_time | 2020-08-03 10:14:57.247993-04
latest_end_lsn        | 9EF/89ADF7E0
latest_end_time       | 2020-08-03 09:30:57.974223-04


John Ashmead
139 Montrose Avenue
Rosemont, PA, 19010-1508
(610) 527 9560 
mobile (610) 247 2323







smime.p7s (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: How can you find out what point logical replication is at? -- or weird, slow, infinite loop

John Ashmead
This has gotten a bit weirder.

The replication is getting up to a specific LSN.  

Then it moves the sent_lsn to a number less than that, for a WAL file about an hour or more earlier and slowly walks forward again till it hit the same highwater mark.

So the replication seems to be stuck in some kind of a slow infinite loop.

Corrupted WAL file?  Any other ideas of what to look for?

TIA,

John

On Aug 3, 2020, at 10:38 AM, John Ashmead <[hidden email]> wrote:

I have logical replication setup from a factory in Zhuhai China to a data warehouse in New Jersey. We are using postgresql 10.13 on both sides, on Redhat Linux 7.6.

The logical replication has been in “catchup” mode for several days now, stuck at a specific LSN (9EF/89ADF7E0). The slave side seems to be chugging along, generating lots of WAL files — but not actually getting any new records in. 

The logical replication is being used to move some fairly large files:  averaging about 1 MB but with a max up to about 250 MB. (I think I want to fix that longer term, but that’s not a quick fix.)

My working hypothesis is that we are stuck on a long transaction: that we can’t get some abnormally large blob over before we drop the line.  In this case fixing the connection should fix the problem.

Is there a way I can see what is going on?  The slave is working hard, but what is it working on?

And are there any suggestions on how to handle this?

I could restart the logical replication with “copy_data = false”, then fill in the holes by hand.  But I would rather not!

Thanks in advance!

John

PS. Output of pg_stat_replication & pg_stat_subscription on master & slave respectively.  (Some proprietary information X’d out)

select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 42451
usesysid         | 10
usename          | postgres
application_name | china_to_nj_sub
client_addr      | XXX.XXX.XXX.XXX
client_hostname  | 
client_port      | 54300
backend_start    | 2020-08-03 09:07:07.257454-04
backend_xmin     | 16574498
state            | catchup
sent_lsn         | 9EF/89ADF7E0
write_lsn        | 9EF/89ADF7E0
flush_lsn        | 9EF/89ADF7E0
replay_lsn       | 9EF/89ADF7E0
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async

 select * from pg_stat_subscription;
-[ RECORD 1 ]---------+------------------------------
subid                 | 7222699
subname               | china_to_nj_sub
pid                   | 14764
relid                 | 
received_lsn          | 9EF/89ADF7E0
last_msg_send_time    | 2020-08-03 10:15:48.644575-04
last_msg_receipt_time | 2020-08-03 10:14:57.247993-04
latest_end_lsn        | 9EF/89ADF7E0
latest_end_time       | 2020-08-03 09:30:57.974223-04


John Ashmead
139 Montrose Avenue
Rosemont, PA, 19010-1508
(610) 527 9560 
mobile (610) 247 2323







John Ashmead
139 Montrose Avenue
Rosemont, PA, 19010-1508
(610) 527 9560 
mobile (610) 247 2323







smime.p7s (2K) Download Attachment