pgsql: Fix O(N^2) performance issue in pg_publication_tables view.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

pgsql: Fix O(N^2) performance issue in pg_publication_tables view.

Tom Lane-2
Fix O(N^2) performance issue in pg_publication_tables view.

The original coding of this view relied on a correlated IN sub-query.
Our planner is not very bright about correlated sub-queries, and even
if it were, there's no way for it to know that the output of
pg_get_publication_tables() is duplicate-free, making the de-duplicating
semantics of IN unnecessary.  Hence, rewrite as a LATERAL sub-query.
This provides circa 100X speedup for me with a few hundred published
tables (the whole regression database), and things would degrade as
roughly O(published_relations * all_relations) beyond that.

Because the rules.out expected output changes, force a catversion bump.
Ordinarily we might not want to do that post-beta1; but we already know
we'll be doing a catversion bump before beta2 to fix pg_statistic_ext
issues, so it's pretty much free to fix it now instead of waiting for v13.

Per report and fix suggestion from PegoraroF10.

Discussion: https://postgr.es/m/1551385426763-0.post@...

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/166f69f769c83ef8759d905bf7f1a9aa1d97a340

Modified Files
--------------
src/backend/catalog/system_views.sql | 7 ++++---
src/include/catalog/catversion.h     | 2 +-
src/test/regress/expected/rules.out  | 4 ++--
3 files changed, 7 insertions(+), 6 deletions(-)

Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix O(N^2) performance issue in pg_publication_tables view.

fabriziomello

On Wed, May 22, 2019 at 12:47 PM Tom Lane <[hidden email]> wrote:

>
> Fix O(N^2) performance issue in pg_publication_tables view.
>
> The original coding of this view relied on a correlated IN sub-query.
> Our planner is not very bright about correlated sub-queries, and even
> if it were, there's no way for it to know that the output of
> pg_get_publication_tables() is duplicate-free, making the de-duplicating
> semantics of IN unnecessary.  Hence, rewrite as a LATERAL sub-query.
> This provides circa 100X speedup for me with a few hundred published
> tables (the whole regression database), and things would degrade as
> roughly O(published_relations * all_relations) beyond that.
>
> Because the rules.out expected output changes, force a catversion bump.
> Ordinarily we might not want to do that post-beta1; but we already know
> we'll be doing a catversion bump before beta2 to fix pg_statistic_ext
> issues, so it's pretty much free to fix it now instead of waiting for v13.
>
> Per report and fix suggestion from PegoraroF10.
>
> Discussion: https://postgr.es/m/1551385426763-0.post@...
>
> Branch
> ------
> master
>
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/166f69f769c83ef8759d905bf7f1a9aa1d97a340
>
> Modified Files
> --------------
> src/backend/catalog/system_views.sql | 7 ++++---
> src/include/catalog/catversion.h     | 2 +-
> src/test/regress/expected/rules.out  | 4 ++--
> 3 files changed, 7 insertions(+), 6 deletions(-)
>

Just one doubt, why use LATERAL with pg_get_publication_tables SRF instead of JOIN direct to pg_publication_rel?

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix O(N^2) performance issue in pg_publication_tables view.

Tom Lane-2
=?UTF-8?Q?Fabr=C3=ADzio_de_Royes_Mello?= <[hidden email]> writes:
> Just one doubt, why use LATERAL with pg_get_publication_tables SRF instead
> of JOIN direct to pg_publication_rel?

Yes, LATERAL is just a noise word in this context, but I think
it's good for documentation purposes.

If you're asking why I used a comma instead of JOIN ... ON TRUE,
it's because I don't find the latter to be particularly readable
or good style.  YMMV.

(I agree it's a little weird that the original coding had a mix
of comma and JOIN syntax, but I did not feel a need to revisit
that.)

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix O(N^2) performance issue in pg_publication_tables view.

fabriziomello

On Wed, May 22, 2019 at 3:01 PM Tom Lane <[hidden email]> wrote:

>
> =?UTF-8?Q?Fabr=C3=ADzio_de_Royes_Mello?= <[hidden email]> writes:
> > Just one doubt, why use LATERAL with pg_get_publication_tables SRF instead
> > of JOIN direct to pg_publication_rel?
>
> Yes, LATERAL is just a noise word in this context, but I think
> it's good for documentation purposes.
>
> If you're asking why I used a comma instead of JOIN ... ON TRUE,
> it's because I don't find the latter to be particularly readable
> or good style.  YMMV.
>
> (I agree it's a little weird that the original coding had a mix
> of comma and JOIN syntax, but I did not feel a need to revisit
> that.)
>

I'm not wonder about mix of comma and JOIN syntax, I meant why don't do it?

-    FROM pg_publication P,
-         LATERAL pg_get_publication_tables(P.pubname) GPT,
-         pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
-    WHERE C.oid = GPT.relid;
+    FROM pg_publication P
+         JOIN pg_publication_rel PR ON (PR.prpubid = P.oid)
+         JOIN pg_class C ON (C.oid = PR.prrelid)
+         JOIN pg_namespace N ON (N.oid = C.relnamespace);

There are some case I missed to force us to use pg_get_publication_tables SRF instead of use pg_publication_rel ??

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix O(N^2) performance issue in pg_publication_tables view.

Tom Lane-2
=?UTF-8?Q?Fabr=C3=ADzio_de_Royes_Mello?= <[hidden email]> writes:
> There are some case I missed to force us to use pg_get_publication_tables
> SRF instead of use pg_publication_rel ??

pg_publication_rel isn't going to provide the correct expansion of
a FOR ALL TABLES publication, IIUC.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix O(N^2) performance issue in pg_publication_tables view.

fabriziomello

On Wed, May 22, 2019 at 4:57 PM Tom Lane <[hidden email]> wrote:
>
> =?UTF-8?Q?Fabr=C3=ADzio_de_Royes_Mello?= <[hidden email]> writes:
> > There are some case I missed to force us to use pg_get_publication_tables
> > SRF instead of use pg_publication_rel ??
>
> pg_publication_rel isn't going to provide the correct expansion of
> a FOR ALL TABLES publication, IIUC.
>

You're correct... there are a condition for it in pg_get_publication_tables and FOR ALL TABLES publications doesn't persist anything in pg_publication_rel.  

497         publication = GetPublicationByName(pubname, false);
498         if (publication->alltables)
499             tables = GetAllTablesPublicationRelations();
500         else
501             tables = GetPublicationRelations(publication->oid);

Thanks for clarifying!

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello