Early WIP/PoC for inlining CTEs

classic Classic list List threaded Threaded
102 messages Options
1234 ... 6
Reply | Threaded
Open this post in threaded view
|

Early WIP/PoC for inlining CTEs

Andrew Gierth
About a year ago I was briefly in discussion/collaboration with Adam Sah
regarding the topic of inlining CTEs into the query rather than treating
them as optimization barriers. We didn't take it very far (he sent me
some stuff, I wrote some stuff and sent it back, things kind of got
dropped at that point); but there's been some recent discussion of this
and some people have expressed an interest in seeing the code.

So I'm posting the parts that I wrote for the benefit of anyone wanting
to pick up the issue again. The assumption of this code is that some
form of syntax would exist to mark materialized CTEs and set the
"ctematerialized" flag.

I haven't rebased this or tested it since last year; this patch is
against b81eba6a65.

Posted for discussion, further development, criticism, whatever; feel
free to include this (with credit) in any relevant patch. Consider this
released under the PG license.

--
Andrew (irc:RhodiumToad)


diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c1a83ca909..393c673164 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2474,6 +2474,7 @@ _copyCommonTableExpr(const CommonTableExpr *from)
  COPY_NODE_FIELD(ctequery);
  COPY_LOCATION_FIELD(location);
  COPY_SCALAR_FIELD(cterecursive);
+ COPY_SCALAR_FIELD(ctematerialized);
  COPY_SCALAR_FIELD(cterefcount);
  COPY_NODE_FIELD(ctecolnames);
  COPY_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 7a700018e7..2112560871 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2781,6 +2781,7 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
  COMPARE_NODE_FIELD(ctequery);
  COMPARE_LOCATION_FIELD(location);
  COMPARE_SCALAR_FIELD(cterecursive);
+ COMPARE_SCALAR_FIELD(ctematerialized);
  COMPARE_SCALAR_FIELD(cterefcount);
  COMPARE_NODE_FIELD(ctecolnames);
  COMPARE_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 43d62062bc..2df4e6300a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3010,6 +3010,7 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
  WRITE_NODE_FIELD(ctequery);
  WRITE_LOCATION_FIELD(location);
  WRITE_BOOL_FIELD(cterecursive);
+ WRITE_BOOL_FIELD(ctematerialized);
  WRITE_INT_FIELD(cterefcount);
  WRITE_NODE_FIELD(ctecolnames);
  WRITE_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index ccb6a1f4ac..4705ea19d7 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -404,6 +404,7 @@ _readCommonTableExpr(void)
  READ_NODE_FIELD(ctequery);
  READ_LOCATION_FIELD(location);
  READ_BOOL_FIELD(cterecursive);
+ READ_BOOL_FIELD(ctematerialized);
  READ_INT_FIELD(cterefcount);
  READ_NODE_FIELD(ctecolnames);
  READ_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 1103984779..63e5828ef9 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1129,6 +1129,141 @@ hash_ok_operator(OpExpr *expr)
 }
 
 
+struct inline_cte_walker_ctx
+{
+ const char *ctename;
+ int levelsup;
+ int refcount;
+ Query *ctequery;
+ CommonTableExpr *cte;
+};
+
+static bool inline_cte_walker(Node *node, void *ctxp)
+{
+ struct inline_cte_walker_ctx *ctx = ctxp;
+
+ if (!node)
+ return false;
+
+ if (IsA(node, Query))
+ {
+ /*
+ * This one is a bit tricky. It's our job to handle the recursion here,
+ * but we do some of the lifting normally handled by query_tree_walker
+ * in order to get the sequence of operations right.
+ *
+ * First, if the Query we're looking at is the one containing our CTE
+ * definition, then we don't need to recurse into our own CTE or CTEs
+ * that are earlier in the list than ours (since cteList has been
+ * sorted for us into dependency order). We could check whether a
+ * nested query is hiding ours, but that seems too much of an edge case
+ * to be worth optimizing (the levelsup check will ensure we don't
+ * replace any CTEs improperly). So we scan the cteList ourselves
+ * rather than having query_tree_walker do it.
+ *
+ * Second, we want to walk the rangetable _before_ replacing any
+ * RTE_CTE nodes, in order to avoid re-walking the subquery we just
+ * inlined. (range_table_walker, if told to visit the RTE nodes at all,
+ * visits them before their content.) So we have range_table_walker
+ * ignore the RTE nodes themselves and only walk their contents.
+ *
+ * Third, we scan the rangetable for RTE_CTE nodes to replace.
+ *
+ * Fourth, we use query_tree_walker to find and walk the rest of the
+ * query, telling it to ignore the rangetable and CTEs.
+ *
+ * Note that ctx->levelsup is -1 on entry the first time, since we need
+ * the incremented value to be 0 when scanning the content of the query
+ * containing the definition.
+ */
+ Query *query = castNode(Query, node);
+ ListCell *lc;
+ bool do_replace = ctx->levelsup >= 0;
+
+ ctx->levelsup++;
+
+ foreach (lc, query->cteList)
+ {
+ CommonTableExpr *cte = lfirst_node(CommonTableExpr, lc);
+
+ if (!do_replace && strcmp(cte->ctename, ctx->ctename) == 0)
+ do_replace = true;
+ else if (do_replace)
+ inline_cte_walker(cte->ctequery, ctxp);
+ }
+
+ range_table_walker(query->rtable, inline_cte_walker, ctxp, 0);
+
+ foreach (lc, query->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_CTE &&
+ strcmp(rte->ctename, ctx->ctename) == 0 &&
+ rte->ctelevelsup == ctx->levelsup)
+ {
+ Query *newquery = ctx->ctequery;
+
+ /*
+ * We need to do some work here that view rewrite does not, and
+ * in turn we do not do some work that view rewrite does.
+ *
+ * Firstly, views can't have outer references but CTEs can
+ * (especially in the case of CTEs referencing other CTEs), so
+ * we need to fix up all levelsup attributes inside the CTE
+ * query.
+ *
+ * Secondly, views (and explicit subqueries) currently have
+ * different behaviour w.r.t. SELECT FOR UPDATE than CTEs do. A
+ * FOR UPDATE clause is treated as extending into views and
+ * subqueries, but not into CTEs. We preserve this distinction
+ * by not trying to push rowmarks into the new subquery.
+ *
+ * We avoid copyObject if possible because subquery processing
+ * copies the query too.
+ */
+ if (ctx->levelsup > 0)
+ {
+ newquery = copyObject(newquery);
+ IncrementVarSublevelsUp((Node *) newquery, ctx->levelsup, 1);
+ }
+
+ /*
+ * Here's where we do the actual substitution.
+ */
+ rte->rtekind = RTE_SUBQUERY;
+ rte->subquery = newquery;
+ rte->security_barrier = false;
+
+ ctx->refcount--;
+ }
+ }
+
+ query_tree_walker(query, inline_cte_walker, ctxp,
+  QTW_IGNORE_RANGE_TABLE | QTW_IGNORE_CTE_SUBQUERIES);
+
+ ctx->levelsup--;
+
+ return false;
+ }
+
+ return expression_tree_walker(node, inline_cte_walker, ctxp);
+}
+
+static void inline_cte(PlannerInfo *root, CommonTableExpr *cte)
+{
+ struct inline_cte_walker_ctx ctx;
+ ctx.ctequery = castNode(Query, cte->ctequery);
+ ctx.ctename = cte->ctename;
+ ctx.refcount = cte->cterefcount;
+ ctx.levelsup = -1;
+ ctx.cte = cte;
+ inline_cte_walker((Node *) root->parse, &ctx);
+ /* we must replace all references */
+ Assert(ctx.refcount == 0);
+}
+
+
 /*
  * SS_process_ctes: process a query's WITH list
  *
@@ -1167,6 +1302,23 @@ SS_process_ctes(PlannerInfo *root)
  }
 
  /*
+ * Consider inlining the CTE rather than planning it separately.
+ *
+ * XXX this likely needs some additional checks.
+ */
+ if (cmdType == CMD_SELECT &&
+ !cte->ctematerialized &&
+ !cte->cterecursive &&
+ (castNode(Query, cte->ctequery)->rowMarks == NIL))
+ {
+ inline_cte(root, cte);
+
+ /* Make a dummy entry in cte_plan_ids */
+ root->cte_plan_ids = lappend_int(root->cte_plan_ids, -1);
+ continue;
+ }
+
+ /*
  * Copy the source Query node.  Probably not necessary, but let's keep
  * this similar to make_subplan.
  */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 732e5d6788..901a3a295f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1372,6 +1372,7 @@ typedef struct CommonTableExpr
  int location; /* token location, or -1 if unknown */
  /* These fields are set during parse analysis: */
  bool cterecursive; /* is this CTE actually recursive? */
+ bool ctematerialized; /* is this an optimization fence? */
  int cterefcount; /* number of RTEs referencing this CTE
  * (excluding internal self-references) */
  List   *ctecolnames; /* list of output column names */
Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

jfinzel
On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <[hidden email]> wrote:
About a year ago I was briefly in discussion/collaboration with Adam Sah
regarding the topic of inlining CTEs into the query rather than treating
them as optimization barriers. We didn't take it very far (he sent me
some stuff, I wrote some stuff and sent it back, things kind of got
dropped at that point); but there's been some recent discussion of this
and some people have expressed an interest in seeing the code.

So I'm posting the parts that I wrote for the benefit of anyone wanting
to pick up the issue again. The assumption of this code is that some
form of syntax would exist to mark materialized CTEs and set the
"ctematerialized" flag.

I haven't rebased this or tested it since last year; this patch is
against b81eba6a65.

Posted for discussion, further development, criticism, whatever; feel
free to include this (with credit) in any relevant patch. Consider this
released under the PG license.

--
Andrew (irc:RhodiumToad)

In our environment we often want this to be a fence.  For example it can be used to only have smaller numbers of joins in each cte and not hit the join collapse limit, or when we really know more about the subquery than the optimizer and have something really specific there .  So in general I would not want the default functionality to change all of the queries we have already written with this in mind. I do however like the idea of this feature being an option, but I would question whether it perhaps worked the other way around where you have to mark a CTE as not being a fence.

Curious what other RDBMSs do here?

Thanks,
Jeremy 

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andres Freund
On 2018-07-24 18:03:43 -0500, Jeremy Finzel wrote:

> On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <[hidden email]>
> wrote:
>
> > About a year ago I was briefly in discussion/collaboration with Adam Sah
> > regarding the topic of inlining CTEs into the query rather than treating
> > them as optimization barriers. We didn't take it very far (he sent me
> > some stuff, I wrote some stuff and sent it back, things kind of got
> > dropped at that point); but there's been some recent discussion of this
> > and some people have expressed an interest in seeing the code.
> >
> > So I'm posting the parts that I wrote for the benefit of anyone wanting
> > to pick up the issue again. The assumption of this code is that some
> > form of syntax would exist to mark materialized CTEs and set the
> > "ctematerialized" flag.
> >
> > I haven't rebased this or tested it since last year; this patch is
> > against b81eba6a65.
> >
> > Posted for discussion, further development, criticism, whatever; feel
> > free to include this (with credit) in any relevant patch. Consider this
> > released under the PG license.
> >
> > --
> > Andrew (irc:RhodiumToad)
> >
> > In our environment we often want this to be a fence.  For example it can
> be used to only have smaller numbers of joins in each cte and not hit the
> join collapse limit, or when we really know more about the subquery than
> the optimizer and have something really specific there .  So in general I
> would not want the default functionality to change all of the queries we
> have already written with this in mind. I do however like the idea of this
> feature being an option, but I would question whether it perhaps worked the
> other way around where you have to mark a CTE as not being a fence.

This essentially has been discussed already:
http://archives.postgresql.org/message-id/5351711493487900%40web53g.yandex.ru

My read of the concensus (in which I am in the majority, so I might be
biased) is that we do want inlining to be the default. We were thinking
that it'd be necessary to provide a way to force inlining on the SQL
level for individual CTEs.


> Curious what other RDBMSs do here?

They largely inline by default.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andrew Gierth
>>>>> "Andres" == Andres Freund <[hidden email]> writes:

 Andres> My read of the concensus (in which I am in the majority, so I
 Andres> might be biased) is that we do want inlining to be the default.
 Andres> We were thinking that it'd be necessary to provide a way to
 Andres> force inlining on the SQL level for individual CTEs.

For the record, here is where it came up in the original discussion
exactly 10 years ago when the feature was being added:

https://www.postgresql.org/message-id/flat/87myk1rg4z.fsf%40news-spur.riddles.org.uk#35efe8ef370518116c38ddc550632aa0

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Tom Lane-2
In reply to this post by Andres Freund
Andres Freund <[hidden email]> writes:
> On 2018-07-24 18:03:43 -0500, Jeremy Finzel wrote:
>> On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <[hidden email]>
>> wrote:
>>> Posted for discussion, further development, criticism, whatever; feel
>>> free to include this (with credit) in any relevant patch. Consider this
>>> released under the PG license.

>> In our environment we often want this to be a fence.  For example it can
>> be used to only have smaller numbers of joins in each cte and not hit the
>> join collapse limit, or when we really know more about the subquery than
>> the optimizer and have something really specific there .  So in general I
>> would not want the default functionality to change all of the queries we
>> have already written with this in mind. I do however like the idea of this
>> feature being an option, but I would question whether it perhaps worked the
>> other way around where you have to mark a CTE as not being a fence.

> This essentially has been discussed already:
> http://archives.postgresql.org/message-id/5351711493487900%40web53g.yandex.ru
> My read of the concensus (in which I am in the majority, so I might be
> biased) is that we do want inlining to be the default. We were thinking
> that it'd be necessary to provide a way to force inlining on the SQL
> level for individual CTEs.

We can't inline wCTEs (those containing insert/update/delete) without
risk of semantics change.  I'd also not favor changing the semantics
for CTEs that are read more than once by the parent query.  However,
a singly-referenced SELECT CTE could reasonably be treated as equivalent
to a sub-select-in-FROM, and then you would have the same mechanisms
for preventing inlining as you do for those cases, e.g. OFFSET 0.
And sticking in OFFSET 0 would be backwards-compatible too: your
code would still work the same in older releases, unlike if we invent
new syntax for this.

So if we're going to go in this direction, that's pretty much how
I'd envision it working.

>> Curious what other RDBMSs do here?

> They largely inline by default.

Even for multi-referenced CTEs?

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Gavin Flower-2
In reply to this post by Andres Freund
On 25/07/18 11:10, Andres Freund wrote:
> On 2018-07-24 18:03:43 -0500, Jeremy Finzel wrote:
>> On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <[hidden email]>
>> wrote:
>>
[...]
>>> In our environment we often want this to be a fence.  For example it can
[...]

> This essentially has been discussed already:
> http://archives.postgresql.org/message-id/5351711493487900%40web53g.yandex.ru
>
> My read of the concensus (in which I am in the majority, so I might be
> biased) is that we do want inlining to be the default. We were thinking
> that it'd be necessary to provide a way to force inlining on the SQL
> level for individual CTEs.
>
>
>> Curious what other RDBMSs do here?
> They largely inline by default.
>
> Greetings,
>
> Andres Freund
>
If I'd not read anything about CTE's being a fence, I would have
implicitly assumed that they were optimised together with the main part
of the SQL statement, and I suspect that is the case for most people.

So I'm very much a favour of optimisation of CTE's being the default.


Cheers,
Gavin


Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2018-07-24 19:49:19 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2018-07-24 18:03:43 -0500, Jeremy Finzel wrote:
> >> On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <[hidden email]>
> >> wrote:
> >>> Posted for discussion, further development, criticism, whatever; feel
> >>> free to include this (with credit) in any relevant patch. Consider this
> >>> released under the PG license.
>
> >> In our environment we often want this to be a fence.  For example it can
> >> be used to only have smaller numbers of joins in each cte and not hit the
> >> join collapse limit, or when we really know more about the subquery than
> >> the optimizer and have something really specific there .  So in general I
> >> would not want the default functionality to change all of the queries we
> >> have already written with this in mind. I do however like the idea of this
> >> feature being an option, but I would question whether it perhaps worked the
> >> other way around where you have to mark a CTE as not being a fence.
>
> > This essentially has been discussed already:
> > http://archives.postgresql.org/message-id/5351711493487900%40web53g.yandex.ru
> > My read of the concensus (in which I am in the majority, so I might be
> > biased) is that we do want inlining to be the default. We were thinking
> > that it'd be necessary to provide a way to force inlining on the SQL
> > level for individual CTEs.
>
> We can't inline wCTEs (those containing insert/update/delete) without
> risk of semantics change.

Right.


> I'd also not favor changing the semantics for CTEs that are read more
> than once by the parent query.

I think medium term it'd be good to do a cost based analysis for that. I
think it's fine to not do that immediately, but we should imo keep that
in mind.


> However, a singly-referenced SELECT CTE could reasonably be treated as
> equivalent to a sub-select-in-FROM, and then you would have the same
> mechanisms for preventing inlining as you do for those cases,
> e.g. OFFSET 0.  And sticking in OFFSET 0 would be backwards-compatible
> too: your code would still work the same in older releases, unlike if
> we invent new syntax for this.

I still think this is just doubling down on prior mistakes.


> >> Curious what other RDBMSs do here?
>
> > They largely inline by default.
>
> Even for multi-referenced CTEs?

I don't know.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2018-07-24 19:49:19 -0400, Tom Lane wrote:
>> However, a singly-referenced SELECT CTE could reasonably be treated as
>> equivalent to a sub-select-in-FROM, and then you would have the same
>> mechanisms for preventing inlining as you do for those cases,
>> e.g. OFFSET 0.  And sticking in OFFSET 0 would be backwards-compatible
>> too: your code would still work the same in older releases, unlike if
>> we invent new syntax for this.

> I still think this is just doubling down on prior mistakes.

Not following what you think a better alternative is?  I'd be the
first to agree that OFFSET 0 is a hack, but people are used to it.

Assuming that we go for inline-by-default for at least some cases,
there's a separate discussion to be had about whether it's worth
making a planner-control GUC to force the old behavior.  I'm not
very excited about that, but I bet some people will be.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andres Freund
Hi,

On 2018-07-24 19:57:49 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2018-07-24 19:49:19 -0400, Tom Lane wrote:
> >> However, a singly-referenced SELECT CTE could reasonably be treated as
> >> equivalent to a sub-select-in-FROM, and then you would have the same
> >> mechanisms for preventing inlining as you do for those cases,
> >> e.g. OFFSET 0.  And sticking in OFFSET 0 would be backwards-compatible
> >> too: your code would still work the same in older releases, unlike if
> >> we invent new syntax for this.
>
> > I still think this is just doubling down on prior mistakes.
>
> Not following what you think a better alternative is?

An explicit keyword controlling the behaviour. WITH ... foo AS
MATERIALIZED (query) or whatnot.


> I'd be the first to agree that OFFSET 0 is a hack, but people are used
> to it.

So are they to CTEs being blocking.

Consider e.g. the case of being able to push down constraints into CTEs
(which have been inlined to allow for that).  Even in queries with a
non-0 OFFSET you can push down in a number of cases, making CTE inlining
useful.  You can't tack on an OFFSET 0 controlling that, without going
for a superflous subquery that just has an OFFSET 0, which is fairly
ridiculous.  What if we learn to inline subqueries with some offsets?


> Assuming that we go for inline-by-default for at least some cases,
> there's a separate discussion to be had about whether it's worth
> making a planner-control GUC to force the old behavior.  I'm not
> very excited about that, but I bet some people will be.

Yea, I am not either. I think we shouldn't go for it.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andrew Gierth
In reply to this post by Tom Lane-2
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 Tom> We can't inline wCTEs (those containing insert/update/delete)
 Tom> without risk of semantics change.

Clearly.

 Tom> I'd also not favor changing the semantics for CTEs that are read
 Tom> more than once by the parent query.

This one's more debatable. There will still be cases where a CTE
referenced multiple times will be better inlined.

(It's obviously trivial to make the posted code do it that way, just by
checking cterefcount.)

 Tom> However, a singly-referenced SELECT CTE could reasonably be
 Tom> treated as equivalent to a sub-select-in-FROM,

In the PoC code I also excluded SELECT FOR UPDATE from inlining.

(There's already a difference between how SELECT FOR UPDATE works for
CTEs compared to subqueries and views, the comments mention it)

There might also be some merit in checking for volatility?

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andrew Gierth
In reply to this post by Andres Freund
>>>>> "Andres" == Andres Freund <[hidden email]> writes:

 Andres> Even in queries with a non-0 OFFSET you can push down in a
 Andres> number of cases,

really?

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andres Freund
On 2018-07-25 01:08:44 +0100, Andrew Gierth wrote:
> >>>>> "Andres" == Andres Freund <[hidden email]> writes:
>
>  Andres> Even in queries with a non-0 OFFSET you can push down in a
>  Andres> number of cases,
>
> really?

Yea. I guess it's a bit dependant on what kind of behaviour you consider
as "pushing down".  I'm doubtful it's worth the analytical complexity on
ensuring it's safe, however.  With knowledge from the outer query you
e.g. can: trim the target list; remove outer joins below the OFFSET 0;
push down a restriction into an outer join below the OFFSET if that's
guaranteed to only return max one row, and not needed if not matching
the restrcition. I'm sure you can come up with more?

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

David Fetter
In reply to this post by Andrew Gierth
On Tue, Jul 24, 2018 at 11:28:21PM +0100, Andrew Gierth wrote:

> About a year ago I was briefly in discussion/collaboration with Adam Sah
> regarding the topic of inlining CTEs into the query rather than treating
> them as optimization barriers. We didn't take it very far (he sent me
> some stuff, I wrote some stuff and sent it back, things kind of got
> dropped at that point); but there's been some recent discussion of this
> and some people have expressed an interest in seeing the code.
>
> So I'm posting the parts that I wrote for the benefit of anyone wanting
> to pick up the issue again. The assumption of this code is that some
> form of syntax would exist to mark materialized CTEs and set the
> "ctematerialized" flag.
>
> I haven't rebased this or tested it since last year; this patch is
> against b81eba6a65.
Please find attached a version rebased atop 167075be3ab1547e18 with
what I believe are appropriate changes to regression test output.  The
other changes to the regression tests output are somewhat puzzling, as
they change the actual results of queries.  I've also attached both
the "leftover" diff and the files to which it should be applied.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

0001-Inlining-CTEs-v0002.patch (16K) Download Attachment
regression.diffs (1016 bytes) Download Attachment
with.out (50K) Download Attachment
window.out (120K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andrew Gierth
>>>>> "David" == David Fetter <[hidden email]> writes:

 David> Please find attached a version rebased atop 167075be3ab1547e18
 David> with what I believe are appropriate changes to regression test
 David> output. The other changes to the regression tests output are
 David> somewhat puzzling, as they change the actual results of queries.

Both of those changes are the result of volatile CTEs being inlined more
than once (in one case, as part of an explicit test to ensure that CTEs
are being materialized and not multiply evaluated).

If you look for the XXX comment in the patch, it should be easy to add a
check that skips inlining if cterefcount > 1 and
contains_volatile_functions is true.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Nico Williams
In reply to this post by David Fetter
On Wed, Jul 25, 2018 at 07:42:37AM +0200, David Fetter wrote:
> Please find attached a version rebased atop 167075be3ab1547e18 with
> what I believe are appropriate changes to regression test output.  The
> other changes to the regression tests output are somewhat puzzling, as
> they change the actual results of queries.  I've also attached both
> the "leftover" diff and the files to which it should be applied.

I think the SQL programmer needs some control over whether a CTE is:

 - a materialized view -- and therefore a barrier
 - a view (which can then be inlined by the optimizer)

It is possible to add a keyword for this purpose in the WITH syntax:

    WITH   VIEW (...) AS a_view
         , MATERIALIZED VIEW (...) AS a_barrier
    ...;

This would be a lot like creating TEMP views, but without the catalog
overhead.

(I wonder how hard it would be to partiion the OID namespace into
temp/persistent ranges so that temp schema elements need not be written
into the catalog.)

Nico
--

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andrew Gierth
>>>>> "Nico" == Nico Williams <[hidden email]> writes:

 Nico> It is possible to add a keyword for this purpose in the WITH syntax:

 Nico>     WITH   VIEW (...) AS a_view

The existing (and standard) syntax is WITH ctename AS (query).

Syntaxes that have been suggested for explicitly controlling the
materialization are along the lines of:

WITH ctename AS [[NOT] MATERIALIZED] (query)

(MATERIALIZED is already an un-reserved keyword)

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Nico Williams
On Wed, Jul 25, 2018 at 05:08:43PM +0100, Andrew Gierth wrote:
>  Nico> It is possible to add a keyword for this purpose in the WITH syntax:
>
>  Nico>     WITH   VIEW (...) AS a_view
>
> The existing (and standard) syntax is WITH ctename AS (query).

Oy, I flubbed that up.

> Syntaxes that have been suggested for explicitly controlling the
> materialization are along the lines of:
>
> WITH ctename AS [[NOT] MATERIALIZED] (query)
>
> (MATERIALIZED is already an un-reserved keyword)

Works for me.

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Nico Williams
In reply to this post by Tom Lane-2
On Tue, Jul 24, 2018 at 07:57:49PM -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2018-07-24 19:49:19 -0400, Tom Lane wrote:
> >> However, a singly-referenced SELECT CTE could reasonably be treated as
> >> equivalent to a sub-select-in-FROM, and then you would have the same
> >> mechanisms for preventing inlining as you do for those cases,
> >> e.g. OFFSET 0.  And sticking in OFFSET 0 would be backwards-compatible
> >> too: your code would still work the same in older releases, unlike if
> >> we invent new syntax for this.
>
> > I still think this is just doubling down on prior mistakes.
>
> Not following what you think a better alternative is?  I'd be the
> first to agree that OFFSET 0 is a hack, but people are used to it.
>
> Assuming that we go for inline-by-default for at least some cases,
> there's a separate discussion to be had about whether it's worth
> making a planner-control GUC to force the old behavior.  I'm not
> very excited about that, but I bet some people will be.

It is widely known that CTEs in PG are optimizer barriers.

That actually is useful, and I do make use of that fact (though I'm not
proud of it).

My proposal is that PG add an extension for specifying that a CTE is to
be materialized (barrier) or not (then inlined).

Nico
--

Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

David Fetter
In reply to this post by Andrew Gierth
On Wed, Jul 25, 2018 at 04:18:42PM +0100, Andrew Gierth wrote:

> >>>>> "David" == David Fetter <[hidden email]> writes:
>
>  David> Please find attached a version rebased atop 167075be3ab1547e18
>  David> with what I believe are appropriate changes to regression test
>  David> output. The other changes to the regression tests output are
>  David> somewhat puzzling, as they change the actual results of queries.
>
> Both of those changes are the result of volatile CTEs being inlined more
> than once (in one case, as part of an explicit test to ensure that CTEs
> are being materialized and not multiply evaluated).
>
> If you look for the XXX comment in the patch, it should be easy to add a
> check that skips inlining if cterefcount > 1 and
> contains_volatile_functions is true.
Thanks for the broad hints!

Please find attached the next version, which passes 'make check'.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

0001-Inlining-CTEs-v0003.patch (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Early WIP/PoC for inlining CTEs

Andreas Karlsson
In reply to this post by Andrew Gierth
On 07/25/2018 06:08 PM, Andrew Gierth wrote:
> WITH ctename AS [[NOT] MATERIALIZED] (query)

I think "NOT MATERIALIZED" would be a bit misleading since the planner
may choose to materialize anyway, so I suggest skipping that part of the
syntax unless there is a really strong reason for having it.

Andreas

1234 ... 6