On Thu, Oct 8, 2020 at 12:12 PM Hou, Zhijie <[hidden email]> wrote: Hi Thank you zhijie, I will fix them in next version. Best Regards Andy Fan |
This patch has stopped moving for a while, any suggestion about how to move on is appreciated. Best Regards Andy Fan |
On 26/11/2020 16:58, Andy Fan wrote:
> This patch has stopped moving for a while, any suggestion about > how to move on is appreciated. The question on whether UniqueKey.exprs should be a list of EquivalenceClasses or PathKeys is unresolved. I don't have an opinion on that, but I'd suggest that you pick one or the other and just go with it. If it turns out to be a bad choice, then we'll change it. Quickly looking at the patches, there's one thing I think no one's mentioned yet, but looks really ugly to me: > + /* Make sure the path->parent point to current joinrel, can't update it in-place. */ > + foreach(lc, outer_rel->pathlist) > + { > + Size sz = size_of_path(lfirst(lc)); > + Path *path = palloc(sz); > + memcpy(path, lfirst(lc), sz); > + path->parent = joinrel; > + add_path(joinrel, path); > + } Copying a Path and modifying it like that is not good, there's got to be a better way to do this. Perhaps wrap the original Paths in ProjectionPaths, where the ProjectionPath's parent is the joinrel and dummypp=true. - Heikki |
Hi,
On 11/30/20 5:04 AM, Heikki Linnakangas wrote: > On 26/11/2020 16:58, Andy Fan wrote: >> This patch has stopped moving for a while, any suggestion about >> how to move on is appreciated. > > The question on whether UniqueKey.exprs should be a list of > EquivalenceClasses or PathKeys is unresolved. I don't have an opinion > on that, but I'd suggest that you pick one or the other and just go > with it. If it turns out to be a bad choice, then we'll change it. In this case I think it is matter of deciding if we are going to use EquivalenceClasses or Exprs before going further; there has been work ongoing in this area for a while, so having a clear direction from a committer would be greatly appreciated. Deciding would also help potential reviewers to give more feedback on the features implemented on top of the base. Should there be a new thread with the minimum requirements in order to get closer ? Best regards, Jesper |
On 30/11/2020 16:30, Jesper Pedersen wrote:
> On 11/30/20 5:04 AM, Heikki Linnakangas wrote: >> On 26/11/2020 16:58, Andy Fan wrote: >>> This patch has stopped moving for a while, any suggestion about >>> how to move on is appreciated. >> >> The question on whether UniqueKey.exprs should be a list of >> EquivalenceClasses or PathKeys is unresolved. I don't have an opinion >> on that, but I'd suggest that you pick one or the other and just go >> with it. If it turns out to be a bad choice, then we'll change it. > > In this case I think it is matter of deciding if we are going to use > EquivalenceClasses or Exprs before going further; there has been work > ongoing in this area for a while, so having a clear direction from a > committer would be greatly appreciated. Plain Exprs are not good enough, because you need to know which operator the expression is unique on. Usually, it's the default = operator in the default btree opclass for the datatype, but it could be something else, too. There's some precedence for PathKeys, as we generate PathKeys to represent the DISTINCT column in PlannerInfo->distinct_pathkeys. On the other hand, I've always found it confusing that we use PathKeys to represent DISTINCT and GROUP BY, which are not actually sort orderings. Perhaps it would make sense to store EquivalenceClass+opfamily in UniqueKey, and also replace distinct_pathkeys and group_pathkeys with UniqueKeys. That's just my 2 cents though, others more familiar with this planner code might have other opinions... - Heikki |
Hi
I look into the patch again and have some comments. 1. + Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns; + + return ind1->ncolumns == ind2->ncolumns && + ind1->unique == ind2->unique && + memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 && + memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 && + memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 && + memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 && + equal(get_tlist_exprs(ind1->indextlist, true), + get_tlist_exprs(ind2->indextlist, true)); The length of sortopfamily,opfamily and opcintype seems ->nkeycolumns not ->ncolumns. I checked function get_relation_info where init the IndexOptInfo. (If there are more places where can change the length, please correct me) 2. + COPY_SCALAR_FIELD(ncolumns); + COPY_SCALAR_FIELD(nkeycolumns); + COPY_SCALAR_FIELD(unique); + COPY_SCALAR_FIELD(immediate); + /* We just need to know if it is NIL or not */ + COPY_SCALAR_FIELD(indpred); + COPY_SCALAR_FIELD(predOK); + COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int)); + COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid)); + COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid)); + COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid)); + COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid)); + COPY_NODE_FIELD(indextlist); The same as 1. Should use nkeycolumns if I am right. 3. + foreach(lc, newnode->indextlist) + { + TargetEntry *tle = lfirst_node(TargetEntry, lc); + /* Index on expression is ignored */ + Assert(IsA(tle->expr, Var)); + tle->expr = (Expr *) find_parent_var(appinfo, (Var *) tle->expr); + newnode->indexkeys[idx] = castNode(Var, tle->expr)->varattno; + idx++; + } The count variable 'idx' can be replaces by foreach_current_index(). Best regards, houzj |
In reply to this post by Heikki Linnakangas
Thank you Heikki for your attention. On Mon, Nov 30, 2020 at 11:20 PM Heikki Linnakangas <[hidden email]> wrote: On 30/11/2020 16:30, Jesper Pedersen wrote: Actually I can't understand this, could you explain more? Based on my current knowledge, when we run "SELECT DISTINCT a FROM t", we never care about which operator to use for the unique. There's some precedence for PathKeys, as we generate PathKeys to OK, I have the same confusion now:) Perhaps it would make sense to store EquivalenceClass+opfamily in I can understand why we need EquivalenceClass for UniqueKey, but I can't understand why we need opfamily here. For anyone who is interested with these patchsets, here is my plan about this now. 1). I will try EquivalenceClass rather than Expr in UniqueKey and add opfamily if needed. 2). I will start a new thread to continue this topic. The current thread is too long which may scare some people who may have interest in it. 3). I will give up patch 5 & 6 for now. one reason I am not happy with the current implementation, and the other reason is I want to make the patchset smaller to make the reviewer easier. I will not give up them forever, after the main part of this patchset is committed, I will continue with them in a new thread. Thanks everyone for your input. Best Regards Andy Fan |
On 05/12/2020 17:10, Andy Fan wrote: > Actually I can't understand this, could you explain more? Based on my > current > knowledge, when we run "SELECT DISTINCT a FROM t", we never care about > which operator to use for the unique. SortGroupClause includes 'eqop' field, which determines the operator that the expression needs to made unique with. The syntax doesn't let you set it to anything else than the default btree opclass of the datatype, though. But you can specify it for ORDER BY, and we use SortGroupClauses to represent both sorting and grouping. Also, if you use the same struct to also represent columns that you know to be unique, and not just the DISTINCT clause in the query, then you need the operator. For example, if you create a unique index on non-default opfamily. > There's some precedence for PathKeys, as we generate PathKeys to > represent the DISTINCT column in PlannerInfo->distinct_pathkeys. On the > other hand, I've always found it confusing that we use PathKeys to > represent DISTINCT and GROUP BY, which are not actually sort orderings. > > > OK, I have the same confusion now:) > > Perhaps it would make sense to store EquivalenceClass+opfamily in > UniqueKey, and also replace distinct_pathkeys and group_pathkeys with > UniqueKeys. > > > I can understand why we need EquivalenceClass for UniqueKey, but I can't > understand why we need opfamily here. Thinking a bit harder, I guess we don't. Because EquivalenceClass includes the operator family already, in the ec_opfamilies field. > For anyone who is interested with these patchsets, here is my plan > about this now. 1). I will try EquivalenceClass rather than Expr in > UniqueKey and add opfamily if needed. 2). I will start a new thread > to continue this topic. The current thread is too long which may > scare some people who may have interest in it. 3). I will give up > patch 5 & 6 for now. one reason I am not happy with the current > implementation, and the other reason is I want to make the patchset > smaller to make the reviewer easier. I will not give up them forever, > after the main part of this patchset is committed, I will continue > with them in a new thread. Thanks everyone for your input. - Heikki |
Heikki Linnakangas <[hidden email]> writes:
>> I can understand why we need EquivalenceClass for UniqueKey, but I can't >> understand why we need opfamily here. > Thinking a bit harder, I guess we don't. Because EquivalenceClass > includes the operator family already, in the ec_opfamilies field. No. EquivalenceClasses only care about equality, which is why they might potentially mention several opfamilies that share an equality operator. If you care about sort order, you *cannot* rely on an EquivalenceClass to depict that. Now, abstract uniqueness also only cares about equality, but if you are going to implement it via sort- and-unique then you need to settle on a sort order. I agree we are overspecifying DISTINCT by settling on a sort operator at parse time, rather than considering all the possibilities at plan time. But given that opfamilies sharing equality are mostly a hypothetical use-case, I'm not in a big hurry to fix it. Before we had ASC/DESC indexes, there was a real use-case for making a "reverse sort" opclass, with the same equality as the type's regular opclass but the opposite sort order. But that's ancient history now, and I've seen few other plausible use-cases. I have not been following this thread closely enough to understand why we need a new "UniqueKeys" data structure at all. But if the motivation is only to remove this overspecification, I humbly suggest that it ain't worth the trouble. regards, tom lane |
Thank you Tom and Heikki for your input. On Sun, Dec 6, 2020 at 4:40 AM Tom Lane <[hidden email]> wrote: Heikki Linnakangas <[hidden email]> writes: I think UniqueKey only cares about equality. Even DISTINCT / groupBy can be implemented with sort, but UniqueKey only care about the result of DISTINCT/GROUPBY, so it doesn't matter IIUC.
Currently the UniqueKey is defined as a List of Expr, rather than EquivalenceClasses. A complete discussion until now can be found at [1] (The messages I replied to also care a lot and the information is completed). This patch has stopped at this place for a while, I'm planning to try EquivalenceClasses, but any suggestion would be welcome. But if the Best Regards Andy Fan |
Hi,
On 12/5/20 10:38 PM, Andy Fan wrote: > Currently the UniqueKey is defined as a List of Expr, rather than > EquivalenceClasses. > A complete discussion until now can be found at [1] (The messages I replied > to also > care a lot and the information is completed). This patch has stopped at > this place for > a while, I'm planning to try EquivalenceClasses, but any suggestion would > be welcome. > > Unfortunately I think we need a RfC style patch of both versions in their minimum implementation. Hopefully this will make it easier for one or more committers to decide on the right direction since they can do a side-by-side comparison of the two solutions. Just my $0.02. Thanks for working on this Andy ! Best regards, Jesper |
On Mon, Dec 7, 2020 at 4:16 PM Jesper Pedersen <[hidden email]> wrote: Hi, I do get the exact same idea. Actually I have made EquivalenceClasses works with baserel last weekend and then I realized it is hard to compare the 2 situations without looking into the real/Poc code, even for very experienced people. I will submit a new patch after I get the partitioned relation, subquery works. Hope I can make it in one week. Just my $0.02. Best Regards Andy Fan |
In reply to this post by Andy Fan
On Sun, Dec 6, 2020 at 9:09 AM Andy Fan <[hidden email]> wrote:
>> >> I have not been following this thread closely enough to understand >> why we need a new "UniqueKeys" data structure at all. > > > Currently the UniqueKey is defined as a List of Expr, rather than EquivalenceClasses. > A complete discussion until now can be found at [1] (The messages I replied to also > care a lot and the information is completed). This patch has stopped at this place for > a while, I'm planning to try EquivalenceClasses, but any suggestion would be welcome. > >> >> But if the >> motivation is only to remove this overspecification, I humbly suggest >> that it ain't worth the trouble. AFAIK, the simple answer is we need some way to tell that certain expressions together form a unique key for a given relation. E.g. group by clause forms a unique key for the output of GROUP BY. Pathkeys have a stronger requirement that the relation is ordered on that expression, which may not be the case with uniqueness e.g. output of GROUP BY produced by hash grouping. To me it's Pathkeys - ordering, so we could use Pathkeys with reduced strength. But that might affect a lot of places which depend upon stronger pathkeys. -- Best Wishes, Ashutosh Bapat |
In reply to this post by Andy Fan
On Sun, 6 Dec 2020 at 04:10, Andy Fan <[hidden email]> wrote:
> For anyone who is interested with these patchsets, here is my plan about this > now. 1). I will try EquivalenceClass rather than Expr in UniqueKey and add opfamily > if needed. I agree that we should be storing them in EquivalenceClasses. Apart from what was mentioned already it also allow the optimisation to work in cases like: create table t (a int not null unique, b int); select distinct b from t where a = b; David |
In reply to this post by Andy Fan
Hi Andy,
On Mon, Dec 7, 2020 at 9:15 PM Andy Fan <[hidden email]> wrote: > > > > On Mon, Dec 7, 2020 at 4:16 PM Jesper Pedersen <[hidden email]> wrote: >> >> Hi, >> >> On 12/5/20 10:38 PM, Andy Fan wrote: >> > Currently the UniqueKey is defined as a List of Expr, rather than >> > EquivalenceClasses. >> > A complete discussion until now can be found at [1] (The messages I replied >> > to also >> > care a lot and the information is completed). This patch has stopped at >> > this place for >> > a while, I'm planning to try EquivalenceClasses, but any suggestion would >> > be welcome. >> > >> > >> >> Unfortunately I think we need a RfC style patch of both versions in >> their minimum implementation. >> >> Hopefully this will make it easier for one or more committers to decide >> on the right direction since they can do a side-by-side comparison of >> the two solutions. >> > > I do get the exact same idea. Actually I have made EquivalenceClasses > works with baserel last weekend and then I realized it is hard to compare > the 2 situations without looking into the real/Poc code, even for very > experienced people. I will submit a new patch after I get the partitioned > relation, subquery works. Hope I can make it in one week. Status update for a commitfest entry. Are you planning to submit a new patch? Or is there any blocker for this work? This patch entry on CF app has been in state Waiting on Author for a while. If there is any update on that, please reflect on CF app. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/ |
Hi Masahiko: On Fri, Jan 22, 2021 at 9:15 PM Masahiko Sawada <[hidden email]> wrote: Hi Andy, I agree that the current status is "Waiting on author", and no block issue for others. I plan to work on this in 1 month. I have to get my current urgent case completed first. Sorry for the delay action and thanks for asking. Best Regards Andy Fan (https://www.aliyun.com/) |
On Sun, Jan 24, 2021 at 6:26 PM Andy Fan <[hidden email]> wrote:
I'd start to continue this work today. At the same time, I will split the multi-patch series into some dedicated small chunks for easier review. The first one is just for adding a notnullattrs in RelOptInfo struct, in thread [1]. Best Regards Andy Fan (https://www.aliyun.com/) |
Free forum by Nabble | Edit this page |