On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <[hidden email]> wrote:
> > > On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote: > > I imagine we'll set some required UniqueKeys during > > standard_qp_callback() > > In standard_qp_callback, because pathkeys are computed at this point I > guess? Yes. In particular, we set the pathkeys for DISTINCT clauses there. > > and then we'll try to create some Skip Scan > > paths (which are marked with UniqueKeys) if the base relation does not > > already have UniqueKeys that satisfy the required UniqueKeys that were > > set during standard_qp_callback(). > > For a simple distinct query those UniqueKeys would be set based on > distinct clause. If I understand correctly, the very same is implemented > right now in create_distinct_paths, just after building all index paths, > so wouldn't it be just a duplication? I think we need to create the skip scan paths when we create the other paths for base relations. We shouldn't be adjusting existing index paths during create_distinct_paths(). The last code I saw for the skip scans patch did something like if (IsA(path, IndexScanPath)) in create_distinct_paths(), but that's only ever going to work when the query is to a single relation. You'll never see IndexScanPaths in the upper planner's paths when there are joins. You'd see join type paths instead. It is possible to make use of skip scans for DISTINCT when the query has joins. We'd just need to ensure the join does not duplicate the unique rows from the skip scanned relation. > In general UniqueKeys in the skip scan patch were created from > distinctClause in build_index_paths (looks similar to what you've > described) and then based on them created index skip scan paths. So my > expectations were that the patch from this thread would work similar. The difference will be that you'd be setting some distinct_uniquekeys in standard_qp_callback() to explicitly request that some skip scan paths be created for the uniquekeys, whereas the patch here just does not bother doing DISTINCT if the upper relation already has unique keys that state that the DISTINCT is not required. The skip scans patch should check if the RelOptInfo for the uniquekeys set in standard_qp_callback() are already mentioned in the RelOptInfo's uniquekeys. If they are then there's no point in skip scanning as the rel is already unique for the distinct_uniquekeys. David |
> On Mon, May 25, 2020 at 06:34:30AM +1200, David Rowley wrote:
> > > For a simple distinct query those UniqueKeys would be set based on > > distinct clause. If I understand correctly, the very same is implemented > > right now in create_distinct_paths, just after building all index paths, > > so wouldn't it be just a duplication? > > I think we need to create the skip scan paths when we create the other > paths for base relations. We shouldn't be adjusting existing index > paths during create_distinct_paths(). The last code I saw for the > skip scans patch did something like if (IsA(path, IndexScanPath)) in > create_distinct_paths() It's not the case since the late March. > > In general UniqueKeys in the skip scan patch were created from > > distinctClause in build_index_paths (looks similar to what you've > > described) and then based on them created index skip scan paths. So my > > expectations were that the patch from this thread would work similar. > > The difference will be that you'd be setting some distinct_uniquekeys > in standard_qp_callback() to explicitly request that some skip scan > paths be created for the uniquekeys, whereas the patch here just does > not bother doing DISTINCT if the upper relation already has unique > keys that state that the DISTINCT is not required. The skip scans > patch should check if the RelOptInfo for the uniquekeys set in > standard_qp_callback() are already mentioned in the RelOptInfo's > uniquekeys. If they are then there's no point in skip scanning as the > rel is already unique for the distinct_uniquekeys. It sounds like it makes semantics of UniqueKey a bit more confusing, isn't it? At the moment it says: Represents the unique properties held by a RelOptInfo. With the proposed changes it would be "unique properties, that are held" and "unique properties, that are requested", which are partially duplicated, but stored in some different fields. From the skip scan patch perspective it's probably doesn't make any difference, seems like the implementation would be almost the same, just created UniqueKeys would be of different type. But I'm afraid potentiall future users of UniqueKeys could be easily confused. |
On Mon, 25 May 2020 at 19:14, Dmitry Dolgov <[hidden email]> wrote:
> > > On Mon, May 25, 2020 at 06:34:30AM +1200, David Rowley wrote: > > The difference will be that you'd be setting some distinct_uniquekeys > > in standard_qp_callback() to explicitly request that some skip scan > > paths be created for the uniquekeys, whereas the patch here just does > > not bother doing DISTINCT if the upper relation already has unique > > keys that state that the DISTINCT is not required. The skip scans > > patch should check if the RelOptInfo for the uniquekeys set in > > standard_qp_callback() are already mentioned in the RelOptInfo's > > uniquekeys. If they are then there's no point in skip scanning as the > > rel is already unique for the distinct_uniquekeys. > > It sounds like it makes semantics of UniqueKey a bit more confusing, > isn't it? At the moment it says: > > Represents the unique properties held by a RelOptInfo. > > With the proposed changes it would be "unique properties, that are held" > and "unique properties, that are requested", which are partially > duplicated, but stored in some different fields. From the skip scan > patch perspective it's probably doesn't make any difference, seems like > the implementation would be almost the same, just created UniqueKeys > would be of different type. But I'm afraid potentiall future users of > UniqueKeys could be easily confused. If there's some comment that says UniqueKeys are for RelOptInfos, then perhaps that comment just needs to be expanded to mention the Path uniqueness when we add the uniquekeys field to Path. I think the main point of basing skip scans on top of this uniquekeys patch is to ensure it's the right thing for the job. I don't think it's realistic to be maintaining two different sets of infrastructure which serve a very similar purpose. It's important we make UniqueKeys general purpose enough to support future useful forms of optimisation. Basing skip scans on it seems like a good exercise towards that. I'm not expecting that we need to make zero changes here to allow it to work well with skip scans. David |
In reply to this post by David Rowley
On Mon, May 25, 2020 at 2:34 AM David Rowley <[hidden email]> wrote: On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <[hidden email]> wrote: Actually I have some issues to understand from here, then try to read index skip scan patch to fully understand what is the requirement, but that doesn't get it so far[1]. So what is the "UniqueKeys" in "UniqueKeys during standard_qp_callback()" and what is the "pathkeys" in "pathkeys are computed at this point” means? I tried to think it as root->distinct_pathkeys, however I didn't fully understand where root->distinct_pathkeys is used for as well. Best Regards Andy Fan |
On Fri, 5 Jun 2020 at 14:36, Andy Fan <[hidden email]> wrote:
> On Mon, May 25, 2020 at 2:34 AM David Rowley <[hidden email]> wrote: >> >> On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <[hidden email]> wrote: >> > >> > > On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote: >> > > I imagine we'll set some required UniqueKeys during >> > > standard_qp_callback() >> > >> > In standard_qp_callback, because pathkeys are computed at this point I >> > guess? >> >> Yes. In particular, we set the pathkeys for DISTINCT clauses there. >> > > Actually I have some issues to understand from here, then try to read index > skip scan patch to fully understand what is the requirement, but that doesn't > get it so far[1]. So what is the "UniqueKeys" in "UniqueKeys during > standard_qp_callback()" and what is the "pathkeys" in "pathkeys are computed > at this point” means? I tried to think it as root->distinct_pathkeys, however I > didn't fully understand where root->distinct_pathkeys is used for as well. In standard_qp_callback(), what we'll do with uniquekeys is pretty much what we already do with pathkeys there. Basically pathkeys are set there to have the planner attempt to produce a plan that satisfies those pathkeys. Notice at the end of standard_qp_callback() we set the pathkeys according to the first upper planner operation that'll need to make use of those pathkeys. e.g, If there's a GROUP BY and a DISTINCT in the query, then use the pathkeys for GROUP BY, since that must occur before DISTINCT. Likely uniquekeys will want to follow the same rules there for the operations that can make use of paths with uniquekeys, which in this case, I believe, will be the same as the example I just mentioned for pathkeys, except we'll only be able to support GROUP BY without any aggregate functions. David > [1] https://www.postgresql.org/message-id/CAKU4AWq%3DwWkAo-CDOQ5Ea6UwYvZCgb501w6iqU0rtnTT-zg6bQ%40mail.gmail.com |
On Fri, Jun 5, 2020 at 10:57 AM David Rowley <[hidden email]> wrote: On Fri, 5 Jun 2020 at 14:36, Andy Fan <[hidden email]> wrote: the pathkeys according to the first upper planner operation that'll Thanks for your explanation. Looks I understand now based on your comments. Take root->group_pathkeys for example, the similar information also available in root->parse->groupClauses but we do use of root->group_pathkeys with pathkeys_count_contained_in function in many places, that is mainly because the content between between the 2 is different some times, like the case in pathkey_is_redundant. Likely uniquekeys will want to follow the All the places I want to use UniqueKey so far (like distinct, group by and others) have an input_relation (RelOptInfo), and the UniqueKey information can be get there. at the same time, all the pathkey in PlannerInfo is used for Upper planner but UniqueKey may be used in current planner some time, like reduce_semianti_joins/ remove_useless_join, I am not sure if we must maintain uniquekey in PlannerInfo. Best Regards Andy Fan |
In reply to this post by David Rowley
> On Fri, Jun 05, 2020 at 12:26:15PM +1200, David Rowley wrote:
> On Mon, 25 May 2020 at 19:14, Dmitry Dolgov <[hidden email]> wrote: > > > > > On Mon, May 25, 2020 at 06:34:30AM +1200, David Rowley wrote: > > > The difference will be that you'd be setting some distinct_uniquekeys > > > in standard_qp_callback() to explicitly request that some skip scan > > > paths be created for the uniquekeys, whereas the patch here just does > > > not bother doing DISTINCT if the upper relation already has unique > > > keys that state that the DISTINCT is not required. The skip scans > > > patch should check if the RelOptInfo for the uniquekeys set in > > > standard_qp_callback() are already mentioned in the RelOptInfo's > > > uniquekeys. If they are then there's no point in skip scanning as the > > > rel is already unique for the distinct_uniquekeys. > > > > It sounds like it makes semantics of UniqueKey a bit more confusing, > > isn't it? At the moment it says: > > > > Represents the unique properties held by a RelOptInfo. > > > > With the proposed changes it would be "unique properties, that are held" > > and "unique properties, that are requested", which are partially > > duplicated, but stored in some different fields. From the skip scan > > patch perspective it's probably doesn't make any difference, seems like > > the implementation would be almost the same, just created UniqueKeys > > would be of different type. But I'm afraid potentiall future users of > > UniqueKeys could be easily confused. > > If there's some comment that says UniqueKeys are for RelOptInfos, then > perhaps that comment just needs to be expanded to mention the Path > uniqueness when we add the uniquekeys field to Path. My concerns are more about having two different sets of distinct uniquekeys: * one prepared in standard_qp_callback for skip scan (I guess those should be added to PlannerInfo?) * one in create_distinct_paths as per current implementation with what seems to be similar content. > I think the main point of basing skip scans on top of this uniquekeys > patch is to ensure it's the right thing for the job. I don't think > it's realistic to be maintaining two different sets of infrastructure > which serve a very similar purpose. It's important we make UniqueKeys > general purpose enough to support future useful forms of optimisation. > Basing skip scans on it seems like a good exercise towards that. I'm > not expecting that we need to make zero changes here to allow it to > work well with skip scans. Sure, no one suggests to have two ways of saying "this thing is unique". I'm just trying to figure out how to make skip scan and uniquekeys play together without having rough edges. |
On Sat, 6 Jun 2020 at 21:15, Dmitry Dolgov <[hidden email]> wrote:
> My concerns are more about having two different sets of distinct > uniquekeys: > > * one prepared in standard_qp_callback for skip scan (I guess those > should be added to PlannerInfo?) Yes. Those must be set so that we know if and what we should try to create Skip Scan Index paths for. Just like we'll create index paths for PlannerInfo.query_pathkeys. > * one in create_distinct_paths as per current implementation > > with what seems to be similar content. I think we need to have UniqueKeys in RelOptInfo so we can describe what a relation is unique by. There's no point for example in creating skip scan paths for a relation that's already unique on whatever we might try to skip scan on. e.g someone does: SELECT DISTINCT unique_and_indexed_column FROM tab; Since there's a unique index on unique_and_indexed_column then we needn't try to create a skipscan path for it. However, the advantages of having UniqueKeys on the RelOptInfo goes a little deeper than that. We can make use of it anywhere where we currently do relation_has_unique_index_for() for. Plus we get what Andy wants and can skip useless DISTINCT operations when the result is already unique on the distinct clause. Sure we could carry all the relation's unique properties around in Paths, but that's not the right place. It's logically a property of the relation, not the path specifically. RelOptInfo is a good place to store the properties of relations. The idea of the meaning of uniquekeys within a path is that the path is specifically making those keys unique. We're not duplicating the RelOptInfo's uniquekeys there. If we have a table like: CREATE TABLE tab ( a INT PRIMARY KEY, b INT NOT NULL ); CREATE INDEX tab_b_idx ON tab (b); Then I'd expect a query such as: SELECT DISTINCT b FROM tab; to have the uniquekeys for tab's RelOptInfo set to {a}, and the seqscan and index scan paths uniquekey properties set to NULL, but the skipscan index path uniquekeys for tab_b_idx set to {b}. Then when we go create the distinct paths Andy's work will see that there's no RelOptInfo uniquekeys for the distinct clause, but the skip scan work will loop over the unique_pathlist and find that we have a skipscan path with the required uniquekeys, a.k.a {b}. Does that make sense? David |
> On Sun, Jun 07, 2020 at 06:51:22PM +1200, David Rowley wrote:
> > > * one in create_distinct_paths as per current implementation > > > > with what seems to be similar content. > > I think we need to have UniqueKeys in RelOptInfo so we can describe > what a relation is unique by. There's no point for example in > creating skip scan paths for a relation that's already unique on > whatever we might try to skip scan on. e.g someone does: > > SELECT DISTINCT unique_and_indexed_column FROM tab; > > Since there's a unique index on unique_and_indexed_column then we > needn't try to create a skipscan path for it. > > However, the advantages of having UniqueKeys on the RelOptInfo goes a > little deeper than that. We can make use of it anywhere where we > currently do relation_has_unique_index_for() for. Plus we get what > Andy wants and can skip useless DISTINCT operations when the result is > already unique on the distinct clause. Sure we could carry all the > relation's unique properties around in Paths, but that's not the right > place. It's logically a property of the relation, not the path > specifically. RelOptInfo is a good place to store the properties of > relations. > > The idea of the meaning of uniquekeys within a path is that the path > is specifically making those keys unique. We're not duplicating the > RelOptInfo's uniquekeys there. > > If we have a table like: > > CREATE TABLE tab ( > a INT PRIMARY KEY, > b INT NOT NULL > ); > > CREATE INDEX tab_b_idx ON tab (b); > > Then I'd expect a query such as: SELECT DISTINCT b FROM tab; to have > the uniquekeys for tab's RelOptInfo set to {a}, and the seqscan and > index scan paths uniquekey properties set to NULL, but the skipscan > index path uniquekeys for tab_b_idx set to {b}. Then when we go > create the distinct paths Andy's work will see that there's no > RelOptInfo uniquekeys for the distinct clause, but the skip scan work > will loop over the unique_pathlist and find that we have a skipscan > path with the required uniquekeys, a.k.a {b}. > > Does that make sense? Yes, from this point of view it makes sense. I've already posted the first version of index skip scan based on this implementation [1]. There could be rought edges, but overall I hope we're on the same page. [1]: https://www.postgresql.org/message-id/flat/20200609102247.jdlatmfyeecg52fi%40localhost |
I just did another self-review about this patch and took some suggestions based on the discussion above. The attached is the v9 version. When you check the uniquekey patch, README.uniquekey should be a good place to start with. Main changes in v9 includes: 1. called populate_baserel_uniquekeys after check_index_predicates. 2. removed the UniqueKey->onerow flag since we can tell it by exprs == NIL. 3. expression index code improvement. 4. code & comments refactoring. As for the Index Skip Scan, I still have not merged the changes in the Index Skip Scan patch[1]. We may need some addition for that, but probably not need to modify the existing code. After we can finalize it, we can add it in that patch. I will keep a close eye on it as well. Best Regards Andy Fan ![]() ![]() ![]() ![]() ![]() ![]() |
Fixed a test case in v10. Best Regards Andy Fan ![]() ![]() ![]() ![]() ![]() ![]() |
Hi Andy, A small thing I found: +static List * +get_exprs_from_uniqueindex(IndexOptInfo *unique_index, + List *const_exprs, + List *const_expr_opfamilies, + Bitmapset *used_varattrs, + bool *useful, + bool *multi_nullvals) … + indexpr_item = list_head(unique_index->indexprs); + for(c = 0; c < unique_index->ncolumns; c++) + { I believe the for loop must be over unique_index->nkeycolumns, rather than columns. It shouldn’t include the extra non-key columns. This can currently lead to invalid memory accesses as well a few lines later when it does an array access
of unique_index->opfamily[c] – this array only has nkeycolumns entries. -Floris From: Andy Fan <[hidden email]> Fixed a test case in v10. -- Best Regards Andy Fan |
Hi Floris: On Thu, Jul 23, 2020 at 3:22 AM Floris Van Nee <[hidden email]> wrote:
You are correct, I would include this in the next version patch, Thank you for this checking! -- Andy Fan Best Regards
Best Regards Andy Fan |
On Tue, Aug 04, 2020 at 06:59:50AM +0800, Andy Fan wrote:
> You are correct, I would include this in the next version patch, Thank you > for this checking! Regression tests are failing with this patch set applied. The CF bot says so, and I can reproduce that locally as well. Could you look at that please? I have switched the patch to "waiting on author". -- Michael |
On Mon, Sep 7, 2020 at 3:22 PM Michael Paquier <[hidden email]> wrote: On Tue, Aug 04, 2020 at 06:59:50AM +0800, Andy Fan wrote: Thank you Michael for checking it, I can reproduce the same locally after rebasing to the latest master. The attached v11 has fixed it and includes the fix Floris found. The status of this patch is we are still in discussion about which data type should UniqueKey->expr use. Both David [1] and I [2] shared some thinking about EquivalenceClasses, but neither of us have decided on it. So I still didn't change anything about that now. I can change it once we have decided on it. [1] https://www.postgresql.org/message-id/CAApHDvoDMyw%3DhTuW-258yqNK4bhW6CpguJU_GZBh4x%2Brnoem3w%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAKU4AWqy3Uv67%3DPR8RXG6LVoO-cMEwfW_LMwTxHdGrnu%2Bcf%2BdA%40mail.gmail.com Best Regards Andy Fan ![]() ![]() ![]() ![]() ![]() ![]() |
> On Wed, Sep 09, 2020 at 07:51:12AM +0800, Andy Fan wrote:
> > Thank you Michael for checking it, I can reproduce the same locally after > rebasing to the latest master. The attached v11 has fixed it and includes > the fix Floris found. > > The status of this patch is we are still in discussion about which data > type should > UniqueKey->expr use. Both David [1] and I [2] shared some thinking about > EquivalenceClasses, but neither of us have decided on it. So I still didn't > change > anything about that now. I can change it once we have decided on it. > > [1] > https://www.postgresql.org/message-id/CAApHDvoDMyw%3DhTuW-258yqNK4bhW6CpguJU_GZBh4x%2Brnoem3w%40mail.gmail.com > > [2] > https://www.postgresql.org/message-id/CAKU4AWqy3Uv67%3DPR8RXG6LVoO-cMEwfW_LMwTxHdGrnu%2Bcf%2BdA%40mail.gmail.com Hi, In the Index Skip Scan thread Peter mentioned couple of issues that I believe need to be addressed here. In fact one about valgrind errors was already fixed as far as I see (nkeycolumns instead of ncolumns), another one was: /code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c: In function ‘populate_baserel_uniquekeys’: /code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:797:13: warning: ‘expr’ may be used uninitialized in this function [-Wmaybe-uninitialized] 797 | else if (!list_member(unique_index->rel->reltarget->exprs, expr)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Other than that I wanted to ask what are the plans to proceed with this patch? It's been a while since the question was raised in which format to keep unique key expressions, and as far as I can see no detailed suggestions or patch changes were proposed as a follow up. Obviously I would love to see the first two preparation patches committed to avoid dependencies between patches, and want to suggest an incremental approach with simple format for start (what we have right now) with the idea how to extend it in the future to cover more cases. |
On Wed, Oct 7, 2020 at 9:55 PM Dmitry Dolgov <[hidden email]> wrote: > On Wed, Sep 09, 2020 at 07:51:12AM +0800, Andy Fan wrote: I can fix this warning in the next version, thanks for reporting it. It can be fixed like below or just adjust the if-elseif-else pattern. +++ b/src/backend/optimizer/path/uniquekeys.c @@ -760,6 +760,7 @@ get_exprs_from_uniqueindex(IndexOptInfo *unique_index, { /* Index on system column is not supported */ Assert(false); + expr = NULL; /* make compiler happy */ } Other than that I wanted to ask what are the plans to proceed with this dedicated time to review which would be the hardest part for the reviewers. I don't have a good suggestion, however. -- Best Regards Andy Fan |
Hi
I have a look over this patch and find some typos in 0002. 1.Some typos about unique: There are some spelling mistakes about "unique" in code comments and README. Such as: "+However we define the UnqiueKey as below." 2.function name about initililze_uniquecontext_for_joinrel: May be it should be initialize_ uniquecontext_for_joinrel. 3.some typos in comment: + * baserelation's basicrestrictinfo. so it must be in ON clauses. I think it shoule be " basicrestrictinfo " => "baserestrictinfo". Besides, I think list_copy can be used to simplify the following code. (But It seems the type of expr is still in discussion, so this may has no impact ) + List *exprs = NIL; ... + foreach(lc, unionrel->reltarget->exprs) + { + exprs = lappend(exprs, lfirst(lc)); + } Best regards, |
In reply to this post by Andy Fan
> On Thu, Oct 08, 2020 at 09:34:51AM +0800, Andy Fan wrote:
> > > Other than that I wanted to ask what are the plans to proceed with this > > patch? It's been a while since the question was raised in which format > > to keep unique key expressions, and as far as I can see no detailed > > suggestions or patch changes were proposed as a follow up. Obviously I > > would love to see the first two preparation patches committed to avoid > > dependencies between patches, and want to suggest an incremental > > approach with simple format for start (what we have right now) with the > > idea how to extend it in the future to cover more cases. > > > > I think the hardest part of this series is commit 2, it probably needs > lots of > dedicated time to review which would be the hardest part for the reviewers. > I don't have a good suggestion, however. Sure, and I would review the patch as well. But as far as I understand the main issue is "how to store uniquekey expressions", and as long as it is not decided, no additional review will move the patch forward I guess. |
On Thu, Oct 8, 2020 at 6:39 PM Dmitry Dolgov <[hidden email]> wrote: > On Thu, Oct 08, 2020 at 09:34:51AM +0800, Andy Fan wrote: Thank you very much! But as far as I understand I don't think so:) The patch may have other issues as well. For example, logic error or duplicated code or cases needing improvement and so on. Best Regards Andy Fan |
Free forum by Nabble | Edit this page |