The semantics of NOT IN (SELECT ...) are subtly different from the semantics
of NOT EXISTS (SELECT ...). These differences center on how NULLs are treated, and in general can result in statements that are harder to optimize and slower to execute than the apparently similar NOT EXISTS statement. A little over a year ago, Christian Antognini authored the blog "/How Well a Query Optimizer Handles Subqueries?/" summarizing his findings about the performance of PostgreSQL, MySQL, and Oracle on various subqueries: https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/ His position was that you can classify the optimizations as correct or incorrect, and based on that he provided the following comparison summary (see below). In short, PostgreSQL was the worst of the three systems: "Summary The number of queries that the query optimizers handle correctly are the following: Oracle Database 12.2: 72 out of 80 MySQL 8.0.3: 67 out of 80 PostgreSQL 10.0: 60 out of 80 Since not all queries are handled correctly, for best performance it is sometimes necessary to rewrite them." The subqueries that were found to be optimized "incorrectly" were almost entirely due to poor or absent NOT IN subquery optimization. The PostgreSQL community has been aware of the deficiencies in NOT IN optimization for quite some time. Based on an analysis of psgsql-performance posts between 2013 and 2015, Robert Haas identified NOT IN optimization as one of the common root causes of performance problems. We have been working on improved optimization of NOT IN, and we would like to share this optimizaton with the community. With respect to the test cases mentioned in the blog post mentioned above, it will elevate PostgreSQL from "worst" to "first". Generally the performance gains are large when the optimization applies, though we have found one test case where performance is worse. We are investigating this now to see if we can disable the optimization in that case. We would like to include a patch for this change in the current commitfest. This thread can be used to track comments about this optimization. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
|
Jim Finnerty <[hidden email]> writes:
> We have been working on improved optimization of NOT IN, and we would like > to share this optimizaton with the community. The idea that's been kicked around in the past is to detect whether the subselect's output column(s) can be proved NOT NULL, and if so, convert to an antijoin just like NOT EXISTS. Is that what you're doing, or something different? > We would like to include a patch for this change in the current commitfest. Generally, people just send patches, they don't ask for permission first ;-) Having said that, we have a general policy that we don't like complex patches that first show up for the last commitfest of a dev cycle. So unless this is a pretty small patch, it's probably going to get delayed to v13. Still, we'd like to have it in the queue, so submit away ... regards, tom lane |
re: The idea that's been kicked around in the past is to detect whether the
subselect's output column(s) can be proved NOT NULL, and if so, convert to an antijoin just like NOT EXISTS basically, yes. this will handle nullability of both the outer and inner correlated expression(s), multiple expressions, presence or absence of predicates in the WHERE clause, and whether the correlated expressions are on the null-padded side of an outer join. If it is judged to be more efficient, then it transforms the NOT IN sublink into an anti-join. some complications enter into the decision to transform NOT IN to anti-join based on whether a bitmap plan will/not be used, or whether it will/not be eligible for PQ. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
|
Jim Finnerty <[hidden email]> writes:
> re: The idea that's been kicked around in the past is to detect whether the > subselect's output column(s) can be proved NOT NULL, and if so, convert > to an antijoin just like NOT EXISTS > basically, yes. this will handle nullability of both the outer and inner > correlated expression(s), multiple expressions, presence or absence of > predicates in the WHERE clause, and whether the correlated expressions are > on the null-padded side of an outer join. If it is judged to be more > efficient, then it transforms the NOT IN sublink into an anti-join. Hmm, that seems overcomplicated ... > some complications enter into the decision to transform NOT IN to anti-join > based on whether a bitmap plan will/not be used, or whether it will/not be > eligible for PQ. ... and that even more so, considering that this decision really needs to be taken long before cost estimates would be available. As far as I can see, there should be no situation where we'd not want to transform to antijoin if we can prove it's semantically valid to do so. If there are cases where that comes out as a worse plan, that indicates a costing error that would be something to address separately (because it'd also be a problem for other antijoin cases). Also, as long as it nearly always wins, I'm not going to cry too hard if there are corner cases where it makes the wrong choice. That's not something that's possible to avoid completely. regards, tom lane |
We can always correctly transform a NOT IN to a correlated NOT EXISTS. In
almost all cases it is more efficient to do so. In the one case that we've found that is slower it does come down to a more general costing issue, so that's probably the right way to think about it. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
|
On Thu, 21 Feb 2019 at 16:27, Jim Finnerty <[hidden email]> wrote:
> We can always correctly transform a NOT IN to a correlated NOT EXISTS. In > almost all cases it is more efficient to do so. In the one case that we've > found that is slower it does come down to a more general costing issue, so > that's probably the right way to think about it. I worked on this over 4 years ago [1]. I think the patch there is not completely broken and seems just to need a few things fixed. I rebased it on top of current master and looked at it. I think the main remaining issue is fixing the code that ensures the outer side join quals can't be NULL. The code that's there looks broken still since it attempts to use quals from any inner joined rel for proofs that NULLs will be removed. That might not work so well in a case like: SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.b NOT IN(select b from t3), however, I'd need to think harder about that since if there was such a qual then the planner should convert the left join into an inner join. But anyway, the function expressions_are_not_nullable() was more intended to work with targetlists to ensure exprs there can't be NULL. I just had done a poor job of trying to modify that into allowing it to take exprs from any random place, likely that should be a new function and expressions_are_not_nullable() should be put back to what Tom ended up with. I've attached the rebased and still broken version. [1] https://www.postgresql.org/message-id/CAApHDvqRB-iFBy68%3DdCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services |
On Fri, 22 Feb 2019 at 09:44, David Rowley <[hidden email]> wrote:
> I've attached the rebased and still broken version. I set about trying to make a less broken version of this. A quick reminder of the semantics of NOT IN: 1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table); If table is non-empty: will filter out rows where <nullable_column> is NULL and only show values that are not in <not null column> If table is empty: Filters nothing. 2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table); If table contains NULLs in the <null column> no records will match. The previous patch handled #2 correctly but neglected to do anything about #1. For #1 the only way we can implement this as a planner only change is to insist that the outer side expressions also are not null. If we could have somehow tested if "table" was non-empty then we could have added a IS NOT NULL clause to the outer query and converted to an anti-join, but ... can't know that during planning and can't add the IS NOT NULL regardless as, if "table" is empty we will filter NULLs when we shouldn't. In the attached, I set about fixing #1 by determining if the outer expressions could be NULL by checking 1. If expression is a Var from an inner joined relation it can't be NULL if there's a NOT NULL constraint on the column; or 2. If expression is a Var from an inner joined relation and there is a strict WHERE/ON clause, the expression can't be NULL; or 3. If expression is a Var from an outer joined relation check for quals that were specified in the same syntactical level as the NOT IN for proofs that NULL will be filtered. An example of #3 is: SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in planning, but... or; SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3); In the latter of the two, the t1.a = t2.a join conditions ensures that NULLs can't exist where the NOT IN is evaluated. I implemented #3 by passing the quals down to pull_up_sublinks_qual_recurse(). At the top level call 'node' and 'notnull_proofs' are the same, but that changes in recursive calls like the one we make inside the is_andclause() condition. Comments welcome. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services |
I'm attaching a working patch following the discussion.
You can also find the following patch description is the commit message: NOT IN to ANTI JOIN transformation The semantics of ANTI JOIN were created to match the semantics of NOT EXISTS, which enables NOT EXISTS subqueries to be efficiently executed as a kind of join. NOT IN subqueries have different NULL semantics than NOT EXISTS, but since there is no special join operator for NOT IN it is generally executed as a nested sub-plan. It is possible, however, to transform NOT IN to a correlated NOT EXISTS so that it can be executed it as an ANTI JOIN with additional correlated predicates. A general transformation from NOT IN to NOT EXISTS for the single-expression case (the multi-expression case is just ANDs of the single-expressions) is: t1.x NOT IN (SELECT t2.y from t2 where p) <=> NOT EXISTS (select 1 from t2 where (y=x or y is NULL or x is NULL) and p). If x or y is non-nullable, we can safely remove the predicate "x is NULL" or "y is NULL", and if there is no predicate p, then "x is NULL" may be factored out of the subquery. Experiments show that if we can remove one or the other ORed predicates, or if we can factor out the "x is NULL", then execution is typically much faster. Basically, for the single expression case (we also handle the multi expression case), we try to do the following transformation: When p does not exist: t1.x not in (t2.y) => ANTI JOIN t1(Filter: x is not null), t2 on join condition: t1.x=t2.y or t2.y is null. When x is non-nullable: t1.x not in (t2.y where p) => ANTI JOIN t1, t2 on join condition: (t1.x=t2.y or t2.y is null) and p. We implemented a nullability test routine is_node_nonnullable(). Currently it handles Var, TargetEntry, CoalesceExpr and Const. Outer joins are taken into consideration in the nullability test. We adjust and apply reduce_outer_joins() before the transformation so that the outer joins have an opportunity to be converted to inner joins prior to the transformation. Using this transformation, we measured performance improvements of two to three orders of magnitude on most queries in a development environment. In our performance experiments, table s (small) has 11 rows, table l (large) has 1 million rows. s.n and l.n have NULL values. s.nn and l.nn are NOT NULL. s.n not in (l.n) 1150 ms -> 0.49 ms s.nn not in (l.nn) 1120 ms -> 0.45 ms l.n not in (l.n) over 20 min -> 1700 ms l.nn not in (l.nn) over 20 min -> 220 ms l.n not in (s.n) 63 ms -> 750 ms l.nn not in (s.nn) 58 ms -> 46 ms For the only case that performance drops - l.n not in (s.n). It is likely to be resolved by ending the nested loop anti join early as soon as we find NULL inner tuple entry/entries that satisfies the join condition during execution. This is still under investigation. Comments are welcome. With Regards, --- Zheng Li, AWS, Amazon Aurora PostgreSQL On 2/25/19, 7:32 AM, "David Rowley" <[hidden email]> wrote: On Fri, 22 Feb 2019 at 09:44, David Rowley <[hidden email]> wrote: > I've attached the rebased and still broken version. I set about trying to make a less broken version of this. A quick reminder of the semantics of NOT IN: 1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table); If table is non-empty: will filter out rows where <nullable_column> is NULL and only show values that are not in <not null column> If table is empty: Filters nothing. 2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table); If table contains NULLs in the <null column> no records will match. The previous patch handled #2 correctly but neglected to do anything about #1. For #1 the only way we can implement this as a planner only change is to insist that the outer side expressions also are not null. If we could have somehow tested if "table" was non-empty then we could have added a IS NOT NULL clause to the outer query and converted to an anti-join, but ... can't know that during planning and can't add the IS NOT NULL regardless as, if "table" is empty we will filter NULLs when we shouldn't. In the attached, I set about fixing #1 by determining if the outer expressions could be NULL by checking 1. If expression is a Var from an inner joined relation it can't be NULL if there's a NOT NULL constraint on the column; or 2. If expression is a Var from an inner joined relation and there is a strict WHERE/ON clause, the expression can't be NULL; or 3. If expression is a Var from an outer joined relation check for quals that were specified in the same syntactical level as the NOT IN for proofs that NULL will be filtered. An example of #3 is: SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in planning, but... or; SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3); In the latter of the two, the t1.a = t2.a join conditions ensures that NULLs can't exist where the NOT IN is evaluated. I implemented #3 by passing the quals down to pull_up_sublinks_qual_recurse(). At the top level call 'node' and 'notnull_proofs' are the same, but that changes in recursive calls like the one we make inside the is_andclause() condition. Comments welcome. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services |
Resend the patch with a whitespace removed so that "git apply patch" works directly.
--- Zheng Li, AWS, Amazon Aurora PostgreSQL On 2/25/19, 12:39 PM, "Li, Zheng" <[hidden email]> wrote: I'm attaching a working patch following the discussion. You can also find the following patch description is the commit message: NOT IN to ANTI JOIN transformation The semantics of ANTI JOIN were created to match the semantics of NOT EXISTS, which enables NOT EXISTS subqueries to be efficiently executed as a kind of join. NOT IN subqueries have different NULL semantics than NOT EXISTS, but since there is no special join operator for NOT IN it is generally executed as a nested sub-plan. It is possible, however, to transform NOT IN to a correlated NOT EXISTS so that it can be executed it as an ANTI JOIN with additional correlated predicates. A general transformation from NOT IN to NOT EXISTS for the single-expression case (the multi-expression case is just ANDs of the single-expressions) is: t1.x NOT IN (SELECT t2.y from t2 where p) <=> NOT EXISTS (select 1 from t2 where (y=x or y is NULL or x is NULL) and p). If x or y is non-nullable, we can safely remove the predicate "x is NULL" or "y is NULL", and if there is no predicate p, then "x is NULL" may be factored out of the subquery. Experiments show that if we can remove one or the other ORed predicates, or if we can factor out the "x is NULL", then execution is typically much faster. Basically, for the single expression case (we also handle the multi expression case), we try to do the following transformation: When p does not exist: t1.x not in (t2.y) => ANTI JOIN t1(Filter: x is not null), t2 on join condition: t1.x=t2.y or t2.y is null. When x is non-nullable: t1.x not in (t2.y where p) => ANTI JOIN t1, t2 on join condition: (t1.x=t2.y or t2.y is null) and p. We implemented a nullability test routine is_node_nonnullable(). Currently it handles Var, TargetEntry, CoalesceExpr and Const. Outer joins are taken into consideration in the nullability test. We adjust and apply reduce_outer_joins() before the transformation so that the outer joins have an opportunity to be converted to inner joins prior to the transformation. Using this transformation, we measured performance improvements of two to three orders of magnitude on most queries in a development environment. In our performance experiments, table s (small) has 11 rows, table l (large) has 1 million rows. s.n and l.n have NULL values. s.nn and l.nn are NOT NULL. s.n not in (l.n) 1150 ms -> 0.49 ms s.nn not in (l.nn) 1120 ms -> 0.45 ms l.n not in (l.n) over 20 min -> 1700 ms l.nn not in (l.nn) over 20 min -> 220 ms l.n not in (s.n) 63 ms -> 750 ms l.nn not in (s.nn) 58 ms -> 46 ms For the only case that performance drops - l.n not in (s.n). It is likely to be resolved by ending the nested loop anti join early as soon as we find NULL inner tuple entry/entries that satisfies the join condition during execution. This is still under investigation. Comments are welcome. With Regards, --- Zheng Li, AWS, Amazon Aurora PostgreSQL On 2/25/19, 7:32 AM, "David Rowley" <[hidden email]> wrote: On Fri, 22 Feb 2019 at 09:44, David Rowley <[hidden email]> wrote: > I've attached the rebased and still broken version. I set about trying to make a less broken version of this. A quick reminder of the semantics of NOT IN: 1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table); If table is non-empty: will filter out rows where <nullable_column> is NULL and only show values that are not in <not null column> If table is empty: Filters nothing. 2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table); If table contains NULLs in the <null column> no records will match. The previous patch handled #2 correctly but neglected to do anything about #1. For #1 the only way we can implement this as a planner only change is to insist that the outer side expressions also are not null. If we could have somehow tested if "table" was non-empty then we could have added a IS NOT NULL clause to the outer query and converted to an anti-join, but ... can't know that during planning and can't add the IS NOT NULL regardless as, if "table" is empty we will filter NULLs when we shouldn't. In the attached, I set about fixing #1 by determining if the outer expressions could be NULL by checking 1. If expression is a Var from an inner joined relation it can't be NULL if there's a NOT NULL constraint on the column; or 2. If expression is a Var from an inner joined relation and there is a strict WHERE/ON clause, the expression can't be NULL; or 3. If expression is a Var from an outer joined relation check for quals that were specified in the same syntactical level as the NOT IN for proofs that NULL will be filtered. An example of #3 is: SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in planning, but... or; SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3); In the latter of the two, the t1.a = t2.a join conditions ensures that NULLs can't exist where the NOT IN is evaluated. I implemented #3 by passing the quals down to pull_up_sublinks_qual_recurse(). At the top level call 'node' and 'notnull_proofs' are the same, but that changes in recursive calls like the one we make inside the is_andclause() condition. Comments welcome. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services |
On Tue, 26 Feb 2019 at 11:51, Li, Zheng <[hidden email]> wrote:
> Resend the patch with a whitespace removed so that "git apply patch" works directly. I had a quick look at this and it seems to be broken for the empty table case I mentioned up thread. Quick example: Setup: create table t1 (a int); create table t2 (a int not null); insert into t1 values(NULL),(1),(2); select * from t1 where a not in(select a from t2); Patched: a --- 1 2 (2 rows) Master: a --- 1 2 (3 rows) This will be due to the fact you're adding an a IS NOT NULL qual to the scan of a: postgres=# explain select * from t1 where a not in(select a from t2); QUERY PLAN ------------------------------------------------------------------ Hash Anti Join (cost=67.38..152.18 rows=1268 width=4) Hash Cond: (t1.a = t2.a) -> Seq Scan on t1 (cost=0.00..35.50 rows=2537 width=4) Filter: (a IS NOT NULL) -> Hash (cost=35.50..35.50 rows=2550 width=4) -> Seq Scan on t2 (cost=0.00..35.50 rows=2550 width=4) (6 rows) but as I mentioned, you can't do that as t2 might be empty and there's no way to know that during planning. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services |
Greenplum Database does this optimization. The idea is to use a new join type, let's call it JOIN_LASJ_NOTIN, and its semantic regarding NULL is defined as below: 1. If there is a NULL in the outer side, and the inner side is empty, the NULL should be part of the outputs. 2. If there is a NULL in the outer side, and the inner side is not empty, the NULL should not be part of the outputs. 3. If there is a NULL in the inner side, no outputs should be produced. An example plan looks like: gpadmin=# explain (costs off) select * from t1 where a not in(select a from t2); QUERY PLAN ----------------------------------- Hash Left Anti Semi (Not-In) Join Hash Cond: (t1.a = t2.a) -> Seq Scan on t1 -> Hash -> Seq Scan on t2 (5 rows) Thanks Richard On Tue, Feb 26, 2019 at 7:20 AM David Rowley <[hidden email]> wrote: On Tue, 26 Feb 2019 at 11:51, Li, Zheng <[hidden email]> wrote: |
In reply to this post by David Rowley-3
The problem is that the special optimization that was proposed for the case
where the subquery has no WHERE clause isn't valid when the subquery returns no rows. That additional optimization needs to be removed, and preferably replaced with some sort of efficient run-time test. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
|
On Wed, 27 Feb 2019 at 03:07, Jim Finnerty <[hidden email]> wrote:
> > The problem is that the special optimization that was proposed for the case > where the subquery has no WHERE clause isn't valid when the subquery returns > no rows. That additional optimization needs to be removed, and preferably > replaced with some sort of efficient run-time test. That is one option, however, the join type that Richard mentions, to satisfy point #3, surely only can work for Hash joins and perhaps Merge joins that required a sort, assuming there's some way for the sort to communicate about if it found NULLs or not. Either way, we need to have looked at the entire inner side to ensure there are no nulls. Probably it would be possible to somehow team that up with a planner check to see if the inner exprs could be NULL then just implement points #1 and #2 for other join methods. If you're proposing to do that for this thread then I can take my planner only patch somewhere else. I only posted my patch as I pretty much already had what I thought you were originally talking about. However, please be aware there are current patents around adding execution time smarts in this area, so it's probably unlikely you'll find a way to do this in the executor that does not infringe on those. Probably no committer would want to touch it. I think my patch covers a good number of use cases and as far as I understand, does not go near any current patents. Please let me know if I should move my patch to another thread. I don't want to hi-jack this if you're going in another direction. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services |
I agree we will need some runtime smarts (such as a new anti join type as pointed out by Richard) to "ultimately" account for all the cases of NOT IN queries.
However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out (which I was not aware of before), we would not move that direction at the moment. I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF. The two patches are for the same purpose and similar. However, they differ in the following ways as far as I can tell: Nullability Test: -David's patch uses strict predicates for nullability test. -Our patch doesn't use strict predicates, but it accounts for COALESCE and null-padded rows from outer join. In addition, we made reduce_outer_joins() work before the transformation which makes the nullability test more accurate. Anti Join Transformation: -Dvaid's patch does the transformation when both inner and outer outputs are non-nullable. -With the latest fix (for the empty table case), our patch does the transformation as long as the outer is non-nullable regardless of the inner nullability, experiments show that the results are always faster. David, please let me know what you think. If you would like to collaborate, I'll start merging with your code on using strict predicates to make a better Nullability Test. Thanks, Zheng |
"Li, Zheng" <[hidden email]> writes:
> However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out (which I was not aware of before), we would not move that direction at the moment. > I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF. TBH, I think it's very unlikely that any patch for this will be seriously considered for commit in v12. It would be against project policy, and spending a lot of time reviewing the patch would be quite unfair to other patches that have been in the queue longer. Therefore, I'd suggest that you not bend things out of shape just to have some patch to submit before March 1. It'd be better to work with the goal of having a coherent patch ready for the first v13 CF, probably July-ish. regards, tom lane |
In reply to this post by zhengli
On Wed, 27 Feb 2019 at 13:05, Li, Zheng <[hidden email]> wrote:
> -With the latest fix (for the empty table case), our patch does the transformation as long as the outer is non-nullable regardless of the inner nullability, experiments show that the results are always faster. Hi Zheng, I'm interested to know how this works without testing for inner nullability. If any of the inner side's join exprs are NULL then no records can match. What do you propose to work around that? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services |
In reply to this post by Tom Lane-2
On Wed, 27 Feb 2019 at 13:13, Tom Lane <[hidden email]> wrote:
> > "Li, Zheng" <[hidden email]> writes: > > However, given that the March CommitFest is imminent and the runtime smarts patent concerns David had pointed out (which I was not aware of before), we would not move that direction at the moment. > > > I propose that we collaborate to build one patch from the two patches submitted in this thread for the CF. > > TBH, I think it's very unlikely that any patch for this will be seriously > considered for commit in v12. It would be against project policy, and > spending a lot of time reviewing the patch would be quite unfair to other > patches that have been in the queue longer. Therefore, I'd suggest that > you not bend things out of shape just to have some patch to submit before > March 1. It'd be better to work with the goal of having a coherent patch > ready for the first v13 CF, probably July-ish. FWIW, I did add this to the March CF, but I set the target version to 13. I wasn't considering this for PG12. I see Zheng was, but I agree with you on PG13 being the target for this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services |
I'm totally fine with setting the target to PG13.
-- I'm interested to know how this works without testing for inner nullability. If any of the inner side's join exprs are NULL then no records can match. What do you propose to work around that? -- We still check for inner side's nullability, when it is nullable we append a "var is NULL" to the anti join condition. So every outer tuple is going to evaluate to true on the join condition when there is indeed a null entry in the inner. Actually I think the nested loop anti join can end early in this case, but I haven't find a way to do it properly, this may be one other reason why we need a new join type for NOT IN. e.g. explain select count(*) from s where u not in (select n from l); QUERY PLAN ------------------------------------------------------------------------------------ Aggregate (cost=2892.88..2892.89 rows=1 width=8) -> Nested Loop Anti Join (cost=258.87..2892.88 rows=1 width=0) -> Seq Scan on s (cost=0.00..1.11 rows=11 width=4) -> Bitmap Heap Scan on l (cost=258.87..262.88 rows=1 width=4) Recheck Cond: ((s.u = n) OR (n IS NULL)) -> BitmapOr (cost=258.87..258.87 rows=1 width=0) -> Bitmap Index Scan on l_n (cost=0.00..4.43 rows=1 width=0) Index Cond: (s.u = n) -> Bitmap Index Scan on l_n (cost=0.00..4.43 rows=1 width=0) Index Cond: (n IS NULL) Zheng |
On Wed, 27 Feb 2019 at 13:41, Li, Zheng <[hidden email]> wrote:
> We still check for inner side's nullability, when it is nullable we > append a "var is NULL" to the anti join condition. So every outer > tuple is going to evaluate to true on the join condition when there > is indeed a null entry in the inner. That's possible, at least providing the var is NULL is an OR condition, but the problem there is that you force the plan into a nested loop join. That's unfortunately not going to perform very well when the number of rows to process is large. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services |
In reply to this post by David Rowley-3
On Wed, Feb 27, 2019 at 4:52 AM David Rowley <[hidden email]> wrote: On Wed, 27 Feb 2019 at 03:07, Jim Finnerty <[hidden email]> wrote: Thanks for pointing out the patent concerns. I was not aware of that before. Could you please provide some clue where I can find more info about the patents? Thanks Richard |
Free forum by Nabble | Edit this page |