Index Skip Scan (new UniqueKeys)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Index Skip Scan (new UniqueKeys)

Dmitry Dolgov
Hi,

Here is a new version of index skip scan patch, based on v8 patch for
UniqueKeys implementation from [1]. I want to start a new thread to
simplify navigation, hopefully I didn't forget anyone who actively
participated in the discussion.

To simplify reviewing I've split it into several parts:

* First two are taken from [1] just for the reference and to make cfbot happy.

* Extend-UniqueKeys consists changes that needs to be done for
  UniqueKeys to be used in skip scan. Essentially this is a reduced
  version of previous implementation from Jesper & David, based on the
  new UniqueKeys infrastructure, as it does the very same thing.

* Index-Skip-Scan contains not am specific code and the overall
  infrastructure, including configuration elements and so on.

* Btree-implementation contains btree specific code to implement amskip,
  introduced in the previous patch.

* The last one contains just documentation bits.

Interesting enough with a new UniqueKey implementation skipping is
applied in some tests unexpectedly. For those I've disabled
indexskipscan to avoid confusion.

[1]: https://www.postgresql.org/message-id/flat/CAKU4AWrwZMAL%3DuaFUDMf4WGOVkEL3ONbatqju9nSXTUucpp_pw%40mail.gmail.com

v8-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (2K) Download Attachment
v8-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch (49K) Download Attachment
v35-0001-Extend-UniqueKeys.patch (13K) Download Attachment
v35-0002-Index-skip-scan.patch (39K) Download Attachment
v35-0003-Btree-implementation-of-skipping.patch (41K) Download Attachment
v35-0004-Index-skip-scan-documentation.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Index Skip Scan (new UniqueKeys)

Andy Fan


On Tue, Jun 9, 2020 at 6:20 PM Dmitry Dolgov <[hidden email]> wrote:
Hi,

Here is a new version of index skip scan patch, based on v8 patch for
UniqueKeys implementation from [1]. I want to start a new thread to
simplify navigation, hopefully I didn't forget anyone who actively
participated in the discussion.

To simplify reviewing I've split it into several parts:

* First two are taken from [1] just for the reference and to make cfbot happy.

* Extend-UniqueKeys consists changes that needs to be done for
  UniqueKeys to be used in skip scan. Essentially this is a reduced
  version of previous implementation from Jesper & David, based on the
  new UniqueKeys infrastructure, as it does the very same thing.

* Index-Skip-Scan contains not am specific code and the overall
  infrastructure, including configuration elements and so on.

* Btree-implementation contains btree specific code to implement amskip,
  introduced in the previous patch.

* The last one contains just documentation bits.

Interesting enough with a new UniqueKey implementation skipping is
applied in some tests unexpectedly. For those I've disabled
indexskipscan to avoid confusion.

[1]: https://www.postgresql.org/message-id/flat/CAKU4AWrwZMAL%3DuaFUDMf4WGOVkEL3ONbatqju9nSXTUucpp_pw%40mail.gmail.com

Thanks for the patch.

I just get the rough idea of patch, looks we have to narrow down the user cases
where we can use this method.  Consider the below example:

create table j1(i int, im5 int,  im100 int, im1000 int);
insert into j1 select i, i%5, i%100, i%1000 from generate_series(1, 10000000)i;
create index on j1(im5, i);
insert into j1 values (1, 1, 0, 0);
analyze j1;

demo=# select distinct * from j1 where i < 2;
 i | im5 | im100 | im1000
---+-----+-------+--------
 1 |   1 |     1 |      1
(1 row)

demo=# set enable_indexskipscan to off;
SET
demo=# select distinct * from j1 where i < 2;
 i | im5 | im100 | im1000
---+-----+-------+--------
 1 |   1 |     0 |      0
 1 |   1 |     1 |      1
(2 rows)

drop index j1_im5_i_idx;

create index on j1(im5, i, im100);
demo=# select distinct im5, i, im100 from j1 where i < 2;
 im5 | i | im100
-----+---+-------
   1 | 1 |     0
   1 | 1 |     1
(2 rows)
demo=# set enable_indexskipscan to on;
SET
demo=# select distinct im5, i, im100 from j1 where i < 2;
 im5 | i | im100
-----+---+-------
   1 | 1 |     0
(1 row)


--
Best Regards
Andy Fan
Reply | Threaded
Open this post in threaded view
|

Re: Index Skip Scan (new UniqueKeys)

Dmitry Dolgov
> On Thu, Jun 11, 2020 at 04:14:07PM +0800, Andy Fan wrote:
>
> I just get the rough idea of patch, looks we have to narrow down the
> user cases where we can use this method. Consider the below example:

Hi

Not exactly narrow down, but rather get rid of wrong usage of skipping
for index scan. Since skipping for it was added later than for index
only scan I can imagine there are still blind spots, so good that you've
looked. In this particular case, when index expressions do not fully
cover those expressionse result need to be distinct on, skipping just
doesn't have enough information and should not be used. I'll add it to
the next version, thanks!


Reply | Threaded
Open this post in threaded view
|

Re: Index Skip Scan (new UniqueKeys)

Peter Geoghegan-4
In reply to this post by Dmitry Dolgov
On Tue, Jun 9, 2020 at 3:20 AM Dmitry Dolgov <[hidden email]> wrote:
> * Btree-implementation contains btree specific code to implement amskip,
>   introduced in the previous patch.

The way that you're dealing with B-Tree tuples here needs to account
for posting list tuples:

> +               currItem = &so->currPos.items[so->currPos.lastItem];
> +               itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
> +               nextOffset = ItemPointerGetOffsetNumber(&itup->t_tid);

But I wonder more generally what the idea here is. The following
comments that immediately follow provide some hints:

> +               /*
> +                * To check if we returned the same tuple, try to find a
> +                * startItup on the current page. For that we need to update
> +                * scankey to match the whole tuple and set nextkey to return
> +                * an exact tuple, not the next one. If the nextOffset is the
> +                * same as before, it means we are in the loop, return offnum
> +                * to the original position and jump further
> +                */

Why does it make sense to use the offset number like this? It isn't
stable or reliable. The patch goes on to do this:

> +                   startOffset = _bt_binsrch(scan->indexRelation,
> +                                             so->skipScanKey,
> +                                             so->currPos.buf);
> +
> +                   page = BufferGetPage(so->currPos.buf);
> +                   maxoff = PageGetMaxOffsetNumber(page);
> +
> +                   if (nextOffset <= startOffset)
> +                   {

Why compare a heap TID's offset number (an offset number for a heap
page) to another offset number for a B-Tree leaf page? They're
fundamentally different things.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

RE: Index Skip Scan (new UniqueKeys)

Floris Van Nee
In reply to this post by Dmitry Dolgov
Hi Dmitry,

Also took another look at the patch now, and found a case of incorrect data. It looks related to the new way of creating the paths, as I can't recall seeing this in earlier versions.

create table t1 as select a,b,b%5 as c, random() as d from generate_series(1, 10) a, generate_series(1,100) b;
create index on t1 (a,b,c);

postgres=# explain select distinct on (a) * from t1 order by a,b desc,c;
                                  QUERY PLAN                                  
-------------------------------------------------------------------------------
 Sort  (cost=2.92..2.94 rows=10 width=20)
   Sort Key: a, b DESC, c
   ->  Index Scan using t1_a_b_c_idx on t1  (cost=0.28..2.75 rows=10 width=20)
         Skip scan: true
(4 rows)


With the 'order by a, b desc, c' we expect the value of column 'b' to always be 100. With index_skipscan enabled, it always gives 1 though. It's not correct that the planner chooses a skip scan followed by sort in this case.

-Floris



Reply | Threaded
Open this post in threaded view
|

Re: Index Skip Scan (new UniqueKeys)

Dmitry Dolgov
In reply to this post by Peter Geoghegan-4
> On Wed, Jul 08, 2020 at 03:44:26PM -0700, Peter Geoghegan wrote:
>
> On Tue, Jun 9, 2020 at 3:20 AM Dmitry Dolgov <[hidden email]> wrote:
> > * Btree-implementation contains btree specific code to implement amskip,
> >   introduced in the previous patch.
>
> The way that you're dealing with B-Tree tuples here needs to account
> for posting list tuples:
>
> > +               currItem = &so->currPos.items[so->currPos.lastItem];
> > +               itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
> > +               nextOffset = ItemPointerGetOffsetNumber(&itup->t_tid);

Do you mean this last part with t_tid, which could also have a tid array
in case of posting tuple format?

> > +               /*
> > +                * To check if we returned the same tuple, try to find a
> > +                * startItup on the current page. For that we need to update
> > +                * scankey to match the whole tuple and set nextkey to return
> > +                * an exact tuple, not the next one. If the nextOffset is the
> > +                * same as before, it means we are in the loop, return offnum
> > +                * to the original position and jump further
> > +                */
>
> Why does it make sense to use the offset number like this? It isn't
> stable or reliable. The patch goes on to do this:
>
> > +                   startOffset = _bt_binsrch(scan->indexRelation,
> > +                                             so->skipScanKey,
> > +                                             so->currPos.buf);
> > +
> > +                   page = BufferGetPage(so->currPos.buf);
> > +                   maxoff = PageGetMaxOffsetNumber(page);
> > +
> > +                   if (nextOffset <= startOffset)
> > +                   {
>
> Why compare a heap TID's offset number (an offset number for a heap
> page) to another offset number for a B-Tree leaf page? They're
> fundamentally different things.

Well, it's obviously wrong, thanks for noticing. What is necessary is to
compare two index tuples, the start and the next one, to test if they're
the same (in which case if I'm not mistaken probably we can compare item
pointers). I've got this question when I was about to post a new version
with changes to address feedback from Andy, now I'll combine them and
send a cumulative patch.


Reply | Threaded
Open this post in threaded view
|

Re: Index Skip Scan (new UniqueKeys)

Dmitry Dolgov
In reply to this post by Floris Van Nee
> On Fri, Jul 10, 2020 at 05:03:37PM +0000, Floris Van Nee wrote:
>
> Also took another look at the patch now, and found a case of incorrect
> data. It looks related to the new way of creating the paths, as I
> can't recall seeing this in earlier versions.
>
> create table t1 as select a,b,b%5 as c, random() as d from generate_series(1, 10) a, generate_series(1,100) b;
> create index on t1 (a,b,c);
>
> postgres=# explain select distinct on (a) * from t1 order by a,b desc,c;
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Sort  (cost=2.92..2.94 rows=10 width=20)
>    Sort Key: a, b DESC, c
>    ->  Index Scan using t1_a_b_c_idx on t1  (cost=0.28..2.75 rows=10 width=20)
>          Skip scan: true
> (4 rows)

Good point, thanks for looking at this. With the latest planner version
there are indeed more possibilities to use skipping. It never occured to
me that some of those paths will still rely on index scan returning full
data set. I'll look in details and add verification to prevent putting
something like this on top of skip scan in the next version.


Reply | Threaded
Open this post in threaded view
|

RE: Index Skip Scan (new UniqueKeys)

Floris Van Nee
>
> Good point, thanks for looking at this. With the latest planner version there
> are indeed more possibilities to use skipping. It never occured to me that
> some of those paths will still rely on index scan returning full data set. I'll look
> in details and add verification to prevent putting something like this on top of
> skip scan in the next version.

I believe the required changes are something like in attached patch. There were a few things I've changed:
- build_uniquekeys was constructing the list incorrectly. For a DISTINCT a,b, it would create two unique keys, one with a and one with b. However, it should be one unique key with (a,b).
- the uniquekeys that is built, still contains some redundant keys, that are normally eliminated from the path keys lists.
- the distinct_pathkeys may be NULL, even though there's a possibility for skipping. But it wouldn't create the uniquekeys in this case. This makes the planner not choose skip scans even though it could. For example in queries that do SELECT DISTINCT ON (a) * FROM t1 WHERE a=1 ORDER BY a,b; Since a is constant, it's eliminated from regular pathkeys.
- to combat the issues mentioned earlier, there's now a check in build_index_paths that checks if the query_pathkeys matches the useful_pathkeys. Note that we have to use the path keys here rather than any of the unique keys. The unique keys are only Expr nodes - they do not contain the necessary information about ordering. Due to elimination of some constant path keys, we have to search the attributes of the index to find the correct prefix to use in skipping.
- creating the skip scan path did not actually fill the Path's unique keys. It should just set this to query_uniquekeys.

I've attached the first two unique-keys patches (v9, 0001, 0002)), your patches, but rebased on v9 of unique keys (0003-0006) + a diff patch (0007) that applies my suggested changes on top of it.

-Floris


0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (6K) Download Attachment
0002-Introduce-UniqueKey-attributes-on-RelOptInfo-struct.patch (80K) Download Attachment
0003-Extend-UniqueKeys.patch (17K) Download Attachment
0004-Index-skip-scan.patch (52K) Download Attachment
0005-Btree-implementation-of-skipping.patch (54K) Download Attachment
0006-Index-skip-scan-documentation.patch (6K) Download Attachment
0007-planner-fixes.patch (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Index Skip Scan (new UniqueKeys)

Floris Van Nee
>
> I've attached the first two unique-keys patches (v9, 0001, 0002)), your
> patches, but rebased on v9 of unique keys (0003-0006) + a diff patch (0007)
> that applies my suggested changes on top of it.
>

I just realized there's another thing that looks a bit strange too. From reading the thread, I thought it should be the case that in create_distinct_paths, it is checked whether the uniqueness in the unique_pathlist matches the uniqueness that is needed by the query.
This means that I think what it should be comparing is this:
- The generated index path should have some path-level unique keys set
- The path-level unique keys must be at least as strict as the path-level unique keys. Eg. if path-level is unique on (a), then query-level must be (a), or possibly (a,b).

I've changed the patch to compare the path-level keys (set in create index path) with the query-level keys in create_distinct_path. Currently, I don't think the previous implementation was an actual issue leading to incorrect queries, but it would be causing problems if we tried to extend the uniqueness for distinct to join rels etc.

One question about the unique keys - probably for Andy or David: I've looked in the archives to find arguments for/against using Expr nodes or EquivalenceClasses in the Unique Keys patch. However, I couldn't really find a clear answer about why the current patch uses Expr rather than EquivalenceClasses. At some point David mentioned "that probably Expr nodes were needed rather than EquivalenceClasses", but it's not really clear to me why. What were the thoughts behind this?

-Floris


0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (6K) Download Attachment
0002-Introduce-UniqueKey-attributes-on-RelOptInfo-struct.patch (80K) Download Attachment
0003-Extend-UniqueKeys.patch (17K) Download Attachment
0004-Index-skip-scan.patch (52K) Download Attachment
0005-Btree-implementation-of-skipping.patch (54K) Download Attachment
0006-Index-skip-scan-documentation.patch (6K) Download Attachment
0007-planner-fixes.patch (19K) Download Attachment