Hi!
In many cases GiST index can be build fast using z-order sorting. I've looked into proof of concept by Nikita Glukhov [0] and it looks very interesting. So, I've implemented yet another version of B-tree-like GiST build. It's main use case and benefits can be summarized with small example: postgres=# create table x as select point (random(),random()) from generate_series(1,3000000,1); SELECT 3000000 Time: 5061,967 ms (00:05,062) postgres=# create index ON x using gist (point ) with (fast_build_sort_function=gist_point_sortsupport); CREATE INDEX Time: 6140,227 ms (00:06,140) postgres=# create index ON x using gist (point ); CREATE INDEX Time: 32061,200 ms (00:32,061) As you can see, Z-order build is on order of magnitude faster. Select performance is roughly the same. Also, index is significantly smaller. Nikita's PoC is faster because it uses parallel build, but it intervenes into B-tree code a lot (for reuse). This patchset is GiST-isolated. My biggest concern is that passing function to relation option seems a bit hacky. You can pass there any function matching sort support signature. Embedding this function into opclass makes no sense: it does not affect scan anyhow. In current version, docs and tests are not implemented. I want to discuss overall design. Do we really want yet another GiST build, if it is 3-10 times faster? Thanks! Best regards, Andrey Borodin. [0] https://github.com/postgres/postgres/compare/master...glukhovn:gist_btree_build ![]() ![]() ![]() |
Hello, This is very interesting. In my pipeline currently GiST index rebuild is the biggest time consuming step. I believe introducing optional concept of order in the GiST opclass will be beneficial not only for fast build, but for other tasks later: - CLUSTER can order the table using that notion, in parallel way. - btree_gist can be even closer to btree by getting the tuples sorted inside page. - tree descend on insertion in future can traverse the list in more opportunistic way, calculating penalty for siblings-by-order first. I believe everywhere the idea of ordering is needed it's provided by giving a btree opclass. How about giving a link to btree opclass inside a gist opclass? On Mon, Aug 26, 2019 at 10:59 AM Andrey Borodin <[hidden email]> wrote: Hi! Darafei Praliaskouski Support me: http://patreon.com/komzpa |
In reply to this post by Andrey Borodin-2
On 26/08/2019 10:59, Andrey Borodin wrote:
> Hi! > > In many cases GiST index can be build fast using z-order sorting. > > I've looked into proof of concept by Nikita Glukhov [0] and it looks very interesting. > So, I've implemented yet another version of B-tree-like GiST build. Cool! > My biggest concern is that passing function to relation option seems > a bit hacky. You can pass there any function matching sort support > signature. Embedding this function into opclass makes no sense: it > does not affect scan anyhow. I think it should be a new, optional, GiST "AM support function", in pg_amproc. That's how the sort support functions are defined for B-tree opfamilies, too. - Heikki |
In reply to this post by Andrey Borodin-2
On Mon, Aug 26, 2019 at 10:59 AM Andrey Borodin <[hidden email]> wrote:
> In many cases GiST index can be build fast using z-order sorting. > > I've looked into proof of concept by Nikita Glukhov [0] and it looks very interesting. > So, I've implemented yet another version of B-tree-like GiST build. > It's main use case and benefits can be summarized with small example: > > postgres=# create table x as select point (random(),random()) from generate_series(1,3000000,1); > SELECT 3000000 > Time: 5061,967 ms (00:05,062) > postgres=# create index ON x using gist (point ) with (fast_build_sort_function=gist_point_sortsupport); > CREATE INDEX > Time: 6140,227 ms (00:06,140) > postgres=# create index ON x using gist (point ); > CREATE INDEX > Time: 32061,200 ms (00:32,061) > > As you can see, Z-order build is on order of magnitude faster. Select performance is roughly the same. Also, index is significantly smaller. Cool! These experiments bring me to following thoughts. Can we not only build, but also maintain GiST indexes in B-tree-like manner? If we put Z-value together with MBR to the non-leaf keys, that should be possible. Maintaining it in B-tree-like manner would have a lot of advantages. 1) Binary search in non-leaf pages instead of probing each key is much faster. 2) Split algorithm is also simpler and faster. 3) We can refind existing index tuple in predictable time of O(log N). And this doesn't depend on MBR overlapping degree. This is valuable for future table AMs, which would have notion of deleting individual index tuples (for instance, zheap promises to eventually support delete-marking indexes). Eventually, we may come to the idea of B-tree indexes with user-defined additional keys in non-leaf tuples, which could be used for scanning. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company |
On Thu, Aug 29, 2019 at 3:48 PM Alexander Korotkov
<[hidden email]> wrote: > > As you can see, Z-order build is on order of magnitude faster. Select performance is roughly the same. Also, index is significantly smaller. > > Cool! These experiments bring me to following thoughts. Can we not > only build, but also maintain GiST indexes in B-tree-like manner? If > we put Z-value together with MBR to the non-leaf keys, that should be > possible. Maintaining it in B-tree-like manner would have a lot of > advantages. I'm not an expert on GiST, but that seems like it would have a lot of advantages in the long term. It is certainly theoretically appealing. Could this make it easier to use merge join with containment operators? I'm thinking of things like geospatial joins, which can generally only be performed as nested loop joins at the moment. This is often wildly inefficient. -- Peter Geoghegan |
On Fri, Aug 30, 2019 at 2:34 AM Peter Geoghegan <[hidden email]> wrote:
> On Thu, Aug 29, 2019 at 3:48 PM Alexander Korotkov > <[hidden email]> wrote: > > > As you can see, Z-order build is on order of magnitude faster. Select performance is roughly the same. Also, index is significantly smaller. > > > > Cool! These experiments bring me to following thoughts. Can we not > > only build, but also maintain GiST indexes in B-tree-like manner? If > > we put Z-value together with MBR to the non-leaf keys, that should be > > possible. Maintaining it in B-tree-like manner would have a lot of > > advantages. > > I'm not an expert on GiST, but that seems like it would have a lot of > advantages in the long term. It is certainly theoretically appealing. > > Could this make it easier to use merge join with containment > operators? I'm thinking of things like geospatial joins, which can > generally only be performed as nested loop joins at the moment. This > is often wildly inefficient. AFAICS, spatial joins aren't going to be as easy as just merge joins on Z-value. When searching for close points in two datasets (closer than given threshold) we can scan some ranges of Z-value in one dataset while iterating on another. But dealing with prolonged spatial objects in not that easy. In order to determine if there are matching values in given Z-range you also need to be aware on size of objects inside that Z-range. So, before merge-like join you need to not only sort, but build some index-like structure. Alternatively you can encode size in Z-value. But this increases dimensionality of space and decreases efficiency of join. Also, spatial join can be made using two indexes, even just current GiST without Z-values. We've prototyped that, see [1]. Links 1. https://github.com/pgsphere/pgsphere/blob/crossmatch_cnode/crossmatch.c ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company |
On Thu, Aug 29, 2019 at 8:22 PM Alexander Korotkov
<[hidden email]> wrote: > Alternatively you can encode size in Z-value. But this increases > dimensionality of space and decreases efficiency of join. Also, > spatial join can be made using two indexes, even just current GiST > without Z-values. We've prototyped that, see [1]. I'm pretty sure that spatial joins generally need two spatial indexes (usually R-Trees). There seems to have been quite a lot of research in it in the 1990s. -- Peter Geoghegan |
In reply to this post by Alexander Korotkov
30 авг. 2019 г., в 3:47, Alexander Korotkov <[hidden email]> написал(а): for two sets of tuples X and Y if for any i,o from N, Xi < Yo does not guaranty union(X) < union (Y) For example consider this z-ordered keyspace (picture attached) ![]() union(5, 9) is z-order-smaller than union(4,4) I'm not even sure we can use sorted search for choosing subtree for insertion. How do you think, should I supply GiST-build patch with docs and tests and add it to CF? Or do we need more design discussion before? Best regards, Andrey Borodin.
|
> 30 авг. 2019 г., в 16:44, Andrey Borodin <[hidden email]> написал(а): > > > How do you think, should I supply GiST-build patch with docs and tests and add it to CF? Or do we need more design discussion before? PFA v2: now sort support is part of opclass. There's a problem with Z-ordering NaN which causes regression tests to fail. So I decided not to add patch to CF (despite having few minutes to do so). How to correctly Z-order NaN? So that it would be consistent with semantics of union() and consistent() functions. If one of values is NaN, then we consider all it's bits to be 1? BTW patch uses union { float f; uint32 i; } I hope it's OK, because AFAIK we do not have non-IEEE-754 platforms now. Thanks! Best regards, Andrey Borodin. ![]() ![]() |
In reply to this post by Andrey Borodin-2
On Fri, Aug 30, 2019 at 2:44 PM Andrey Borodin <[hidden email]> wrote:
> > 30 авг. 2019 г., в 3:47, Alexander Korotkov <[hidden email]> написал(а): > > 1) Binary search in non-leaf pages instead of probing each key is much faster. > > > That's a neat idea, but key union breaks ordering, even for z-order. > for two sets of tuples X and Y > if for any i,o from N, Xi < Yo > does not guaranty union(X) < union (Y) > > For example consider this z-ordered keyspace (picture attached) > > union(5, 9) is z-order-smaller than union(4,4) > > I'm not even sure we can use sorted search for choosing subtree for insertion. Sorry, I didn't explain my proposal in enough details. I didn't mean B-tree separator keys would be the same as union key (MBR). I mean B-tree on Z-values, which maintains union key in addition to separator keys. So, you select downlink to insert using separator Z-values and then also extend union key (if needed). It's probably still not enough detail yet. I'll try to spend more time for more detailed description later. > > How do you think, should I supply GiST-build patch with docs and tests and add it to CF? Or do we need more design discussion before? +1 for adding to CF. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company |
In reply to this post by Peter Geoghegan-4
On Fri, Aug 30, 2019 at 6:28 AM Peter Geoghegan <[hidden email]> wrote:
> On Thu, Aug 29, 2019 at 8:22 PM Alexander Korotkov > <[hidden email]> wrote: > > Alternatively you can encode size in Z-value. But this increases > > dimensionality of space and decreases efficiency of join. Also, > > spatial join can be made using two indexes, even just current GiST > > without Z-values. We've prototyped that, see [1]. > > I'm pretty sure that spatial joins generally need two spatial indexes > (usually R-Trees). There seems to have been quite a lot of research in > it in the 1990s. Sure, our prototype was an implementation of one of such papers. My point is that advantages of Z-value ordered GiST for spatial joins are not yet clear for me. Except faster build and smaller index, which are general advantages. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company |
In reply to this post by Andrey Borodin-2
> 1 сент. 2019 г., в 15:53, Andrey Borodin <[hidden email]> написал(а): > > > <v2-0001-Add-sort-support-for-point-gist_point_sortsupport.patch><v2-0002-Implement-GiST-build-using-sort-support.patch> Here's V3 of the patch set. Changes: 1. Added some documentation of new sort support routines 2. Fixed bug with dirty pages I did not add sort support procs to built-in boxes, circles and polys, since it may be not optimal index for them. However, for points Z-order is quite good as a default. Tests only pass with fixes for GiST KNN from Alexander in other thread. Thanks! Best regards, Andrey Borodin. ![]() ![]() |
On Sun, Sep 08, 2019 at 01:54:35PM +0500, Andrey Borodin wrote:
> Here's V3 of the patch set. > Changes: > 1. Added some documentation of new sort support routines > 2. Fixed bug with dirty pages > > I did not add sort support procs to built-in boxes, circles and > polys, since it may be not optimal index for them. However, for > points Z-order is quite good as a defaul t. The latest patch does not apply. Could you send a rebase? I have moved the patch to next CF, waiting on author for now. -- Michael |
> 1 дек. 2019 г., в 7:06, Michael Paquier <[hidden email]> написал(а): > > On Sun, Sep 08, 2019 at 01:54:35PM +0500, Andrey Borodin wrote: >> Here's V3 of the patch set. >> Changes: >> 1. Added some documentation of new sort support routines >> 2. Fixed bug with dirty pages >> >> I did not add sort support procs to built-in boxes, circles and >> polys, since it may be not optimal index for them. However, for >> points Z-order is quite good as a defaul t. > > The latest patch does not apply. Could you send a rebase? I have > moved the patch to next CF, waiting on author for now. PFA rebased patch. Best regards, Andrey Borodin. ![]() ![]() |
On Mon, Dec 30, 2019 at 7:43 PM Andrey Borodin <[hidden email]> wrote:
> PFA rebased patch. Hi Andrey, This looks really interesting, and I am sure there are a lot of GIS people who would love to see dramatically faster and smaller indexes in PG13. I don't know enough to comment on the details, but here are some superficial comments: + method is also optional and is used diring fast GiST build. -> during + /* esteblish order between x and y */ -> establish +/* Compute Z-oder for point */ static inline uint64 point_zorder_internal(Point *p) -> order Could this function please have a comment that explains why it works? I mean, just a breadcrumb... the name of the technique or something... so that uninitiated hackers can google their way to a clue (is it "Morton encoding"?) MSVC says: src/backend/access/gist/gistproc.c(1582): error C2065: 'INT32_MAX' : undeclared identifier GCC says: gistbuild.c: In function ‘gist_indexsortbuild’: gistbuild.c:256:4: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] IndexTuple *itvec = gistextractpage(lower_page, &vect_len); ^ |
On Wed, Feb 19, 2020 at 8:00 PM Thomas Munro <[hidden email]> wrote:
> Could this function please have a comment that explains why it works? > I mean, just a breadcrumb... the name of the technique or something... > so that uninitiated hackers can google their way to a clue (is it > "Morton encoding"?) Ok I think I get it now after doing some homework. 1. We expect floats to be in IEEE format, and the sort order of IEEE floats is mostly correlated to the binary sort order of the bits reinterpreted as an int. It isn't in some special cases, but for this use case we don't really care about that, we're just trying to encourage locality. 2. We generate a Morton code that interleaves the bits of N integers to produce a single integer that preserves locality: things that were close in the N dimensional space are close in the resulting integer. Cool. +static int +my_fastcmp(Datum x, Datum y, SortSupport ssup) +{ + /* esteblish order between x and y */ + + return z1 == z2 ? 0 : z1 > z2 ? 1 : -1; +} This example code from the documentation looks wrong, probably missing eg int64 z1 = DatumGetInt64(x). |
On Thu, Feb 20, 2020 at 10:14 AM Thomas Munro <[hidden email]> wrote:
> 1. We expect floats to be in IEEE format, and the sort order of IEEE > floats is mostly correlated to the binary sort order of the bits > reinterpreted as an int. It isn't in some special cases, but for this > use case we don't really care about that, we're just trying to > encourage locality. I suppose there is a big jump in integer value (whether signed or unsigned) as you cross from positive to negative floats, and then the sort order is reversed. I have no idea if either of those things is a problem worth fixing. That made me wonder if there might also be an endianness problem. It seems from some quick googling that all current architectures have integers and floats of the same endianness. Apparently this wasn't always the case, and some ARMs have a weird half-flipped arrangement for 64 bit floats, but not 32 bit floats as you are using here. |
Hi Thomas!
Thanks for looking into this! I’ll fix your notices asap. > On 24 февр. 2020 г., at 01:58, Thomas Munro <[hidden email]> wrote: > > On Thu, Feb 20, 2020 at 10:14 AM Thomas Munro <[hidden email]> wrote: >> 1. We expect floats to be in IEEE format, and the sort order of IEEE >> floats is mostly correlated to the binary sort order of the bits >> reinterpreted as an int. It isn't in some special cases, but for this >> use case we don't really care about that, we're just trying to >> encourage locality. > > I suppose there is a big jump in integer value (whether signed or > unsigned) as you cross from positive to negative floats, and then the > sort order is reversed. I have no idea if either of those things is a > problem worth fixing. That made me wonder if there might also be an > endianness problem. It seems from some quick googling that all > current architectures have integers and floats of the same endianness. > Apparently this wasn't always the case, and some ARMs have a weird > half-flipped arrangement for 64 bit floats, but not 32 bit floats as > you are using here. Yes, this leap is a problem for point as generic data type. And I do not know how to fix it. It can cause inefficient Index Scans when searching near (0,0) and query window touches simultaneously all quadrants (4x slower). But everything will be just fine when all data is in 2nd quadrant. Actually, we do not need to add this hacky code to core: we can provide colum-wise ordering or something similar as an example. This feature is aimed at PostGIS and they already possess bit tricks tricks [0]. I’ve taken this union code from PostGIS. Thanks! Best regards, Andrey Borodin. [0] https://github.com/postgis/postgis/blob/master/postgis/gserialized_gist_nd.c#L1150 |
Hi!
> On 24 февр. 2020 г., at 13:50, Andrey M. Borodin <[hidden email]> wrote: > > Hi Thomas! > > Thanks for looking into this! I’ll fix your notices asap. PFA v5. Thomas, I've used your wording almost exactly with explanation how point_zorder_internal() works. It has more explanation power than my attempts to compose good comment. There is one design decision that worries me most: should we use opclass function or index option to provide this sorting information? It is needed only during index creation, actually. And having extra i-class only for fast build seems excessive. I think we can provide both ways and let opclass developers decide? Thanks! Best regards, Andrey Borodin. ![]() ![]() |
On 2020-02-29 13:13, Andrey M. Borodin wrote:
> Hi! > >> On 24 февр. 2020 г., at 13:50, Andrey M. Borodin >> <[hidden email]> wrote: >> >> Hi Thomas! >> >> Thanks for looking into this! I’ll fix your notices asap. > > PFA v5. > Thomas, I've used your wording almost exactly with explanation how > point_zorder_internal() works. It has more explanation power than my > attempts > to compose good comment. Small typo alert: In v5-0002-Implement-GiST-build-using-sort-support.patch there is: + method is also optional and is used diring fast GiST build. 'diring' should be 'during' |
Free forum by Nabble | Edit this page |