Yet another fast GiST build

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
122 messages Options
1234567
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 11/09/2020 09:02, Andrey M. Borodin wrote:

>> 10 сент. 2020 г., в 17:43, Heikki Linnakangas <[hidden email]>
>> написал(а):
>>
>> One more patch version attached. I fixed some memory leaks, and
>> fixed the abbreviation on 32-bit systems, and a bunch of small
>> comment changes and such.
>>
>> <v18-0001-Add-support-for-building-GiST-index-by-sorting.patch>
>
> The patch looks fine to me. On my machine GiST for points is builded
> 10x faster than before the patch.
Another patch version, fixed a few small bugs pointed out by assertion
failures in the regression tests.

- Heikki

v19-0001-Add-support-for-building-GiST-index-by-sorting.patch (49K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 15 сент. 2020 г., в 16:36, Heikki Linnakangas <[hidden email]> написал(а):
>
> Another patch version, fixed a few small bugs pointed out by assertion failures in the regression tests.
>
> - Heikki
> <v19-0001-Add-support-for-building-GiST-index-by-sorting.patch>

These changes in create_index.out do not seem correct to me

 SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
         f1        
 -------------------
- (0,0)
  (1e-300,-1e-300)
+ (0,0)

I did not figure out the root cause yet. We do not touch anything related to distance computation..

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 15/09/2020 19:46, Andrey M. Borodin wrote:

>
>
>> 15 сент. 2020 г., в 16:36, Heikki Linnakangas <[hidden email]> написал(а):
>>
>> Another patch version, fixed a few small bugs pointed out by assertion failures in the regression tests.
>>
>> - Heikki
>> <v19-0001-Add-support-for-building-GiST-index-by-sorting.patch>
>
> These changes in create_index.out do not seem correct to me
>
>   SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>           f1
>   -------------------
> - (0,0)
>    (1e-300,-1e-300)
> + (0,0)
>
> I did not figure out the root cause yet. We do not touch anything related to distance computation..

Ah yeah, that's subtle. Those rows are considered to be equally distant
from (0, 1), given the precision of the <-> operator:

regression=#  SELECT f1, f1 <-> '0,1' FROM point_tbl ORDER BY f1 <-> '0,1';
         f1         |     ?column?
-------------------+------------------
  (0,0)             |                1
  (1e-300,-1e-300)  |                1
  (-3,4)            | 4.24264068711929
  (-10,0)           | 10.0498756211209
  (10,10)           | 13.4536240470737
  (-5,-12)          | 13.9283882771841
  (5.1,34.5)        |  33.885985303662
  (1e+300,Infinity) |         Infinity
  (NaN,NaN)         |              NaN
                    |
(10 rows)

It is arbitrary which one you get first.

It's not very nice to have a not-well defined order of rows in the
expected output, as it could change in the future if we change the index
build algorithm again. But we have plenty of cases that depend on the
physical row order, and it's not like this changes very often, so I
think it's ok to just memorize the new order in the expected output.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 15 сент. 2020 г., в 22:07, Heikki Linnakangas <[hidden email]> написал(а):
>
> regression=#  SELECT f1, f1 <-> '0,1' FROM point_tbl ORDER BY f1 <-> '0,1';
>        f1         |     ?column?
> -------------------+------------------
> (0,0)             |                1
> (1e-300,-1e-300)  |                1
> (-3,4)            | 4.24264068711929
> (-10,0)           | 10.0498756211209
> (10,10)           | 13.4536240470737
> (-5,-12)          | 13.9283882771841
> (5.1,34.5)        |  33.885985303662
> (1e+300,Infinity) |         Infinity
> (NaN,NaN)         |              NaN
>                   |
> (10 rows)
>
> It is arbitrary which one you get first.
>
> It's not very nice to have a not-well defined order of rows in the expected output, as it could change in the future if we change the index build algorithm again. But we have plenty of cases that depend on the physical row order, and it's not like this changes very often, so I think it's ok to just memorize the new order in the expected output.


I think this is valid reasoning. GiST choose subtree algorithm is not deterministic, it calls random(), but not in tested paths.
I was thinking that machine epsilon is near 1e-300, but I was wrong. It's actually near 1e-15.

Actually, I just want to understand what changes between v18 and v19 changed on-page order of items. I look into patch diff and cannot figure it out. There are only logging changes. How this affects scan?

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 16/09/2020 10:27, Andrey M. Borodin wrote:
> Actually, I just want to understand what changes between v18 and v19
> changed on-page order of items. I look into patch diff and cannot
> figure it out. There are only logging changes. How this affects
> scan?

The test was failing with v18 too.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Kyotaro Horiguchi-4
In reply to this post by Andrey Borodin-2
At Wed, 16 Sep 2020 12:27:09 +0500, "Andrey M. Borodin" <[hidden email]> wrote in
> I was thinking that machine epsilon is near 1e-300, but I was
> wrong. It's actually near 1e-15.

FWIW, the mantissa of double is effectively 52+1 bits, about 15.9
digits. so 1+(1e-16) is basically indistincitve from
1+(2e-16). Actually two double precisions 1+2e-16 and 1+3e-16 are
indistinctive from each other.

> Actually, I just want to understand what changes between v18 and v19 changed on-page order of items. I look into patch diff and cannot figure it out. There are only logging changes. How this affects scan?

FWIW, I saw the same symptom by my another patch after adding a value
to POINT_TBL. (But I didn't pursue the cause further..)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
In reply to this post by Heikki Linnakangas
On 15/09/2020 14:36, Heikki Linnakangas wrote:
> Another patch version, fixed a few small bugs pointed out by assertion
> failures in the regression tests.

Pushed. Thanks everyone!

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 17 сент. 2020 г., в 13:38, Heikki Linnakangas <[hidden email]> написал(а):
>
> On 15/09/2020 14:36, Heikki Linnakangas wrote:
>> Another patch version, fixed a few small bugs pointed out by assertion
>> failures in the regression tests.
>
> Pushed. Thanks everyone!

That's wonderful! Thank you, Heikki!

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Justin Pryzby
In reply to this post by Heikki Linnakangas
On Thu, Sep 17, 2020 at 11:38:47AM +0300, Heikki Linnakangas wrote:
> On 15/09/2020 14:36, Heikki Linnakangas wrote:
> > Another patch version, fixed a few small bugs pointed out by assertion
> > failures in the regression tests.
>
> Pushed. Thanks everyone!

+/* FIXME: bump this before pushing! */
 #define CATALOG_VERSION_NO     202009031



Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Tom Lane-2
In reply to this post by Heikki Linnakangas
Heikki Linnakangas <[hidden email]> writes:
> Pushed. Thanks everyone!

It appears that hyrax (CLOBBER_CACHE_ALWAYS) is not very happy
with this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-09-19%2021%3A27%3A23

We have a recent pass from prion, showing that -DRELCACHE_FORCE_RELEASE
-DCATCACHE_FORCE_RELEASE doesn't cause a problem, so maybe hyrax's
result is just random cosmic rays or something.  But I doubt it.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Tom Lane-2
I wrote:
> It appears that hyrax (CLOBBER_CACHE_ALWAYS) is not very happy
> with this:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-09-19%2021%3A27%3A23

I reproduced that and traced it to a missing RelationOpenSmgr call.
Fixed now.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Justin Pryzby
On Sun, Sep 20, 2020 at 05:10:05PM -0400, Tom Lane wrote:
> I wrote:
> > It appears that hyrax (CLOBBER_CACHE_ALWAYS) is not very happy
> > with this:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-09-19%2021%3A27%3A23
>
> I reproduced that and traced it to a missing RelationOpenSmgr call.
> Fixed now.

This also appears to break checksums.

postgres=#  CREATE TABLE pvactst (i INT, a INT[], p POINT) with (autovacuum_enabled = off);
postgres=#  CREATE INDEX gist_pvactst ON pvactst USING gist (p);
postgres=#  INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,1000) i;
WARNING:  page verification failed, calculated checksum 34313 but expected 0
ERROR:  invalid page in block 0 of relation base/12859/16389

I was able to make this work like so:

@@ -449,6 +450,7 @@ gist_indexsortbuild(GISTBuildState *state)
 
        /* Write out the root */
        PageSetLSN(pagestate->page, GistBuildLSN);
+       PageSetChecksumInplace(pagestate->page, GIST_ROOT_BLKNO, state->indexrel->rd_smgr);
        smgrwrite(state->indexrel->rd_smgr, MAIN_FORKNUM, GIST_ROOT_BLKNO,
                          pagestate->page, true);
        if (RelationNeedsWAL(state->indexrel))
@@ -555,6 +557,7 @@ gist_indexsortbuild_flush_ready_pages(GISTBuildState *state)
 
                PageSetLSN(page, GistBuildLSN);
 
+               PageSetChecksumInplace(page, state->pages_written, state->indexrel->rd_smgr);
                smgrextend(state->indexrel->rd_smgr,
                                   MAIN_FORKNUM,
                                   state->pages_written++,

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
> This also appears to break checksums.

I was wondering about that, because the typical pattern for use of
smgrextend for indexes seems to be

        RelationOpenSmgr(rel);
        PageSetChecksumInplace(page, lastblock);
        smgrextend(rel->rd_smgr, MAIN_FORKNUM, lastblock, zerobuf.data, false);

and gist_indexsortbuild wasn't doing either of the first two things.

gist_indexsortbuild_flush_ready_pages looks like it might be
a few bricks shy of a load too.  But my local CLOBBER_CACHE_ALWAYS
run hasn't gotten to anything except the pretty-trivial index
made in point.sql, so I don't have evidence about it.

Another interesting point is that all the other index AMs seem to WAL-log
the new page before the smgrextend call, whereas this code is doing it
in the other order.  I strongly doubt that both patterns are equally
correct.  Could be that the other AMs are in the wrong though.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 21/09/2020 02:06, Tom Lane wrote:
> Justin Pryzby <[hidden email]> writes:
>> This also appears to break checksums.

Thanks, I'll go fix it.

> I was wondering about that, because the typical pattern for use of
> smgrextend for indexes seems to be
>
> RelationOpenSmgr(rel);
> PageSetChecksumInplace(page, lastblock);
> smgrextend(rel->rd_smgr, MAIN_FORKNUM, lastblock, zerobuf.data, false);
>
> and gist_indexsortbuild wasn't doing either of the first two things.
>
> gist_indexsortbuild_flush_ready_pages looks like it might be
> a few bricks shy of a load too.  But my local CLOBBER_CACHE_ALWAYS
> run hasn't gotten to anything except the pretty-trivial index
> made in point.sql, so I don't have evidence about it.

I don't think a relcache invalidation can happen on the index we're
building. Other similar callers call RelationOpenSmgr(rel) before every
write though (e.g. _bt_blwritepage()), so perhaps it's better to copy
that pattern here too.

> Another interesting point is that all the other index AMs seem to WAL-log
> the new page before the smgrextend call, whereas this code is doing it
> in the other order.  I strongly doubt that both patterns are equally
> correct.  Could be that the other AMs are in the wrong though.

My thinking was that it's better to call smgrextend() first, so that if
you run out of disk space, you get the error before WAL-logging it. That
reduces the chance that WAL replay will run out of disk space. A lot of
things are different during WAL replay, so it's quite likely that WAL
replay runs out of disk space anyway if you're living on the edge, but
still.

I didn't notice that the other callers are doing it the other way round,
though. I think they need to, so that they can stamp the page with the
LSN of the WAL record. But GiST build is special in that regard, because
it stamps all pages with GistBuildLSN.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 21/09/2020 11:08, Heikki Linnakangas wrote:
> I think they need to, so that they can stamp the page with the LSN of
> the WAL record. But GiST build is special in that regard, because it
> stamps all pages with GistBuildLSN.

Actually, don't we have a problem with that, even before this patch?
Even though we set the LSN to the magic GistBuildLSN value when we build
the index, WAL replay will write the LSN of the record instead. That
would mess with the LSN-NSN interlock. After WAL replay (or in a
streaming replica), a scan on the GiST index might traverse right-links
unnecessarily.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 21 сент. 2020 г., в 13:45, Heikki Linnakangas <[hidden email]> написал(а):
>
> On 21/09/2020 11:08, Heikki Linnakangas wrote:
>> I think they need to, so that they can stamp the page with the LSN of
>> the WAL record. But GiST build is special in that regard, because it
>> stamps all pages with GistBuildLSN.
>
> Actually, don't we have a problem with that, even before this patch? Even though we set the LSN to the magic GistBuildLSN value when we build the index, WAL replay will write the LSN of the record instead. That would mess with the LSN-NSN interlock. After WAL replay (or in a streaming replica), a scan on the GiST index might traverse right-links unnecessarily.

I think we don't set rightlinks during index build.

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
In reply to this post by Tom Lane-2
On 21/09/2020 02:06, Tom Lane wrote:
> Justin Pryzby <[hidden email]> writes:
>> This also appears to break checksums.

Fixed, thanks for the report!

> I was wondering about that, because the typical pattern for use of
> smgrextend for indexes seems to be
>
> RelationOpenSmgr(rel);
> PageSetChecksumInplace(page, lastblock);
> smgrextend(rel->rd_smgr, MAIN_FORKNUM, lastblock, zerobuf.data, false);
>
> and gist_indexsortbuild wasn't doing either of the first two things.
>
> gist_indexsortbuild_flush_ready_pages looks like it might be
> a few bricks shy of a load too.  But my local CLOBBER_CACHE_ALWAYS
> run hasn't gotten to anything except the pretty-trivial index
> made in point.sql, so I don't have evidence about it.

I added a RelationOpenSmgr() call there too, although it's not needed
currently. It seems to be enough to do it before the first smgrextend()
call. But if you removed or refactored the first call someohow, so it
was not the first call anymore, it would be easy to miss that you'd
still need the RelationOpenSmgr() call there. It's more consistent with
the code in nbtsort.c now, too.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
In reply to this post by Andrey Borodin-2
On 21/09/2020 12:06, Andrey M. Borodin wrote:

>> 21 сент. 2020 г., в 13:45, Heikki Linnakangas <[hidden email]>
>> написал(а):
>>
>> Actually, don't we have a problem with that, even before this
>> patch? Even though we set the LSN to the magic GistBuildLSN value
>> when we build the index, WAL replay will write the LSN of the
>> record instead. That would mess with the LSN-NSN interlock. After
>> WAL replay (or in a streaming replica), a scan on the GiST index
>> might traverse right-links unnecessarily.
>
> I think we don't set rightlinks during index build.

The new GiST sorting code does not, but the regular insert-based code does.

That's a bit questionable in the new code actually. Was that a conscious
decision? The right-links are only needed when there are concurrent page
splits, so I think it's OK, but the checks for InvalidBlockNumber in
gistScanPage() and gistFindPage() have comment "/* sanity check */".
Comment changes are needed, at least.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 21 сент. 2020 г., в 17:15, Heikki Linnakangas <[hidden email]> написал(а):
>
> On 21/09/2020 12:06, Andrey M. Borodin wrote
>>
>> I think we don't set rightlinks during index build.
>
> The new GiST sorting code does not, but the regular insert-based code does.
>
> That's a bit questionable in the new code actually. Was that a conscious
> decision? The right-links are only needed when there are concurrent page
> splits, so I think it's OK, but the checks for InvalidBlockNumber in
> gistScanPage() and gistFindPage() have comment "/* sanity check */".
> Comment changes are needed, at least.

It was a conscious decision with incorrect motivation. I was thinking that it will help to reduce number of "false positive" inspecting right pages. But now I see that:
1. There should be no such "false positives" that we can avoid
2. Valid rightlinks could help to do amcheck verification in future

But thing that bothers me now: when we vacuum leaf page, we bump it's NSN. But we do not bump internal page LSN. Does this means we will follow rightlinks after vacuum? It seems superflous. And btw we did not adjust internal page tuples after vacuum...

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 21 сент. 2020 г., в 18:29, Andrey M. Borodin <[hidden email]> написал(а):
>
> It was a conscious decision with incorrect motivation. I was thinking that it will help to reduce number of "false positive" inspecting right pages. But now I see that:
> 1. There should be no such "false positives" that we can avoid
> 2. Valid rightlinks could help to do amcheck verification in future

Well, point number 2 here is invalid. There exist one leaf page p, so that if we start traversing rightlink from p we will reach all leaf pages. But we practically have no means to find this page. This makes rightlinks not very helpful in amcheck for GiST.

But for consistency I think it worth to install them.

Thanks!

Best regards, Andrey Borodin.


0001-Install-rightlinks-on-GiST-pages-in-case-of-sorting-.patch (2K) Download Attachment
1234567