Yet another fast GiST build

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
133 messages Options
1 ... 34567
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Peter Geoghegan-4
On Mon, Dec 7, 2020 at 2:05 AM Andrey Borodin <[hidden email]> wrote:
> Here's version with tests and docs. I still have no idea how to print some useful information about tuples keys.

I suggest calling BuildIndexValueDescription() from your own custom
debug instrumentation code.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 7 дек. 2020 г., в 23:56, Peter Geoghegan <[hidden email]> написал(а):
>
> On Mon, Dec 7, 2020 at 2:05 AM Andrey Borodin <[hidden email]> wrote:
>> Here's version with tests and docs. I still have no idea how to print some useful information about tuples keys.
>
> I suggest calling BuildIndexValueDescription() from your own custom
> debug instrumentation code.
Thanks for the hint, Peter!
This function does exactly what I want to do. But I have no Relation inside gist_page_items(bytea) function... probably, I'll add gist_page_items(relname, blockno) overload to fetch keys.

Thanks!

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 9 дек. 2020 г., в 14:47, Andrey Borodin <[hidden email]> написал(а):
>
>
>
>> 7 дек. 2020 г., в 23:56, Peter Geoghegan <[hidden email]> написал(а):
>>
>> On Mon, Dec 7, 2020 at 2:05 AM Andrey Borodin <[hidden email]> wrote:
>>> Here's version with tests and docs. I still have no idea how to print some useful information about tuples keys.
>>
>> I suggest calling BuildIndexValueDescription() from your own custom
>> debug instrumentation code.
> Thanks for the hint, Peter!
> This function does exactly what I want to do. But I have no Relation inside gist_page_items(bytea) function... probably, I'll add gist_page_items(relname, blockno) overload to fetch keys.
PFA patch with implementation.

Best regards, Andrey Borodin.





v3-0001-Add-functions-to-pageinspect-to-inspect-GiST-inde.patch (38K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 10/12/2020 12:16, Andrey Borodin wrote:

>> 9 дек. 2020 г., в 14:47, Andrey Borodin <[hidden email]> написал(а):
>>> 7 дек. 2020 г., в 23:56, Peter Geoghegan <[hidden email]> написал(а):
>>>
>>> On Mon, Dec 7, 2020 at 2:05 AM Andrey Borodin <[hidden email]> wrote:
>>>> Here's version with tests and docs. I still have no idea how to print some useful information about tuples keys.
>>>
>>> I suggest calling BuildIndexValueDescription() from your own custom
>>> debug instrumentation code.
>> Thanks for the hint, Peter!
>> This function does exactly what I want to do. But I have no Relation inside gist_page_items(bytea) function... probably, I'll add gist_page_items(relname, blockno) overload to fetch keys.
>
> PFA patch with implementation.
I did a bit of cleanup on the function signature. The .sql script
claimed that gist_page_items() took bytea as argument, but in reality it
was a relation name, as text. I changed it so that it takes a page image
as argument, instead of reading the block straight from the index.
Mainly to make it consistent with brin_page_items(), if it wasn't for
that precedence I might've gone either way on it.

Fixed the docs accordingly, and ran pgindent. New patch version attached.

- Heikki

v4-0001-Add-functions-to-pageinspect-to-inspect-GiST-inde.patch (24K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 12 янв. 2021 г., в 18:49, Heikki Linnakangas <[hidden email]> написал(а):
>
>> PFA patch with implementation.
>
> I did a bit of cleanup on the function signature. The .sql script claimed that gist_page_items() took bytea as argument, but in reality it was a relation name, as text. I changed it so that it takes a page image as argument, instead of reading the block straight from the index. Mainly to make it consistent with brin_page_items(), if it wasn't for that precedence I might've gone either way on it.
bt_page_items() takes relation name and block number, that was a reason for doing so. But all others *_page_items() (heap, brin, hash) are doing as in v4. So I think it's more common way.

>
> Fixed the docs accordingly, and ran pgindent. New patch version attached.

Thanks! Looks good to me.

One more question: will bytea tests run correctly on 32bit\different-endian systems?

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 12/01/2021 18:19, Andrey Borodin wrote:
>> 12 янв. 2021 г., в 18:49, Heikki Linnakangas <[hidden email]>
>> написал(а):
>>
>> Fixed the docs accordingly, and ran pgindent. New patch version
>> attached.
>
> Thanks! Looks good to me.

Pushed, thanks!

> One more question: will bytea tests run correctly on
> 32bit\different-endian systems?

Good question. Somehow I thought we were printing esseantilly text
values as bytea. But they are Points, which consists of float8's. Since
I already pushed this, I'm going to just wait and see what the buildfarm
says, and fix if needed. I think the fix is going to be to just remove
the test for the bytea-variant, it doesn't seem that important to test.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 13 янв. 2021 г., в 13:41, Heikki Linnakangas <[hidden email]> написал(а):
>
>> One more question: will bytea tests run correctly on
>> 32bit\different-endian systems?
>
> Good question. Somehow I thought we were printing esseantilly text values as bytea. But they are Points, which consists of float8's. Since I already pushed this, I'm going to just wait and see what the buildfarm says, and fix if needed. I think the fix is going to be to just remove the test for the bytea-variant, it doesn't seem that important to test.

Maybe we can just omit key_data from tests?

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 13/01/2021 11:46, Andrey Borodin wrote:

>> 13 янв. 2021 г., в 13:41, Heikki Linnakangas <[hidden email]>
>> написал(а):
>>
>>> One more question: will bytea tests run correctly on
>>> 32bit\different-endian systems?
>>
>> Good question. Somehow I thought we were printing esseantilly text
>> values as bytea. But they are Points, which consists of float8's.
>> Since I already pushed this, I'm going to just wait and see what
>> the buildfarm says, and fix if needed. I think the fix is going to
>> be to just remove the test for the bytea-variant, it doesn't seem
>> that important to test.
>
> Maybe we can just omit key_data from tests?

Make sense, fixed it that way. Thanks!

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 13/01/2021 12:34, Heikki Linnakangas wrote:

> On 13/01/2021 11:46, Andrey Borodin wrote:
>>> 13 янв. 2021 г., в 13:41, Heikki Linnakangas <[hidden email]>
>>> написал(а):
>>>
>>>> One more question: will bytea tests run correctly on
>>>> 32bit\different-endian systems?
>>>
>>> Good question. Somehow I thought we were printing esseantilly text
>>> values as bytea. But they are Points, which consists of float8's.
>>> Since I already pushed this, I'm going to just wait and see what
>>> the buildfarm says, and fix if needed. I think the fix is going to
>>> be to just remove the test for the bytea-variant, it doesn't seem
>>> that important to test.
>>
>> Maybe we can just omit key_data from tests?
>
> Make sense, fixed it that way. Thanks!

Buildfarm animal thorntail is still not happy:

> --- /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/contrib/pageinspect/expected/gist.out 2021-01-13 13:38:09.721752365 +0300
> +++ /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/contrib/pageinspect/results/gist.out 2021-01-13 14:12:21.540046507 +0300
> @@ -3,21 +3,21 @@
>  CREATE INDEX test_gist_idx ON test_gist USING gist (p);
>  -- Page 0 is the root, the rest are leaf pages
>  SELECT * FROM gist_page_opaque_info(get_raw_page('test_gist_idx', 0));
> - lsn | nsn | rightlink  | flags
> ------+-----+------------+-------
> - 0/1 | 0/0 | 4294967295 | {}
> +    lsn     | nsn | rightlink  | flags
> +------------+-----+------------+-------
> + 0/1B8357F8 | 0/0 | 4294967295 | {}
>  (1 row)

Looks like the LSN on the page is not set to GistBuildLSN as expected.
Weird.

Thorntail is a sparc64 system, so little-endian, but the other
little-endian buildfarm members are not reporting this error. Any idea
what might be going on?

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Tom Lane-2
In reply to this post by Heikki Linnakangas
Heikki Linnakangas <[hidden email]> writes:
> On 13/01/2021 11:46, Andrey Borodin wrote:
>> Maybe we can just omit key_data from tests?

> Make sense, fixed it that way. Thanks!

thorntail, at least, is still unhappy.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas-8
In reply to this post by Heikki Linnakangas


On 13 January 2021 13:53:39 EET, Heikki Linnakangas <[hidden email]> wrote:

>Buildfarm animal thorntail is still not happy:
>
>> --- /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/contrib/pageinspect/expected/gist.out 2021-01-13 13:38:09.721752365 +0300
>> +++ /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/contrib/pageinspect/results/gist.out 2021-01-13 14:12:21.540046507 +0300
>> @@ -3,21 +3,21 @@
>>  CREATE INDEX test_gist_idx ON test_gist USING gist (p);
>>  -- Page 0 is the root, the rest are leaf pages
>>  SELECT * FROM gist_page_opaque_info(get_raw_page('test_gist_idx', 0));
>> - lsn | nsn | rightlink  | flags
>> ------+-----+------------+-------
>> - 0/1 | 0/0 | 4294967295 | {}
>> +    lsn     | nsn | rightlink  | flags
>> +------------+-----+------------+-------
>> + 0/1B8357F8 | 0/0 | 4294967295 | {}
>>  (1 row)
>
>Looks like the LSN on the page is not set to GistBuildLSN as expected.
>Weird.
>
>Thorntail is a sparc64 system, so little-endian, but the other
>little-endian buildfarm members are not reporting this error. Any idea
>what might be going on?

Status update on this: I am building Postgres in a qemu sparc64 emulated virtual machine, hoping to be able to reproduce this. It's very slow, so it will take hours still to complete.

I don't think this is a problem with the test, or with the new pageinspect functions, but a genuine bug in the gist building  code. Or there is something special on that animal that causes the just-created index pages to be dirtied. It does seem to happen consistently on thorntail, but not on other animals.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 13 January 2021 20:04:10 EET, Heikki Linnakangas <[hidden email]> wrote:

>
>
>On 13 January 2021 13:53:39 EET, Heikki Linnakangas <[hidden email]> wrote:
>>Looks like the LSN on the page is not set to GistBuildLSN as expected.
>>Weird.
>>
>>Thorntail is a sparc64 system, so little-endian, but the other
>>little-endian buildfarm members are not reporting this error. Any idea
>>what might be going on?
>
>Status update on this: I am building Postgres in a qemu sparc64 emulated virtual machine, hoping to be able to reproduce this. It's very slow, so it will take hours still to complete.
>
>I don't think this is a problem with the test, or with the new pageinspect functions, but a genuine bug in the gist building  code. Or there is something special on that animal that causes the just-created index pages to be dirtied. It does seem to happen consistently on thorntail, but not on other animals.

Ah, silly me. Thorntail uses "wal_level=minimal". With that, I can readily reproduce this.

I'm still not sure why it happens, but now it should be straightforward to debug.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Peter Geoghegan-4
In reply to this post by Heikki Linnakangas
On Tue, Jan 12, 2021 at 5:49 AM Heikki Linnakangas <[hidden email]> wrote:
> I did a bit of cleanup on the function signature. The .sql script
> claimed that gist_page_items() took bytea as argument, but in reality it
> was a relation name, as text. I changed it so that it takes a page image
> as argument, instead of reading the block straight from the index.
> Mainly to make it consistent with brin_page_items(), if it wasn't for
> that precedence I might've gone either way on it.

BTW it would be nice if gist_page_items() had a "dead" boolean output
argument for the item's LP_DEAD bit, just like bt_page_items(). I plan
on adding some testing for GiST's opportunistic index deletion soon. I
may also add some of the same enhancements that nbtree got today
(following commit d168b666).

This feature was originally heavily based on the nbtree LP_DEAD
deletion mechanism (now called simple deletion), and I see no reason
(or at least no good reason) why it shouldn't be possible to keep it
in sync (except maybe with bottom-up deletion, where that it at least
isn't straightforward/mechanical).

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 14 янв. 2021 г., в 04:47, Peter Geoghegan <[hidden email]> написал(а):
>
> On Tue, Jan 12, 2021 at 5:49 AM Heikki Linnakangas <[hidden email]> wrote:
>> I did a bit of cleanup on the function signature. The .sql script
>> claimed that gist_page_items() took bytea as argument, but in reality it
>> was a relation name, as text. I changed it so that it takes a page image
>> as argument, instead of reading the block straight from the index.
>> Mainly to make it consistent with brin_page_items(), if it wasn't for
>> that precedence I might've gone either way on it.
>
> BTW it would be nice if gist_page_items() had a "dead" boolean output
> argument for the item's LP_DEAD bit, just like bt_page_items().
+1. PFA patch.

> I plan
> on adding some testing for GiST's opportunistic index deletion soon. I
> may also add some of the same enhancements that nbtree got today
> (following commit d168b666).
>
> This feature was originally heavily based on the nbtree LP_DEAD
> deletion mechanism (now called simple deletion), and I see no reason
> (or at least no good reason) why it shouldn't be possible to keep it
> in sync (except maybe with bottom-up deletion, where that it at least
> isn't straightforward/mechanical).
Sound great!

Best regards, Andrey Borodin.

0001-Add-bool-column-for-LP_DEAF-flag-to-GiST-pageinspect.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Peter Eisentraut-7
In reply to this post by Heikki Linnakangas
On 2021-01-12 14:49, Heikki Linnakangas wrote:

>>>> I suggest calling BuildIndexValueDescription() from your own custom
>>>> debug instrumentation code.
>>> Thanks for the hint, Peter!
>>> This function does exactly what I want to do. But I have no Relation inside gist_page_items(bytea) function... probably, I'll add gist_page_items(relname, blockno) overload to fetch keys.
>>
>> PFA patch with implementation.
>
> I did a bit of cleanup on the function signature. The .sql script
> claimed that gist_page_items() took bytea as argument, but in reality it
> was a relation name, as text. I changed it so that it takes a page image
> as argument, instead of reading the block straight from the index.
> Mainly to make it consistent with brin_page_items(), if it wasn't for
> that precedence I might've gone either way on it.

I noticed this patch while working on another patch for pageinspect [0],
and this one appears to introduce a problem similar to the one the other
patch attempts to fix: The "itemlen" output parameters are declared to
be of type smallint, but the underlying C data is of type uint16
(OffsetNumber).  I don't know the details of gist enough to determine
whether overflow is possible here.  If not, perhaps a check or at least
a comment would be useful.  Otherwise, these parameters should be of
type int in SQL.

[0]:
https://www.postgresql.org/message-id/09e2dd82-4eb6-bba1-271a-d2b58bf6c71f@...


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 15 янв. 2021 г., в 10:24, Peter Eisentraut <[hidden email]> написал(а):
>
> I noticed this patch while working on another patch for pageinspect [0], and this one appears to introduce a problem similar to the one the other patch attempts to fix: The "itemlen" output parameters are declared to be of type smallint, but the underlying C data is of type uint16 (OffsetNumber).  I don't know the details of gist enough to determine whether overflow is possible here.  If not, perhaps a check or at least a comment would be useful.  Otherwise, these parameters should be of type int in SQL.

Item offsets cannot exceed maximum block size of 32768. And even 32768/sizeof(ItemId). Thus overflow is impossible.
Interesting question is wether pageinspect should protect itself from corrupted input?
Generating description from bogus tuple, probably, can go wrong.

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Tom Lane-2
In reply to this post by Heikki Linnakangas
Heikki Linnakangas <[hidden email]> writes:
> On 12/01/2021 18:19, Andrey Borodin wrote:
>> Thanks! Looks good to me.

> Pushed, thanks!

I noticed that gist_page_items() thinks it can hold inter_call_data->rel
open across a series of calls.  That's completely unsafe: the executor
might not run the call series to completion (see LIMIT), resulting in
relcache leak complaints.  I suspect that it might have cache-flush
hazards even without that.  I think this code needs to be rewritten to do
all the interesting work in the first call.  Or maybe better, return the
results as a tuplestore so you don't have to do multiple calls at all.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Peter Geoghegan-4
On Sun, Jan 17, 2021 at 12:50 PM Tom Lane <[hidden email]> wrote:
> I noticed that gist_page_items() thinks it can hold inter_call_data->rel
> open across a series of calls.  That's completely unsafe: the executor
> might not run the call series to completion (see LIMIT), resulting in
> relcache leak complaints.

It also has the potential to run into big problems should the user
input a raw page image with an regclass-argument-incompatible tuple
descriptor. Maybe that's okay (this is a tool for experts), but it
certainly is a consideration.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 18/01/2021 00:35, Peter Geoghegan wrote:
> On Sun, Jan 17, 2021 at 12:50 PM Tom Lane <[hidden email]> wrote:
>> I noticed that gist_page_items() thinks it can hold inter_call_data->rel
>> open across a series of calls.  That's completely unsafe: the executor
>> might not run the call series to completion (see LIMIT), resulting in
>> relcache leak complaints.

Fixed, thanks! I changed it to return a tuplestore.

> It also has the potential to run into big problems should the user
> input a raw page image with an regclass-argument-incompatible tuple
> descriptor. Maybe that's okay (this is a tool for experts), but it
> certainly is a consideration.

I'm not sure I understand. It's true that the raw page image can contain
data from a different index, or any garbage really. And the function
will behave badly if you do that. That's an accepted risk with
pageinspect functions, that's why they're superuser-only, although some
of them are more tolerant of corrupt pages than others. The
gist_page_items_bytea() variant doesn't try to parse the key data and is
less likely to crash on bad input.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Peter Geoghegan-4
On Sun, Jan 17, 2021 at 2:52 PM Heikki Linnakangas <[hidden email]> wrote:
> I'm not sure I understand. It's true that the raw page image can contain
> data from a different index, or any garbage really. And the function
> will behave badly if you do that. That's an accepted risk with
> pageinspect functions, that's why they're superuser-only, although some
> of them are more tolerant of corrupt pages than others. The
> gist_page_items_bytea() variant doesn't try to parse the key data and is
> less likely to crash on bad input.

I personally agree with you - it's not like there aren't other ways
for superusers to crash the server (most of which seem very similar to
this gist_page_items() issue, in fact). I just think that it's worth
being clear about that being a trade-off that we've accepted.

--
Peter Geoghegan


1 ... 34567