Yet another fast GiST build

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
133 messages Options
1 ... 4567
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Peter Geoghegan-4
On Sun, Jan 17, 2021 at 3:04 PM Peter Geoghegan <[hidden email]> wrote:
> I personally agree with you - it's not like there aren't other ways
> for superusers to crash the server (most of which seem very similar to
> this gist_page_items() issue, in fact). I just think that it's worth
> being clear about that being a trade-off that we've accepted.

Can we rename gist_page_items_bytea() to gist_page_items(), and at the
same time rename the current gist_page_items() -- perhaps call it
gist_page_items_output()?

That way we could add a bt_page_items_output() function later, while
leaving everything consistent (actually not quite, since
bt_page_items() outputs text instead of bytea -- but that seems worth
fixing too). This also has the merit of making the unsafe "output"
variant into the special case.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 18/01/2021 01:10, Peter Geoghegan wrote:

> On Sun, Jan 17, 2021 at 3:04 PM Peter Geoghegan <[hidden email]> wrote:
>> I personally agree with you - it's not like there aren't other ways
>> for superusers to crash the server (most of which seem very similar to
>> this gist_page_items() issue, in fact). I just think that it's worth
>> being clear about that being a trade-off that we've accepted.
>
> Can we rename gist_page_items_bytea() to gist_page_items(), and at the
> same time rename the current gist_page_items() -- perhaps call it
> gist_page_items_output()?
>
> That way we could add a bt_page_items_output() function later, while
> leaving everything consistent (actually not quite, since
> bt_page_items() outputs text instead of bytea -- but that seems worth
> fixing too). This also has the merit of making the unsafe "output"
> variant into the special case.

bt_page_items() and bt_page_items_bytea() exist already. And
brin_page_items() also calls the output functions (there's no bytea
version of that). Perhaps it would've been better to make the
bytea-variants the default, but I'm afraid that ship has already sailed.

We're not terribly consistent; heap_page_items(), hash_page_items() and
gin_page_items() don't attempt to call the output functions.

Then again, I don't think we need to worry much about backwards
compatibility in pageinspect, so I guess we could rename them all. It
doesn't bother me enough to make me do it, but I won't object if you
want to.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Ibrar Ahmed-4
In reply to this post by Heikki Linnakangas


On Mon, Jan 18, 2021 at 3:52 AM Heikki Linnakangas <[hidden email]> wrote:
On 18/01/2021 00:35, Peter Geoghegan wrote:
> On Sun, Jan 17, 2021 at 12:50 PM Tom Lane <[hidden email]> wrote:
>> I noticed that gist_page_items() thinks it can hold inter_call_data->rel
>> open across a series of calls.  That's completely unsafe: the executor
>> might not run the call series to completion (see LIMIT), resulting in
>> relcache leak complaints.

Fixed, thanks! I changed it to return a tuplestore.

> It also has the potential to run into big problems should the user
> input a raw page image with an regclass-argument-incompatible tuple
> descriptor. Maybe that's okay (this is a tool for experts), but it
> certainly is a consideration.

I'm not sure I understand. It's true that the raw page image can contain
data from a different index, or any garbage really. And the function
will behave badly if you do that. That's an accepted risk with
pageinspect functions, that's why they're superuser-only, although some
of them are more tolerant of corrupt pages than others. The
gist_page_items_bytea() variant doesn't try to parse the key data and is
less likely to crash on bad input.

- Heikki


The patch (0001-Add-bool-column-for-LP_DEAF-flag-to-GiST-pageinspect.patch
does not apply successfully and has multiple hanks failed. 
http://cfbot.cputube.org/patch_32_2824.log
patching file contrib/pageinspect/gistfuncs.c
Hunk #1 FAILED at 151.
Hunk #2 FAILED at 175.
Hunk #3 FAILED at 245.
Hunk #4 FAILED at 271.
...
Can we get a rebase? 

I am marking the patch "Waiting on Author"

--
Ibrar Ahmed
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Peter Geoghegan-4
On Mon, Mar 8, 2021 at 6:41 AM Ibrar Ahmed <[hidden email]> wrote:
> The patch (0001-Add-bool-column-for-LP_DEAF-flag-to-GiST-pageinspect.patch)
> does not apply successfully and has multiple hanks failed.

That's because it was committed.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Ibrar Ahmed-4


On Mon, Mar 8, 2021 at 8:59 PM Peter Geoghegan <[hidden email]> wrote:
On Mon, Mar 8, 2021 at 6:41 AM Ibrar Ahmed <[hidden email]> wrote:
> The patch (0001-Add-bool-column-for-LP_DEAF-flag-to-GiST-pageinspect.patch)
> does not apply successfully and has multiple hanks failed.

That's because it was committed.

Thanks for the clarification, its status was not changed which confused me :)
 
 
--
Peter Geoghegan


--
Ibrar Ahmed
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2
Thanks, Ibrar!

> 8 марта 2021 г., в 21:15, Ibrar Ahmed <[hidden email]> написал(а):
>
>
>
> On Mon, Mar 8, 2021 at 8:59 PM Peter Geoghegan <[hidden email]> wrote:
> On Mon, Mar 8, 2021 at 6:41 AM Ibrar Ahmed <[hidden email]> wrote:
> > The patch (0001-Add-bool-column-for-LP_DEAF-flag-to-GiST-pageinspect.patch)
> > does not apply successfully and has multiple hanks failed.
>
> That's because it was committed.
>
> Thanks for the clarification, its status was not changed which confused me :)
>
There were numerous GiST-build-related patches in this thread. Yet uncommitted is a patch with sortsupport routines for btree_gist contrib module.
Here's its version which needs review.

Thanks for bringing this up!

Best regards, Andrey Borodin.


v5-0001-Sortsupport-for-sorting-GiST-build-for-gist_btree.patch (110K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 08/03/2021 19:06, Andrey Borodin wrote:
> There were numerous GiST-build-related patches in this thread. Yet uncommitted is a patch with sortsupport routines for btree_gist contrib module.
> Here's its version which needs review.

Reviewing this now again. One thing caught my eye:

> +static int
> +gbt_bit_sort_build_cmp(Datum a, Datum b, SortSupport ssup)
> +{
> + return DatumGetInt32(DirectFunctionCall2(byteacmp,
> + PointerGetDatum(a),
> + PointerGetDatum(b)));
> +}

That doesn't quite match the sort order used by the comparison
functions, gbt_bitlt and such. The comparison functions compare the bits
first, and use the length as a tie-breaker. Using byteacmp() will
compare the "bit length" first. However, gbt_bitcmp() also uses
byteacmp(), so I'm a bit confused. So, huh?

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 07/04/2021 09:00, Heikki Linnakangas wrote:
> On 08/03/2021 19:06, Andrey Borodin wrote:
>> There were numerous GiST-build-related patches in this thread. Yet uncommitted is a patch with sortsupport routines for btree_gist contrib module.
>> Here's its version which needs review.

Committed with small fixes. I changed the all functions to use
*GetDatum() and DatumGet*() macros, instead of just comparing Datums
with < and >. Datum is unsigned, while int2, int4, int8 and money are
signed, so that changes the sort order around 0 for those types to be
the same as the picksplit and picksplit functions use. Not a correctness
issue, the sorting only affects the quality of the index, but let's be tidy.

This issue remains:

> Reviewing this now again. One thing caught my eye:
>
>> +static int
>> +gbt_bit_sort_build_cmp(Datum a, Datum b, SortSupport ssup)
>> +{
>> + return DatumGetInt32(DirectFunctionCall2(byteacmp,
>> + PointerGetDatum(a),
>> + PointerGetDatum(b)));
>> +}
>
> That doesn't quite match the sort order used by the comparison
> functions, gbt_bitlt and such. The comparison functions compare the bits
> first, and use the length as a tie-breaker. Using byteacmp() will
> compare the "bit length" first. However, gbt_bitcmp() also uses
> byteacmp(), so I'm a bit confused. So, huh?

Since we used byteacmp() previously for picksplit, too, this is
consistent with the order you got previously. It still seems wrong to me
and should be investigated, but it doesn't need to block this patch.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
In reply to this post by Heikki Linnakangas
On 07/04/2021 09:00, Heikki Linnakangas wrote:

> On 08/03/2021 19:06, Andrey Borodin wrote:
>> There were numerous GiST-build-related patches in this thread. Yet uncommitted is a patch with sortsupport routines for btree_gist contrib module.
>> Here's its version which needs review.
>
> Reviewing this now again. One thing caught my eye:
>
>> +static int
>> +gbt_bit_sort_build_cmp(Datum a, Datum b, SortSupport ssup)
>> +{
>> + return DatumGetInt32(DirectFunctionCall2(byteacmp,
>> + PointerGetDatum(a),
>> + PointerGetDatum(b)));
>> +}
>
> That doesn't quite match the sort order used by the comparison
> functions, gbt_bitlt and such. The comparison functions compare the bits
> first, and use the length as a tie-breaker. Using byteacmp() will
> compare the "bit length" first. However, gbt_bitcmp() also uses
> byteacmp(), so I'm a bit confused. So, huh?

Ok, I think I understand that now. In btree_gist, the *_cmp() function
operates on non-leaf values, and *_lt(), *_gt() et al operate on leaf
values. For all other datatypes, the leaf and non-leaf representation is
the same, but for bit/varbit, the non-leaf representation is different.
The leaf representation is VarBit, and non-leaf is just the bits without
the 'bit_len' field. That's why it is indeed correct for gbt_bitcmp() to
just use byteacmp(), whereas gbt_bitlt() et al compares the 'bit_len'
field separately. That's subtle, and 100% uncommented.

What that means for this patch is that gbt_bit_sort_build_cmp() should
*not* call byteacmp(), but bitcmp(). Because it operates on the original
datatype stored in the table.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 7 апр. 2021 г., в 13:23, Heikki Linnakangas <[hidden email]> написал(а):
>
> Committed with small fixes.

Thanks!

> 7 апр. 2021 г., в 14:56, Heikki Linnakangas <[hidden email]> написал(а):
>
> Ok, I think I understand that now. In btree_gist, the *_cmp() function operates on non-leaf values, and *_lt(), *_gt() et al operate on leaf values. For all other datatypes, the leaf and non-leaf representation is the same, but for bit/varbit, the non-leaf representation is different. The leaf representation is VarBit, and non-leaf is just the bits without the 'bit_len' field. That's why it is indeed correct for gbt_bitcmp() to just use byteacmp(), whereas gbt_bitlt() et al compares the 'bit_len' field separately. That's subtle, and 100% uncommented.
>
> What that means for this patch is that gbt_bit_sort_build_cmp() should *not* call byteacmp(), but bitcmp(). Because it operates on the original datatype stored in the table.

+1
Thanks for investigating this.
If I understand things right, adding test values with different lengths of bit sequences would not uncover the problem anyway?

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 07/04/2021 15:12, Andrey Borodin wrote:

>> 7 апр. 2021 г., в 14:56, Heikki Linnakangas <[hidden email]>
>> написал(а):
>>
>> Ok, I think I understand that now. In btree_gist, the *_cmp()
>> function operates on non-leaf values, and *_lt(), *_gt() et al
>> operate on leaf values. For all other datatypes, the leaf and
>> non-leaf representation is the same, but for bit/varbit, the
>> non-leaf representation is different. The leaf representation is
>> VarBit, and non-leaf is just the bits without the 'bit_len' field.
>> That's why it is indeed correct for gbt_bitcmp() to just use
>> byteacmp(), whereas gbt_bitlt() et al compares the 'bit_len' field
>> separately. That's subtle, and 100% uncommented.
>>
>> What that means for this patch is that gbt_bit_sort_build_cmp()
>> should *not* call byteacmp(), but bitcmp(). Because it operates on
>> the original datatype stored in the table.
>
> +1 Thanks for investigating this. If I understand things right,
> adding test values with different lengths of bit sequences would not
> uncover the problem anyway?

That's right, the only consequence of a "wrong" sort order is that the
quality of the tree suffers, and scans need to scan more pages
unnecessarily.

I tried to investigate this by creating a varbit index with and without
sorting, and compared them with pageinspect, but in quick testing, I
wasn't able to find cases where the sorted version was badly ordered. I
guess I didn't find the right data set yet.

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Andrey Borodin-2


> 7 апр. 2021 г., в 16:18, Heikki Linnakangas <[hidden email]> написал(а):
>

I see there is a problem with "SET client_min_messages = DEBUG1;" on buildfarm. I think these checks were necessary to make sure test paths are triggered. When we know that code paths are tested, maybe we can omit checks?

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Yet another fast GiST build

Heikki Linnakangas
On 07/04/2021 22:41, Andrey Borodin wrote:
> I see there is a problem with "SET client_min_messages = DEBUG1;" on
> buildfarm. I think these checks were necessary to make sure test
> paths are triggered. When we know that code paths are tested, maybe
> we can omit checks?
Yeah. We don't have very reliable coverage of different GiST build
methods, as noted earlier in this thread. But that's not this patch's fault.

I've been investigating the varbit issue, and realized that all the
comparison functions in this patch for varlen datatypes are broken. The
representation that the sortsupport function sees is the one that the
'compress' function returns. The fixed-length versions got this right,
but the varlen versions assume that the input is a Datum of the original
datatype. It happens to not crash, because the representation returned
by gbt_var_compress() is also a varlena, and all of the comparison
functions tolerate the garbage inputs, but it's bogus. The correct
pattern would be something like this (without the debugging NOTICE, of
course):

> static int
> gbt_text_sort_build_cmp(Datum a, Datum b, SortSupport ssup)
> {
> GBT_VARKEY_R ra = gbt_var_key_readable((GBT_VARKEY *) PG_DETOAST_DATUM(a));
> GBT_VARKEY_R rb = gbt_var_key_readable((GBT_VARKEY *) PG_DETOAST_DATUM(b));
>
> int x = DatumGetInt32(DirectFunctionCall2Coll(bttextcmp,
> ssup->ssup_collation,
> PointerGetDatum(a),
> PointerGetDatum(b)));
> elog(NOTICE, "cmp: %s vs %s: %d",
> TextDatumGetCString(ra.lower),
> TextDatumGetCString(rb.lower),
> x);
> return x;
> }
- Heikki


1 ... 4567