Bug in GiST paring heap comparator

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug in GiST paring heap comparator

Alexander Korotkov
Hi!

Andrey Borodin noticed me that results of some KNN-GIST tests are
obviously wrong and don't match results of sort node.

SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
        f1
-------------------
 (10,10)
 (NaN,NaN)
 (0,0)
 (1e-300,-1e-300)
 (-3,4)
 (-10,0)
 (-5,-12)
 (5.1,34.5)

 (1e+300,Infinity)
(10 rows)

It appears to be related to implementation of comparison function in
pairing heap used as priority queue for KNN.  It used plain float
comparison, which doesn't handle Inf and Nan values well.  Attached
patch replaced it with float8_cmp_internal().

Also, note that with patch KNN results still don't fully match results
of sort node.

SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
        f1
-------------------
 (0,0)
 (1e-300,-1e-300)
 (-3,4)
 (-10,0)
 (10,10)
 (-5,-12)
 (5.1,34.5)
 (1e+300,Infinity)

 (NaN,NaN)
(10 rows)

NULL and '(NaN,NaN)' are swapped.  It happens so, because we assume
distance to NULL to be Inf, while float8_cmp_internal() assumes NaN to
be greater than NULL.  If even we would assume distance to NULL to be
Nan, it doesn't guarantee that NULLs would be last.  It looks like we
can handle this only by introduction array of "distance is null" flags
to GISTSearchItem.  But does it worth it?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

gist-pairing-heap-cmp-fix-1.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Andrey Borodin-2


> 2 сент. 2019 г., в 9:54, Alexander Korotkov <[hidden email]> написал(а):
>
> It appears to be related to implementation of comparison function in
> pairing heap used as priority queue for KNN.  It used plain float
> comparison, which doesn't handle Inf and Nan values well.  Attached
> patch replaced it with float8_cmp_internal().

Thanks! This patch fixes tests of my new GiST build :)

While patch looks good to me, I want to add that that there's a lot of <= and > comparisons in gistproc.c in function:

static float8
computeDistance(bool isLeaf, BOX *box, Point *point)

Should we fix this too? Or add comment why current code is safe.

Thanks!

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Heikki Linnakangas
In reply to this post by Alexander Korotkov
On 02/09/2019 07:54, Alexander Korotkov wrote:

> Andrey Borodin noticed me that results of some KNN-GIST tests are
> obviously wrong and don't match results of sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>          f1
> -------------------
>   (10,10)
>   (NaN,NaN)
>   (0,0)
>   (1e-300,-1e-300)
>   (-3,4)
>   (-10,0)
>   (-5,-12)
>   (5.1,34.5)
>
>   (1e+300,Infinity)
> (10 rows)

So we've memorized incorrect results in the expected output. Ouch!

> It appears to be related to implementation of comparison function in
> pairing heap used as priority queue for KNN.  It used plain float
> comparison, which doesn't handle Inf and Nan values well.  Attached
> patch replaced it with float8_cmp_internal().
>
> Also, note that with patch KNN results still don't fully match results
> of sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>          f1
> -------------------
>   (0,0)
>   (1e-300,-1e-300)
>   (-3,4)
>   (-10,0)
>   (10,10)
>   (-5,-12)
>   (5.1,34.5)
>   (1e+300,Infinity)
>
>   (NaN,NaN)
> (10 rows)
>
> NULL and '(NaN,NaN)' are swapped.  It happens so, because we assume
> distance to NULL to be Inf, while float8_cmp_internal() assumes NaN to
> be greater than NULL.  If even we would assume distance to NULL to be
> Nan, it doesn't guarantee that NULLs would be last.  It looks like we
> can handle this only by introduction array of "distance is null" flags
> to GISTSearchItem.  But does it worth it?

I don't think we have much choice, returning wrong results is not an
option. At first I thought we could set the "recheck" flag if we see
NULLs or NaNs, but that won't work because rechecking is not supported
with Index-Only Scans.

Looking at the corresponding SP-GiST code,
pairingheap_SpGistSearchItem_cmp() gets this right. I'd suggest
copy-pasting the implementation from there, so that they would be as
similar as possible. (SP-GiST gets away with just one isNull flag,
because it doesn't support multi-column indexes. In GiST, you need an
array, as you said.)

- Heikki


Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Alexander Korotkov
On Mon, Sep 2, 2019 at 9:28 AM Heikki Linnakangas <[hidden email]> wrote:

> On 02/09/2019 07:54, Alexander Korotkov wrote:
> > NULL and '(NaN,NaN)' are swapped.  It happens so, because we assume
> > distance to NULL to be Inf, while float8_cmp_internal() assumes NaN to
> > be greater than NULL.  If even we would assume distance to NULL to be
> > Nan, it doesn't guarantee that NULLs would be last.  It looks like we
> > can handle this only by introduction array of "distance is null" flags
> > to GISTSearchItem.  But does it worth it?
>
> I don't think we have much choice, returning wrong results is not an
> option. At first I thought we could set the "recheck" flag if we see
> NULLs or NaNs, but that won't work because rechecking is not supported
> with Index-Only Scans.
>
> Looking at the corresponding SP-GiST code,
> pairingheap_SpGistSearchItem_cmp() gets this right. I'd suggest
> copy-pasting the implementation from there, so that they would be as
> similar as possible. (SP-GiST gets away with just one isNull flag,
> because it doesn't support multi-column indexes. In GiST, you need an
> array, as you said.)
Thank you for clarifying my doubts.

Second of attached patches solves this problem.  In this patch
GISTSearchItem have both array of distances and array of null flags at
the end.  They are accessed by a bit cumbersome
GISTSearchItemDistanceValues() and GISTSearchItemDistanceNulls()
macros.  I came up to this, because I didn't like to allocate null
flags or distances in separate chunk of memory.  Also I had idea to
wrap distance and null flag into struct, but I didn't like to change
signature of index_store_float8_orderby_distances() that much.

I'm going to push both if no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-gist-pairing-heap-cmp-fix-2.patch (2K) Download Attachment
0002-knn-gist-fix-null-distances-2.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Alexander Korotkov
On Fri, Sep 6, 2019 at 5:44 PM Alexander Korotkov
<[hidden email]> wrote:
> I'm going to push both if no objections.

So, pushed!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Nikita Glukhov
On 08.09.2019 22:32, Alexander Korotkov wrote:

> On Fri, Sep 6, 2019 at 5:44 PM Alexander Korotkov
> <[hidden email]> wrote:
>> I'm going to push both if no objections.
> So, pushed!

Two years ago there was a similar patch for this issue:
https://www.postgresql.org/message-id/1499c9d0-075a-3014-d2aa-ba59121b3728%40postgrespro.ru

Sorry that I looked at this thread too late.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Alexander Korotkov
On Mon, Sep 9, 2019 at 8:32 PM Nikita Glukhov <[hidden email]> wrote:

> On 08.09.2019 22:32, Alexander Korotkov wrote:
>
> > On Fri, Sep 6, 2019 at 5:44 PM Alexander Korotkov
> > <[hidden email]> wrote:
> >> I'm going to push both if no objections.
> > So, pushed!
>
> Two years ago there was a similar patch for this issue:
> https://www.postgresql.org/message-id/1499c9d0-075a-3014-d2aa-ba59121b3728%40postgrespro.ru
>
> Sorry that I looked at this thread too late.

I see, thanks.

You patch seems a bit less cumbersome thanks to using GISTDistance
struct.  You don't have to introduce separate array with null flags.
But it's harder too keep index_store_float8_orderby_distances() used
by both GiST and SP-GiST.

What do you think?  You can rebase your patch can propose that as refactoring.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Nikita Glukhov


On 09.09.2019 22:47, Alexander Korotkov wrote:
On Mon, Sep 9, 2019 at 8:32 PM Nikita Glukhov [hidden email] wrote:
On 08.09.2019 22:32, Alexander Korotkov wrote:

On Fri, Sep 6, 2019 at 5:44 PM Alexander Korotkov
[hidden email] wrote:
I'm going to push both if no objections.
So, pushed!
Two years ago there was a similar patch for this issue:
https://www.postgresql.org/message-id/1499c9d0-075a-3014-d2aa-ba59121b3728%40postgrespro.ru

Sorry that I looked at this thread too late.
I see, thanks.

You patch seems a bit less cumbersome thanks to using GISTDistance
struct.  You don't have to introduce separate array with null flags.
But it's harder too keep index_store_float8_orderby_distances() used
by both GiST and SP-GiST.

What do you think?  You can rebase your patch can propose that as refactoring.
Rebased patch with refactoring is attached.

During this rebase I found that SP-GiST code has similar problems with NULLs.
All SP-GiST functions do not check SK_ISNULL flag of ordering ScanKeys, and
that leads to segfaults.  In the following test, index is searched with 
non-NULL key first and then with NULL key, that leads to a crash:

CREATE TABLE t AS SELECT point(x, y) p 
FROM generate_series(0, 100) x, generate_series(0, 100) y;

CREATE INDEX ON t USING spgist (p);

-- crash
SELECT (SELECT p FROM t ORDER BY p <-> pt LIMIT 1)
FROM (VALUES (point '1,2'), (NULL)) pts(pt);


After adding SK_ISNULL checks and starting to produce NULL distances, we need to 
store NULL flags for distance somewhere.  Existing singleton flag for the whole
SPGistSearchItem is not sufficient anymore.


So, I introduced structure IndexOrderByDistance and used it everywhere in the 
GiST and SP-GiST code instead of raw double distances.


SP-GiST order-by-distance code can be further refactored so that user functions
do not have to worry about SK_ISNULL checks.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Fix-GiST-and-SP-GiST-ordering-by-distance-for-NULLs.patch (30K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Alexander Korotkov
On Wed, Sep 11, 2019 at 3:34 AM Nikita Glukhov <[hidden email]> wrote:

> On 09.09.2019 22:47, Alexander Korotkov wrote:
>
> On Mon, Sep 9, 2019 at 8:32 PM Nikita Glukhov <[hidden email]> wrote:
>
> On 08.09.2019 22:32, Alexander Korotkov wrote:
>
> On Fri, Sep 6, 2019 at 5:44 PM Alexander Korotkov
> <[hidden email]> wrote:
>
> I'm going to push both if no objections.
>
> So, pushed!
>
> Two years ago there was a similar patch for this issue:
> https://www.postgresql.org/message-id/1499c9d0-075a-3014-d2aa-ba59121b3728%40postgrespro.ru
>
> Sorry that I looked at this thread too late.
>
> I see, thanks.
>
> You patch seems a bit less cumbersome thanks to using GISTDistance
> struct.  You don't have to introduce separate array with null flags.
> But it's harder too keep index_store_float8_orderby_distances() used
> by both GiST and SP-GiST.
>
> What do you think?  You can rebase your patch can propose that as refactoring.
>
> Rebased patch with refactoring is attached.
>
> During this rebase I found that SP-GiST code has similar problems with NULLs.
> All SP-GiST functions do not check SK_ISNULL flag of ordering ScanKeys, and
> that leads to segfaults.  In the following test, index is searched with
> non-NULL key first and then with NULL key, that leads to a crash:
>
> CREATE TABLE t AS SELECT point(x, y) p
> FROM generate_series(0, 100) x, generate_series(0, 100) y;
>
> CREATE INDEX ON t USING spgist (p);
>
> -- crash
> SELECT (SELECT p FROM t ORDER BY p <-> pt LIMIT 1)
> FROM (VALUES (point '1,2'), (NULL)) pts(pt);
>
>
> After adding SK_ISNULL checks and starting to produce NULL distances, we need to
> store NULL flags for distance somewhere.  Existing singleton flag for the whole
> SPGistSearchItem is not sufficient anymore.
>
>
> So, I introduced structure IndexOrderByDistance and used it everywhere in the
> GiST and SP-GiST code instead of raw double distances.
>
>
> SP-GiST order-by-distance code can be further refactored so that user functions
> do not have to worry about SK_ISNULL checks.

It doesn't seem SP-GiST inner_consistent and leaf_consistent functions
can handle NULL scan keys for now.  So should we let them handle NULL
orderby keys?  If we assume that NULL arguments produce NULL
distances, we can evade changing the code of opclasses.

Also, I've noticed IndexOrderByDistance is missed in typedefs.list.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Nikita Glukhov

On 12.09.2019 16:45, Alexander Korotkov wrote:

> On Wed, Sep 11, 2019 at 3:34 AM Nikita Glukhov <[hidden email]> wrote:
>> On 09.09.2019 22:47, Alexander Korotkov wrote:
>>
>> On Mon, Sep 9, 2019 at 8:32 PM Nikita Glukhov <[hidden email]> wrote:
>>
>> On 08.09.2019 22:32, Alexander Korotkov wrote:
>>
>> On Fri, Sep 6, 2019 at 5:44 PM Alexander Korotkov
>> <[hidden email]> wrote:
>>
>> I'm going to push both if no objections.
>>
>> So, pushed!
>>
>> Two years ago there was a similar patch for this issue:
>> https://www.postgresql.org/message-id/1499c9d0-075a-3014-d2aa-ba59121b3728%40postgrespro.ru
>>
>> Sorry that I looked at this thread too late.
>>
>> I see, thanks.
>>
>> You patch seems a bit less cumbersome thanks to using GISTDistance
>> struct.  You don't have to introduce separate array with null flags.
>> But it's harder too keep index_store_float8_orderby_distances() used
>> by both GiST and SP-GiST.
>>
>> What do you think?  You can rebase your patch can propose that as refactoring.
>>
>> Rebased patch with refactoring is attached.
>>
>> During this rebase I found that SP-GiST code has similar problems with NULLs.
>> All SP-GiST functions do not check SK_ISNULL flag of ordering ScanKeys, and
>> that leads to segfaults.  In the following test, index is searched with
>> non-NULL key first and then with NULL key, that leads to a crash:
>>
>> CREATE TABLE t AS SELECT point(x, y) p
>> FROM generate_series(0, 100) x, generate_series(0, 100) y;
>>
>> CREATE INDEX ON t USING spgist (p);
>>
>> -- crash
>> SELECT (SELECT p FROM t ORDER BY p <-> pt LIMIT 1)
>> FROM (VALUES (point '1,2'), (NULL)) pts(pt);
>>
>>
>> After adding SK_ISNULL checks and starting to produce NULL distances, we need to
>> store NULL flags for distance somewhere.  Existing singleton flag for the whole
>> SPGistSearchItem is not sufficient anymore.
>>
>>
>> So, I introduced structure IndexOrderByDistance and used it everywhere in the
>> GiST and SP-GiST code instead of raw double distances.
>>
>>
>> SP-GiST order-by-distance code can be further refactored so that user functions
>> do not have to worry about SK_ISNULL checks.
> It doesn't seem SP-GiST inner_consistent and leaf_consistent functions
> can handle NULL scan keys for now.  So should we let them handle NULL
> orderby keys?  If we assume that NULL arguments produce NULL
> distances, we can evade changing the code of opclasses.
I have moved handling of NULL ordering keys from opclasses to the common
SP-GiST code, but really I don't like how it is implemented now. Maybe it's
worth to move handling of NULL order-by keys to the even more higher
level so,
that AM don't have to worry about NULLs?

Also I leaved usages of IndexOrderByDistance in opclasses. I think, that
may
help to minimize opclass changes in the future.

> Also, I've noticed IndexOrderByDistance is missed in typedefs.list.
Fixed.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Fix-GiST-and-SP-GiST-ordering-by-distance-for-NULLs-v2.patch (35K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Alexander Korotkov
On Fri, Sep 13, 2019 at 5:23 PM Nikita Glukhov <[hidden email]> wrote:
> I have moved handling of NULL ordering keys from opclasses to the common
> SP-GiST code, but really I don't like how it is implemented now. Maybe it's
> worth to move handling of NULL order-by keys to the even more higher
> level so,
> that AM don't have to worry about NULLs?

Yes, optimizer could remove away "col op NULL" clauses from ORDER BY
if op is strict operator.  And then such clauses wouldn't be passed to
AM.  But I see this as future improvement.  For backpatching we should
solve this at AM side.

> Also I leaved usages of IndexOrderByDistance in opclasses. I think, that
> may
> help to minimize opclass changes in the future.

Could you please extract this as a separate patch.  We can consider
this for master, but we shouldn't backpatch this.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Nikita Glukhov

On 13.09.2019 20:17, Alexander Korotkov wrote:

On Fri, Sep 13, 2019 at 5:23 PM Nikita Glukhov [hidden email] wrote:
I have moved handling of NULL ordering keys from opclasses to the common
SP-GiST code, but really I don't like how it is implemented now. Maybe it's
worth to move handling of NULL order-by keys to the even more higher
level so,
that AM don't have to worry about NULLs?
Yes, optimizer could remove away "col op NULL" clauses from ORDER BY
if op is strict operator.  And then such clauses wouldn't be passed to
AM.  But I see this as future improvement.  For backpatching we should
solve this at AM side.

Also I leaved usages of IndexOrderByDistance in opclasses. I think, that
may help to minimize opclass changes in the future.
Could you please extract this as a separate patch.  We can consider
this for master, but we shouldn't backpatch this.
Propagation of IndexOrderByDistance in SP-GiST was extracted into a separate 
patch #2.
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Fix-GiST-and-SP-GiST-ordering-by-distance-for-NULLs-v3.patch (23K) Download Attachment
0002-Use-IndexOrderByDistance-in-SP-GiST-user-functions-v3.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug in GiST paring heap comparator

Alexander Korotkov
On Mon, Sep 16, 2019 at 3:47 PM Nikita Glukhov <[hidden email]> wrote:

> On 13.09.2019 20:17, Alexander Korotkov wrote:
>
> On Fri, Sep 13, 2019 at 5:23 PM Nikita Glukhov <[hidden email]> wrote:
>
> I have moved handling of NULL ordering keys from opclasses to the common
> SP-GiST code, but really I don't like how it is implemented now. Maybe it's
> worth to move handling of NULL order-by keys to the even more higher
> level so,
> that AM don't have to worry about NULLs?
>
> Yes, optimizer could remove away "col op NULL" clauses from ORDER BY
> if op is strict operator.  And then such clauses wouldn't be passed to
> AM.  But I see this as future improvement.  For backpatching we should
> solve this at AM side.
>
> Also I leaved usages of IndexOrderByDistance in opclasses. I think, that
> may help to minimize opclass changes in the future.
>
> Could you please extract this as a separate patch.  We can consider
> this for master, but we shouldn't backpatch this.
>
> Propagation of IndexOrderByDistance in SP-GiST was extracted into a separate
> patch #2.
First patch is minor editing from me and commit message is attached.
I'm going to push it if no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Fix-GiST-and-SP-GiST-ordering-by-distance-for-NULLs-v4.patch (31K) Download Attachment