GiST index on INT8, possible bug in query planner?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

GiST index on INT8, possible bug in query planner?

Jan Behrens
Dear colleagues,

I have developed two indices using PostgreSQL's awesome GiST support,
one of them available here:

http://www.public-software-group.org/pgLatLon

(which is a lightweight and MIT-licensed alternative to PostGIS for
certain simple tasks involving geographic coordinates on the WGS-84
spheroid)


Recently I had the requirement of creating a multi-column index on
an integer in the first column and a custom data type in the second
column of the index. Since integers are not supported by GiST indices
by default, I used the btree_gist extension by Teodor Sigaev,
Oleg Bartunov, Janko Richter, and Paul Jungwirth, see:
https://www.postgresql.org/docs/10/btree-gist.html

However, the GiST index seems not to work as expected by me when
64-bit integers are involved. I tried to create a minimal
proof-of-concept to demonstrate this. Consider the following setup:

CREATE EXTENSION btree_gist;

CREATE TABLE test4_btree (id SERIAL4, ctx INT4);
CREATE TABLE test8_btree (id SERIAL4, ctx INT8);
CREATE TABLE test4_gist (id SERIAL4, ctx INT4);
CREATE TABLE test8_gist (id SERIAL4, ctx INT8);

I create multi-column indices on all four tables, with "ctx" as primary
and "id" as secondary column:

CREATE INDEX ON test4_btree (ctx, id);
CREATE INDEX ON test8_btree (ctx, id);
CREATE INDEX ON test4_gist USING gist (ctx, id);
CREATE INDEX ON test8_gist USING gist (ctx, id);

Now we add some data:

INSERT INTO test4_btree (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000);
INSERT INTO test8_btree (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000);
INSERT INTO test4_gist (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000);
INSERT INTO test8_gist (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000);

Only the tables directly using the B-tree index ("test4_btree" and
"test8_btree") and the table where "ctx" is 32-bit wide seem to work
properly:

EXPLAIN SELECT * FROM test4_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test8_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test4_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: (id = 2)

The query planning for the select on table "test8_gist" does not
include "ctx" in the "Index Cond".


To verify that the above problem isn't just an optimization because of
a low row count, I created a larger example with different values:

CREATE EXTENSION btree_gist;

CREATE TABLE test4_btree (ctx INT4, src INT4);
CREATE TABLE test8_btree (ctx INT8, src INT4);
CREATE TABLE test4_gist (ctx INT4, src INT4);
CREATE TABLE test8_gist (ctx INT8, src INT4);

CREATE INDEX ON test4_btree (ctx, src);
CREATE INDEX ON test8_btree (ctx, src);
CREATE INDEX ON test4_gist USING gist (ctx, src);
CREATE INDEX ON test8_gist USING gist (ctx, src);

INSERT INTO test4_btree SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000);
INSERT INTO test8_btree SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000);
INSERT INTO test4_gist SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000);
INSERT INTO test8_gist SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000);

EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: (src = 2)

ANALYZE;

EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- does not use Index Cond at all, but Filter: ((ctx = 1) AND (src = 2))

SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;  -- slow!

The query on "test8_gist" is significantly slower than in all other
three cases.


I wonder if this is a bug in the query planner, in the GiST facilities
of PostgreSQL, a problem of the "btree_gist" extension, or something
else? Can anyone help me?


Kind regards,
Jan Behrens


--
Public Software Group e. V.
Johannisstr. 12, 10117 Berlin, Germany

www.public-software-group.org
[hidden email]

eingetragen in das Vereinregister
des Amtsgerichtes Charlottenburg
Registernummer: VR 28873 B

Vorstände (einzelvertretungsberechtigt):
Jan Behrens
Axel Kistner
Andreas Nitsche
Björn Swierczek

Reply | Threaded
Open this post in threaded view
|

Re: GiST index on INT8, possible bug in query planner?

Tom Lane-2
Jan Behrens <[hidden email]> writes:
> However, the GiST index seems not to work as expected by me when
> 64-bit integers are involved. I tried to create a minimal
> proof-of-concept to demonstrate this. Consider the following setup:
> CREATE TABLE test8_gist (id SERIAL4, ctx INT8);
> CREATE INDEX ON test8_gist USING gist (ctx, id);
> EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
> -- uses Index Cond: (id = 2)
> The query planning for the select on table "test8_gist" does not
> include "ctx" in the "Index Cond".

Probably it would if you'd written "WHERE ctx = 1::int8".  Without
the cast, what you'll have is "int8 = int4", and I suspect that
btree_gist doesn't include cross-type operators in its opclasses.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: GiST index on INT8, possible bug in query planner?

Jan Behrens
On Mon, 03 Dec 2018 11:47:17 -0500
Tom Lane <[hidden email]> wrote:

> Jan Behrens <[hidden email]> writes:
>
> > However, the GiST index seems not to work as expected by me when
> > 64-bit integers are involved. I tried to create a minimal
> > proof-of-concept to demonstrate this. Consider the following setup:
> >
> > CREATE TABLE test8_gist (id SERIAL4, ctx INT8);
> > CREATE INDEX ON test8_gist USING gist (ctx, id);
> > EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
> > -- uses Index Cond: (id = 2)
> >
> > The query planning for the select on table "test8_gist" does not
> > include "ctx" in the "Index Cond".
>
> Probably it would if you'd written "WHERE ctx = 1::int8".  Without
> the cast, what you'll have is "int8 = int4", and I suspect that
> btree_gist doesn't include cross-type operators in its opclasses.
>
> regards, tom lane

You are right! I just tested it and ::int8 does the job.

It might be good to add a short notice or warning in the documentation
at: https://www.postgresql.org/docs/current/btree-gist.html

It might help other people who run into the same problem.


Thanks for helping me,
Jan Behrens

--
Public Software Group e. V.
Johannisstr. 12, 10117 Berlin, Germany

www.public-software-group.org
[hidden email]

eingetragen in das Vereinregister
des Amtsgerichtes Charlottenburg
Registernummer: VR 28873 B

Vorstände (einzelvertretungsberechtigt):
Jan Behrens
Axel Kistner
Andreas Nitsche
Björn Swierczek