index on jsonb col with 2D array inside the json

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

index on jsonb col with 2D array inside the json

Mariel Cherkassky
Hi,
I have a table with json col : R(object int, data jsonb).
Example for content : 
 object         |                 data
----------------+---------------------------------------
             50 | {"ranges": [[1, 1]]}
             51 | {"ranges": [[5, 700],[1,5],[9,10}
             52 | {"ranges": [[4, 200],[2,4],[3,4]]}
             53 | {"ranges": [[2, 2]]}
             54 | {"ranges": [[5, 10]]}

Now I tried to query for all the objects that contains a specific range, for example [2,2] : 
explain analyze SELECT *
FROM   R d
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(R.data -> 'ranges') rng 
   WHERE (rng->>0)::bigint <= 2 and (rng->>1)::bigint >= 2
   );

I saw that the gin index isnt suitable for this type of comparison. However, I saw that the gist index is suitable to handle ranges. Any idea of I can implement a gist index here ? 

In addition, I saved the same data in relational table R2(object,range_first,range_last).
 The previous data in this format : 
object   range_first   range_last
50              1                  1
51              5                  700        
51              1                    5
51              9                     10

i compared the first query with :
  explain analyze  select * from R2 where  range_first   <=2 and range_last >= 2; (I have an index on range_first,range_last that is used)

The query on the jsonb column was 100x slower (700 m/s vs 7m/s). The question is, Am I missing an index or the jsonb datatype isnt suitable for this structure of data. The R2 table contains 500K records while the R table contains about 200K records.



Reply | Threaded
Open this post in threaded view
|

Re: index on jsonb col with 2D array inside the json

Michael Lewis
Is your JSON data getting toasted? I wouldn't assume so if it is remaining small but something to check. Regardless, if an index exists and isn't being used, then that would be the primary concern. You didn't share what the definition of the index on R.data is... what do you already have?

You have an array of ranges stored as the value of key "ranges" in jsonb field data. If you created a table like R2, but with a single "range" column that is int4range type, then I would expect that you could add a GiST and then use overlaps &&, or another operator. I would not expect that you could index (unnest data->>'ranges' for instance) to get the separated out range values.


Michael Lewis


On Tue, Feb 19, 2019 at 8:59 AM Mariel Cherkassky <[hidden email]> wrote:
Hi,
I have a table with json col : R(object int, data jsonb).
Example for content : 
 object         |                 data
----------------+---------------------------------------
             50 | {"ranges": [[1, 1]]}
             51 | {"ranges": [[5, 700],[1,5],[9,10}
             52 | {"ranges": [[4, 200],[2,4],[3,4]]}
             53 | {"ranges": [[2, 2]]}
             54 | {"ranges": [[5, 10]]}

Now I tried to query for all the objects that contains a specific range, for example [2,2] : 
explain analyze SELECT *
FROM   R d
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(R.data -> 'ranges') rng 
   WHERE (rng->>0)::bigint <= 2 and (rng->>1)::bigint >= 2
   );

I saw that the gin index isnt suitable for this type of comparison. However, I saw that the gist index is suitable to handle ranges. Any idea of I can implement a gist index here ? 

In addition, I saved the same data in relational table R2(object,range_first,range_last).
 The previous data in this format : 
object   range_first   range_last
50              1                  1
51              5                  700        
51              1                    5
51              9                     10

i compared the first query with :
  explain analyze  select * from R2 where  range_first   <=2 and range_last >= 2; (I have an index on range_first,range_last that is used)

The query on the jsonb column was 100x slower (700 m/s vs 7m/s). The question is, Am I missing an index or the jsonb datatype isnt suitable for this structure of data. The R2 table contains 500K records while the R table contains about 200K records.



Reply | Threaded
Open this post in threaded view
|

Re: index on jsonb col with 2D array inside the json

Mariel Cherkassky
I dont have any indexes on R (the table with the jsonb column). I was asking if I can create any that can increase this query`s performance. 
If I understood you correctly I have  3 options right now :
1)R, without indexes 
2)R2 with an index on first and last
3)R3 that should contain a single range column (type int4range) with gist index on it. 

In aspect of performance, R<R2<? R3 

‫בתאריך יום ג׳, 19 בפבר׳ 2019 ב-18:28 מאת ‪Michael Lewis‬‏ <‪[hidden email]‬‏>:‬
Is your JSON data getting toasted? I wouldn't assume so if it is remaining small but something to check. Regardless, if an index exists and isn't being used, then that would be the primary concern. You didn't share what the definition of the index on R.data is... what do you already have?

You have an array of ranges stored as the value of key "ranges" in jsonb field data. If you created a table like R2, but with a single "range" column that is int4range type, then I would expect that you could add a GiST and then use overlaps &&, or another operator. I would not expect that you could index (unnest data->>'ranges' for instance) to get the separated out range values.


Michael Lewis


On Tue, Feb 19, 2019 at 8:59 AM Mariel Cherkassky <[hidden email]> wrote:
Hi,
I have a table with json col : R(object int, data jsonb).
Example for content : 
 object         |                 data
----------------+---------------------------------------
             50 | {"ranges": [[1, 1]]}
             51 | {"ranges": [[5, 700],[1,5],[9,10}
             52 | {"ranges": [[4, 200],[2,4],[3,4]]}
             53 | {"ranges": [[2, 2]]}
             54 | {"ranges": [[5, 10]]}

Now I tried to query for all the objects that contains a specific range, for example [2,2] : 
explain analyze SELECT *
FROM   R d
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(R.data -> 'ranges') rng 
   WHERE (rng->>0)::bigint <= 2 and (rng->>1)::bigint >= 2
   );

I saw that the gin index isnt suitable for this type of comparison. However, I saw that the gist index is suitable to handle ranges. Any idea of I can implement a gist index here ? 

In addition, I saved the same data in relational table R2(object,range_first,range_last).
 The previous data in this format : 
object   range_first   range_last
50              1                  1
51              5                  700        
51              1                    5
51              9                     10

i compared the first query with :
  explain analyze  select * from R2 where  range_first   <=2 and range_last >= 2; (I have an index on range_first,range_last that is used)

The query on the jsonb column was 100x slower (700 m/s vs 7m/s). The question is, Am I missing an index or the jsonb datatype isnt suitable for this structure of data. The R2 table contains 500K records while the R table contains about 200K records.



Reply | Threaded
Open this post in threaded view
|

Re: index on jsonb col with 2D array inside the json

Michael Lewis
I would expect that R2 vs R3 would be negligible but perhaps gist works much better and would be an improvement. When you are down to 7ms already, I wouldn't hope for any big change. I assume you used btree for the multi-column index on R2 range_first, range_last but am not familiar with gist on range vs btree on two int columns.

It seems a little odd to have a jsonb value to hold multiple range values. A range is already a complex type so separating out into the association table like R3 would make sense to me.

Michael Lewis

On Tue, Feb 19, 2019 at 9:34 AM Mariel Cherkassky <[hidden email]> wrote:
I dont have any indexes on R (the table with the jsonb column). I was asking if I can create any that can increase this query`s performance. 
If I understood you correctly I have  3 options right now :
1)R, without indexes 
2)R2 with an index on first and last
3)R3 that should contain a single range column (type int4range) with gist index on it. 

In aspect of performance, R<R2<? R3 

‫בתאריך יום ג׳, 19 בפבר׳ 2019 ב-18:28 מאת ‪Michael Lewis‬‏ <‪[hidden email]‬‏>:‬
Is your JSON data getting toasted? I wouldn't assume so if it is remaining small but something to check. Regardless, if an index exists and isn't being used, then that would be the primary concern. You didn't share what the definition of the index on R.data is... what do you already have?

You have an array of ranges stored as the value of key "ranges" in jsonb field data. If you created a table like R2, but with a single "range" column that is int4range type, then I would expect that you could add a GiST and then use overlaps &&, or another operator. I would not expect that you could index (unnest data->>'ranges' for instance) to get the separated out range values.


Michael Lewis


On Tue, Feb 19, 2019 at 8:59 AM Mariel Cherkassky <[hidden email]> wrote:
Hi,
I have a table with json col : R(object int, data jsonb).
Example for content : 
 object         |                 data
----------------+---------------------------------------
             50 | {"ranges": [[1, 1]]}
             51 | {"ranges": [[5, 700],[1,5],[9,10}
             52 | {"ranges": [[4, 200],[2,4],[3,4]]}
             53 | {"ranges": [[2, 2]]}
             54 | {"ranges": [[5, 10]]}

Now I tried to query for all the objects that contains a specific range, for example [2,2] : 
explain analyze SELECT *
FROM   R d
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(R.data -> 'ranges') rng 
   WHERE (rng->>0)::bigint <= 2 and (rng->>1)::bigint >= 2
   );

I saw that the gin index isnt suitable for this type of comparison. However, I saw that the gist index is suitable to handle ranges. Any idea of I can implement a gist index here ? 

In addition, I saved the same data in relational table R2(object,range_first,range_last).
 The previous data in this format : 
object   range_first   range_last
50              1                  1
51              5                  700        
51              1                    5
51              9                     10

i compared the first query with :
  explain analyze  select * from R2 where  range_first   <=2 and range_last >= 2; (I have an index on range_first,range_last that is used)

The query on the jsonb column was 100x slower (700 m/s vs 7m/s). The question is, Am I missing an index or the jsonb datatype isnt suitable for this structure of data. The R2 table contains 500K records while the R table contains about 200K records.



Reply | Threaded
Open this post in threaded view
|

Re: index on jsonb col with 2D array inside the json

Mariel Cherkassky
Thanks for the feedback! 

On Tue, Feb 19, 2019, 6:42 PM Michael Lewis <[hidden email] wrote:
I would expect that R2 vs R3 would be negligible but perhaps gist works much better and would be an improvement. When you are down to 7ms already, I wouldn't hope for any big change. I assume you used btree for the multi-column index on R2 range_first, range_last but am not familiar with gist on range vs btree on two int columns.

It seems a little odd to have a jsonb value to hold multiple range values. A range is already a complex type so separating out into the association table like R3 would make sense to me.

Michael Lewis

On Tue, Feb 19, 2019 at 9:34 AM Mariel Cherkassky <[hidden email]> wrote:
I dont have any indexes on R (the table with the jsonb column). I was asking if I can create any that can increase this query`s performance. 
If I understood you correctly I have  3 options right now :
1)R, without indexes 
2)R2 with an index on first and last
3)R3 that should contain a single range column (type int4range) with gist index on it. 

In aspect of performance, R<R2<? R3 

‫בתאריך יום ג׳, 19 בפבר׳ 2019 ב-18:28 מאת ‪Michael Lewis‬‏ <‪[hidden email]‬‏>:‬
Is your JSON data getting toasted? I wouldn't assume so if it is remaining small but something to check. Regardless, if an index exists and isn't being used, then that would be the primary concern. You didn't share what the definition of the index on R.data is... what do you already have?

You have an array of ranges stored as the value of key "ranges" in jsonb field data. If you created a table like R2, but with a single "range" column that is int4range type, then I would expect that you could add a GiST and then use overlaps &&, or another operator. I would not expect that you could index (unnest data->>'ranges' for instance) to get the separated out range values.


Michael Lewis


On Tue, Feb 19, 2019 at 8:59 AM Mariel Cherkassky <[hidden email]> wrote:
Hi,
I have a table with json col : R(object int, data jsonb).
Example for content : 
 object         |                 data
----------------+---------------------------------------
             50 | {"ranges": [[1, 1]]}
             51 | {"ranges": [[5, 700],[1,5],[9,10}
             52 | {"ranges": [[4, 200],[2,4],[3,4]]}
             53 | {"ranges": [[2, 2]]}
             54 | {"ranges": [[5, 10]]}

Now I tried to query for all the objects that contains a specific range, for example [2,2] : 
explain analyze SELECT *
FROM   R d
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(R.data -> 'ranges') rng 
   WHERE (rng->>0)::bigint <= 2 and (rng->>1)::bigint >= 2
   );

I saw that the gin index isnt suitable for this type of comparison. However, I saw that the gist index is suitable to handle ranges. Any idea of I can implement a gist index here ? 

In addition, I saved the same data in relational table R2(object,range_first,range_last).
 The previous data in this format : 
object   range_first   range_last
50              1                  1
51              5                  700        
51              1                    5
51              9                     10

i compared the first query with :
  explain analyze  select * from R2 where  range_first   <=2 and range_last >= 2; (I have an index on range_first,range_last that is used)

The query on the jsonb column was 100x slower (700 m/s vs 7m/s). The question is, Am I missing an index or the jsonb datatype isnt suitable for this structure of data. The R2 table contains 500K records while the R table contains about 200K records.