JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

Syed Jafri

I have a database which stores receiver to indicate which account the data relates to. This has led to tons of duplication of data, as one set of data may create 3 separate rows, where the only difference is the receiver column.

 

|---------------------|------------------|---------------------|------------------|

|      Receiver       |       Event      |         Date        |      Location    |

|---------------------|------------------|---------------------|------------------|

|       Alpha         |         3        |          12         |         USA      |

|---------------------|------------------|---------------------|------------------|

|       Bravo         |         3        |          12         |         USA      |

|---------------------|------------------|---------------------|------------------|

|       Charlie       |         3        |          12         |         USA      |

|---------------------|------------------|---------------------|------------------|

 

While redesigning the database, I have considered using an array with a GIN index instead of the current B-Tree index on receiver. My proposed new table would look like this:

|-------------------------------|--------------|------------|-------------------|

|           Receivers           |     Event    |    Date    |     Location      |

|-------------------------------|--------------|------------|-------------------|

| ["Alpha", "Bravo", "Charlie"] |       3      |     12     |         USA       |

|-------------------------------|--------------|------------|-------------------|

 

More Information:

·         Receiver names are of the type (a-z, 1-5, .)

·         95% of all queries currently look like this: SELECT * FROM table WHERE Receiver = Alpha, with the new format this would be SELECT * FROM table WHERE receivers @> '"Alpha"'::jsonb;

·         The table currently contains over 4 billion rows (with duplication) and the new proposed schema would cut it down to under 2 billion rows.

·          

Question:

1.      Does it make more sense to use Postgres Native Text Array?

2.      Would a jsonb_path_ops GIN index on receivers make sense here?

3.      Which option is more efficient? Which is faster?

 

Reply | Threaded
Open this post in threaded view
|

Re: JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

Ian Zimmerman
On 2019-02-04 05:34, Syed Jafri wrote:

> |---------------------|------------------|---------------------|------------------|
> |      Receiver       |       Event      |         Date        |      Location    |
> |---------------------|------------------|---------------------|------------------|
> |       Alpha         |         3        |          12         |         USA      |
> |---------------------|------------------|---------------------|------------------|
> |       Bravo         |         3        |          12         |         USA      |
> |---------------------|------------------|---------------------|------------------|
> |       Charlie       |         3        |          12         |         USA      |
> |---------------------|------------------|---------------------|------------------|

Please forgive a naive question, but doesn't this simply cry out for a
normalization step?

EventID   Date   Location
3         12     USA

Receiver  EventID
Alpha     3
Bravo     3
Charlie   3

If you want to use JSON by any means necessary, perhaps a simpler
key/value store is more appropriate than a relational DB like Postgres?

--
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet and on broken lists
which rewrite From, fetch the TXT record for no-use.mooo.com.

Reply | Threaded
Open this post in threaded view
|

Re: JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

David G Johnston
In reply to this post by Syed Jafri
On Sun, Feb 3, 2019 at 10:35 PM Syed Jafri <[hidden email]> wrote:
> ·         Receiver names are of the type (a-z, 1-5, .)
>
> ·         95% of all queries currently look like this: SELECT * FROM table WHERE Receiver = Alpha, with the new format this would be SELECT * FROM table WHERE receivers @> '"Alpha"'::jsonb;
> 3.      Which option is more efficient? Which is faster?

I'd probably develop a performance test for the model and planned
queries and experiment with:

1. text arrays
2. jsonb arrays
3. partitioning

David J.