Multicolumn index for single-column queries?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Multicolumn index for single-column queries?

rihad
Hi. Say there are 2 indexes:

     "foo_index" btree (foo_id)

     "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to updating them both, but wouldn't searches
on foo_id alone become slower?

Thanks.



Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Gavin Flower-2
On 18/04/2019 18:52, rihad wrote:

> Hi. Say there are 2 indexes:
>
>     "foo_index" btree (foo_id)
>
>     "multi_index" btree (foo_id, approved, expires_at)
>
>
> foo_id is an integer. Some queries involve all three columns in their
> WHERE clauses, some involve only foo_id.
> Would it be ok from general performance standpoint to remove foo_index
> and rely only on multi_index? I know that
> PG would have to do less work updating just one index compared to
> updating them both, but wouldn't searches
> on foo_id alone become slower?
>
> Thanks.
>
>
>
The multi column index will require more RAM to hold it.  So if there is
memory contention, then there would be an increased risk of swapping,
leading to slower query times.

I suspect that if there is more than enough RAM, then a multi column
index will be slightly slower than a single column index. However, the
difference will probably be lost in the noise -- in other words, the
various things happening in the background will most likely to have far
more significant impact on query duration.  IMHO


Cheers,
Gavin




Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Andreas Kretschmer-3
In reply to this post by rihad


Am 18.04.19 um 08:52 schrieb rihad:

> Hi. Say there are 2 indexes:
>
>     "foo_index" btree (foo_id)
>
>     "multi_index" btree (foo_id, approved, expires_at)
>
>
> foo_id is an integer. Some queries involve all three columns in their
> WHERE clauses, some involve only foo_id.
> Would it be ok from general performance standpoint to remove foo_index
> and rely only on multi_index? I know that
> PG would have to do less work updating just one index compared to
> updating them both, but wouldn't searches
> on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Laurenz Albe
Andreas Kretschmer wrote:

> Am 18.04.19 um 08:52 schrieb rihad:
> > Hi. Say there are 2 indexes:
> >
> >     "foo_index" btree (foo_id)
> >
> >     "multi_index" btree (foo_id, approved, expires_at)
> >
> >
> > foo_id is an integer. Some queries involve all three columns in their
> > WHERE clauses, some involve only foo_id.
> > Would it be ok from general performance standpoint to remove foo_index
> > and rely only on multi_index? I know that
> > PG would have to do less work updating just one index compared to
> > updating them both, but wouldn't searches
> > on foo_id alone become slower?
>
> it depends .
>
> it depends on the queries you are using, on your workload. a
> multi-column-index will be large than an index over just one column,
> therefore you will have more disk-io when you read from such an index.

To be more explicit: if you can live with a slightly less efficient
index scan and want fast data modifications, use only the second index.

If you hardly ever update the table, don't mind the wasted space and
want every bit of query speed (data warehouse), having both indexes
might be better.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Ron-2
In reply to this post by Andreas Kretschmer-3
On 4/18/19 2:14 AM, Andreas Kretschmer wrote:


Am 18.04.19 um 08:52 schrieb rihad:
Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

But two indexes are larger than one index, and updating two indexes requires more disk IO than updating one index.

(Prefix compression would obviate the need for this question.  Then your multi-column index would be much smaller.)


--
Angular momentum makes the world go 'round.
Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Gavin Flower-2
On 19/04/2019 01:24, Ron wrote:

> On 4/18/19 2:14 AM, Andreas Kretschmer wrote:
>>
>>
>> Am 18.04.19 um 08:52 schrieb rihad:
>>> Hi. Say there are 2 indexes:
>>>
>>>     "foo_index" btree (foo_id)
>>>
>>>     "multi_index" btree (foo_id, approved, expires_at)
>>>
>>>
>>> foo_id is an integer. Some queries involve all three columns in
>>> their WHERE clauses, some involve only foo_id.
>>> Would it be ok from general performance standpoint to remove
>>> foo_index and rely only on multi_index? I know that
>>> PG would have to do less work updating just one index compared to
>>> updating them both, but wouldn't searches
>>> on foo_id alone become slower?
>>
>> it depends .
>>
>> it depends on the queries you are using, on your workload. a
>> multi-column-index will be large than an index over just one column,
>> therefore you will have more disk-io when you read from such an index.
>
> But two indexes are larger than one index, and updating two indexes
> requires more disk IO than updating one index.

Agreed.

A key question would be: how often is the query run, compared to the
frequency Insertions, Updates, and Deletions -- wrt the table.

>
> (Prefix compression would obviate the need for this question. Then
> your multi-column index would be *much* smaller.)

True, but a multi column index will still be bigger than single column
index.

[...]


Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

hari.fuchs
In reply to this post by Andreas Kretschmer-3
Andreas Kretschmer <[hidden email]> writes:

> Am 18.04.19 um 08:52 schrieb rihad:
>> Hi. Say there are 2 indexes:
>>
>>     "foo_index" btree (foo_id)
>>
>>     "multi_index" btree (foo_id, approved, expires_at)
>>
>>
>> foo_id is an integer. Some queries involve all three columns in
>> their WHERE clauses, some involve only foo_id.
>> Would it be ok from general performance standpoint to remove
>> foo_index and rely only on multi_index? I know that
>> PG would have to do less work updating just one index compared to
>> updating them both, but wouldn't searches
>> on foo_id alone become slower?
>
> it depends .
>
> it depends on the queries you are using, on your workload. a
> multi-column-index will be large than an index over just one column,
> therefore you will have more disk-io when you read from such an index.

I think it also depends on the average number of rows having the same foo_id.



Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Ron-2
In reply to this post by Gavin Flower-2
On 4/18/19 8:45 AM, Gavin Flower wrote:
> On 19/04/2019 01:24, Ron wrote:
>> On 4/18/19 2:14 AM, Andreas Kretschmer wrote:
[snip]
>>
>> (Prefix compression would obviate the need for this question. Then your
>> multi-column index would be *much* smaller.)
>
> True, but a multi column index will still be bigger than single column index.

TANSTAAFL.


--
Angular momentum makes the world go 'round.


Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Gavin Flower-2
On 19/04/2019 02:12, Ron wrote:

> On 4/18/19 8:45 AM, Gavin Flower wrote:
>> On 19/04/2019 01:24, Ron wrote:
>>> On 4/18/19 2:14 AM, Andreas Kretschmer wrote:
> [snip]
>>>
>>> (Prefix compression would obviate the need for this question. Then
>>> your multi-column index would be *much* smaller.)
>>
>> True, but a multi column index will still be bigger than single
>> column index.
>
> TANSTAAFL.
>
>
QUOTE: [Oh, 'tanstaafl.' Means ~There ain't no such thing as a free lunch.']
 From The Moon is a Harsh Mistress, by Robert Heinlein. Published 1966
Is where I first came across TANSTAAFL.

However, it appears to have been used at least as early as 1949.

Just adding this, as probably there are many people who don't know the
acronym.



Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Gavin Flower-2
In reply to this post by hari.fuchs
On 19/04/2019 01:47, Harald Fuchs wrote:

> Andreas Kretschmer <[hidden email]> writes:
>
>> Am 18.04.19 um 08:52 schrieb rihad:
>>> Hi. Say there are 2 indexes:
>>>
>>>      "foo_index" btree (foo_id)
>>>
>>>      "multi_index" btree (foo_id, approved, expires_at)
>>>
>>>
>>> foo_id is an integer. Some queries involve all three columns in
>>> their WHERE clauses, some involve only foo_id.
>>> Would it be ok from general performance standpoint to remove
>>> foo_index and rely only on multi_index? I know that
>>> PG would have to do less work updating just one index compared to
>>> updating them both, but wouldn't searches
>>> on foo_id alone become slower?
>> it depends .
>>
>> it depends on the queries you are using, on your workload. a
>> multi-column-index will be large than an index over just one column,
>> therefore you will have more disk-io when you read from such an index.
> I think it also depends on the average number of rows having the same foo_id.
>
The number of rows referenced by an index entry for the multi_index will
always be less than or equal to those for the matching foo_index.

Also there will be fewer index entries per block for the multi_index. 
Which is why the I/O count will be higher; even in the best case, where
there is an equal row referenced by the index entries.



Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Gavin Flower-2
In reply to this post by hari.fuchs
On 19/04/2019 01:47, Harald Fuchs wrote:

> Andreas Kretschmer<[hidden email]>  writes:
>
>> Am 18.04.19 um 08:52 schrieb rihad:
>>> Hi. Say there are 2 indexes:
>>>
>>>      "foo_index" btree (foo_id)
>>>
>>>      "multi_index" btree (foo_id, approved, expires_at)
>>>
>>>
>>> foo_id is an integer. Some queries involve all three columns in
>>> their WHERE clauses, some involve only foo_id.
>>> Would it be ok from general performance standpoint to remove
>>> foo_index and rely only on multi_index? I know that
>>> PG would have to do less work updating just one index compared to
>>> updating them both, but wouldn't searches
>>> on foo_id alone become slower?
>> it depends .
>>
>> it depends on the queries you are using, on your workload. a
>> multi-column-index will be large than an index over just one column,
>> therefore you will have more disk-io when you read from such an index.
> I think it also depends on the average number of rows having the same foo_id.
>
The number of rows referenced by an index entry for the multi_index will
always be less than or equal to those for the matching foo_index.

Also there will be fewer index entries per block for the multi_index,
which is why the I/O count will be higher even in the best case where
there is an equal number of rows referenced by each index entry.



Reply | Threaded
Open this post in threaded view
|

Re: Multicolumn index for single-column queries?

Gavin Flower-2
On 19/04/2019 14:01, Gavin Flower wrote:
[...]
>
> Also there will be fewer index entries per block for the multi_index,
> which is why the I/O count will be higher even in the best case where
> there is an equal number of rows referenced by each index entry.
>
Not sure why my system had this still in my draft folder!


Sorry, for the duplication...