Avoid full GIN index scan when possible

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
54 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Avoid full GIN index scan when possible

Julien Rouhaud
Hi,

Marc (in Cc) reported me a problematic query using a GIN index hit in
production.  The issue is that even if an GIN opclass says that the
index can be used for an operator, it's still possible that some
values aren't really compatible and requires a full index scan.

One simple example is with a GIN pg_trgm index (but other opclasses
have similar restrictions) , doing a LIKE with wildcard on both side,
where the pattern is shorter than a trigram, e.g. col LIKE '%a%'.  So,
a where clause of the form:

WHERE col LIKE '%verylongpattern%' AND col LIKE '%a%'

is much more expensive than

WHERE col LKE '%verylongpattern%'

While there's nothing to do if the unhandled const is the only
predicate, if there are multiple AND-ed predicates and at least one of
them doesn't require a full index scan, we can avoid it.

Attached patch tries to fix the issue by detecting such cases and
dropping the unhandled quals in the BitmapIndexScan, letting the
recheck in BitmapHeapScan do the proper filtering.  I'm not happy to
call the extractQuery support functions an additional time, but i
didn't find a cleaner way.  This is of course intended for pg13.

avoid_gin_fullscan-v1.diff (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Julien Rouhaud
On Sun, Mar 24, 2019 at 11:52 AM Julien Rouhaud <[hidden email]> wrote:

>
> Marc (in Cc) reported me a problematic query using a GIN index hit in
> production.  The issue is that even if an GIN opclass says that the
> index can be used for an operator, it's still possible that some
> values aren't really compatible and requires a full index scan.
>
> One simple example is with a GIN pg_trgm index (but other opclasses
> have similar restrictions) , doing a LIKE with wildcard on both side,
> where the pattern is shorter than a trigram, e.g. col LIKE '%a%'.  So,
> a where clause of the form:
>
> WHERE col LIKE '%verylongpattern%' AND col LIKE '%a%'
>
> is much more expensive than
>
> WHERE col LKE '%verylongpattern%'
>
> While there's nothing to do if the unhandled const is the only
> predicate, if there are multiple AND-ed predicates and at least one of
> them doesn't require a full index scan, we can avoid it.
>
> Attached patch tries to fix the issue by detecting such cases and
> dropping the unhandled quals in the BitmapIndexScan, letting the
> recheck in BitmapHeapScan do the proper filtering.  I'm not happy to
> call the extractQuery support functions an additional time, but i
> didn't find a cleaner way.  This is of course intended for pg13.
Patch doesn't apply anymore (thanks cfbot).  Rebased patch attached.

avoid_gin_fullscan-v2.diff (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Tomas Vondra-4
Hi,

I've briefly looked at the patch today. I think the idea is worthwhile,
but I found a couple of issues with the patch:


1) The index_selfuncs.h header is included in the wrong place, it should
be included before lsyscache.h (because 'i' < 'l').


2) I'm not sure it's a good idea to add dependency on a specific AM type
into indxpath.c. At the moment there are only two places, both referring
to BTREE_AM_OID, do we really hard-code another OID here?

I wonder if this could be generalized to another support proc in the
inde AM API, with just GIN implementing it.


3) selfuncs.c is hardly the right place for gin_get_optimizable_quals,
as it's only for functions computing selectivity estimates (and funcs
directly related to that). And the new function is not related to that
at all, so why not to define it in indxpath.c directly?

Of course, if it gets into the index AM API then this would disappear.


4) The gin_get_optimizable_quals is quite misleading. Firstly, it's not
very obvious what "optimizable" means in this context, but that's a
minor issue. The bigger issue is that it's a lie - when there are no
"optimizable" clauses (so all clauses would require full scan) the
function returns the original list, which is by definition completely
non-optimizable.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Julien Rouhaud
On Fri, Jun 28, 2019 at 6:10 PM Tomas Vondra
<[hidden email]> wrote:
>
> I've briefly looked at the patch today. I think the idea is worthwhile,

Thanks!

> 2) I'm not sure it's a good idea to add dependency on a specific AM type
> into indxpath.c. At the moment there are only two places, both referring
> to BTREE_AM_OID, do we really hard-code another OID here?
>
> I wonder if this could be generalized to another support proc in the
> inde AM API, with just GIN implementing it.

Yes, this patch was more a POC than anything, to discuss the approach
before spending too much time on infrastructure.  I considered another
support function, but I'm still unclear of how useful it'd be for
custom AM (as I don't see any use for that for the vanilla one I
think), or whether if this should be opclass specific or not.

> 3) selfuncs.c is hardly the right place for gin_get_optimizable_quals,
> as it's only for functions computing selectivity estimates (and funcs
> directly related to that). And the new function is not related to that
> at all, so why not to define it in indxpath.c directly?

I kept this function in selfuncs.c as it's using some private
functions (gincost_opexpr and gincost_scalararrayopexpr) used by
gincostestimate.  That seemed the simplest approach at this stage.
BTW there's also an ongoing discussion to move the (am)estimate
functions in AM-specific files [1], so that'll directly impact this
too.

> 4) The gin_get_optimizable_quals is quite misleading. Firstly, it's not
> very obvious what "optimizable" means in this context, but that's a
> minor issue. The bigger issue is that it's a lie - when there are no
> "optimizable" clauses (so all clauses would require full scan) the
> function returns the original list, which is by definition completely
> non-optimizable.

The comment is hopefully clearer about what this function does, but
definitely this name is terrible.  I'll try to come up with a better
one.

[1] https://www.postgresql.org/message-id/4079.1561661677%40sss.pgh.pa.us


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Tom Lane-2
Julien Rouhaud <[hidden email]> writes:
> On Fri, Jun 28, 2019 at 6:10 PM Tomas Vondra
> <[hidden email]> wrote:
>> 2) I'm not sure it's a good idea to add dependency on a specific AM type
>> into indxpath.c. At the moment there are only two places, both referring
>> to BTREE_AM_OID, do we really hard-code another OID here?
>>
>> I wonder if this could be generalized to another support proc in the
>> inde AM API, with just GIN implementing it.

> Yes, this patch was more a POC than anything, to discuss the approach
> before spending too much time on infrastructure.  I considered another
> support function, but I'm still unclear of how useful it'd be for
> custom AM (as I don't see any use for that for the vanilla one I
> think), or whether if this should be opclass specific or not.

I just spent a lot of sweat to get rid of (most of) indxpath.c's knowledge
about specific AMs' capabilities; I'd be very sad if we started to put any
back.  Aside from being a modularity violation, it's going to fall foul
of the principle that if index AM X wants something, some index AM Y is
going to want it too, eventually.

Also, I'm quite unhappy about including index_selfuncs.h into indxpath.c
at all, never mind whether you got the alphabetical ordering right.
I have doubts still about how we ought to refactor the mess that is
*selfuncs.c, but this isn't going in the right direction.

>> 3) selfuncs.c is hardly the right place for gin_get_optimizable_quals,
>> as it's only for functions computing selectivity estimates (and funcs
>> directly related to that). And the new function is not related to that
>> at all, so why not to define it in indxpath.c directly?

I not only don't want that function in indxpath.c, I don't even want
it to be known/called from there.  If we need the ability for the index
AM to editorialize on the list of indexable quals (which I'm not very
convinced of yet), let's make an AM interface function to do it.

BTW, I have no idea what you think you're doing here by messing with
outer_relids, but it's almost certainly wrong, and if it isn't wrong
then it needs a comment explaining itself.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Tomas Vondra-4
On Fri, Jun 28, 2019 at 03:03:19PM -0400, Tom Lane wrote:

>Julien Rouhaud <[hidden email]> writes:
>> On Fri, Jun 28, 2019 at 6:10 PM Tomas Vondra
>> <[hidden email]> wrote:
>>> 2) I'm not sure it's a good idea to add dependency on a specific AM type
>>> into indxpath.c. At the moment there are only two places, both referring
>>> to BTREE_AM_OID, do we really hard-code another OID here?
>>>
>>> I wonder if this could be generalized to another support proc in the
>>> inde AM API, with just GIN implementing it.
>
>> Yes, this patch was more a POC than anything, to discuss the approach
>> before spending too much time on infrastructure.  I considered another
>> support function, but I'm still unclear of how useful it'd be for
>> custom AM (as I don't see any use for that for the vanilla one I
>> think), or whether if this should be opclass specific or not.
>
>I just spent a lot of sweat to get rid of (most of) indxpath.c's knowledge
>about specific AMs' capabilities; I'd be very sad if we started to put any
>back.  Aside from being a modularity violation, it's going to fall foul
>of the principle that if index AM X wants something, some index AM Y is
>going to want it too, eventually.
>
>Also, I'm quite unhappy about including index_selfuncs.h into indxpath.c
>at all, never mind whether you got the alphabetical ordering right.
>I have doubts still about how we ought to refactor the mess that is
>*selfuncs.c, but this isn't going in the right direction.
>

Right.

>>> 3) selfuncs.c is hardly the right place for gin_get_optimizable_quals,
>>> as it's only for functions computing selectivity estimates (and funcs
>>> directly related to that). And the new function is not related to that
>>> at all, so why not to define it in indxpath.c directly?
>
>I not only don't want that function in indxpath.c, I don't even want
>it to be known/called from there.  If we need the ability for the index
>AM to editorialize on the list of indexable quals (which I'm not very
>convinced of yet), let's make an AM interface function to do it.
>

Wouldn't it be better to have a function that inspects a single qual and
says whether it's "optimizable" or not? That could be part of the AM
implementation, and we'd call it and it'd be us messing with the list.

That being said, is this really a binary thing - if you have a value
that matches 99% of rows, that probably is not much better than a full
scan.  It may be more difficult to decide (compared to the 'short
trigram' case), but perhaps we should allow that too? Essentially,
instead of 'optimizable' returning true/false, it might return value
between 0.0 and 1.0, as a measure of 'optimizability'.

But that kinda resembles stuff we already have - selectivity/cost. So
why shouldn't this be considered as part of costing? That is, could
gincostestimate look at the index quals and decide what will be used for
scanning the index? Of course, this would make the logic GIN-specific,
and other index AMs would have to implement their own version ...


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Tom Lane-2
Tomas Vondra <[hidden email]> writes:
> On Fri, Jun 28, 2019 at 03:03:19PM -0400, Tom Lane wrote:
>> I not only don't want that function in indxpath.c, I don't even want
>> it to be known/called from there.  If we need the ability for the index
>> AM to editorialize on the list of indexable quals (which I'm not very
>> convinced of yet), let's make an AM interface function to do it.

> Wouldn't it be better to have a function that inspects a single qual and
> says whether it's "optimizable" or not? That could be part of the AM
> implementation, and we'd call it and it'd be us messing with the list.

Uh ... we already determined that the qual is indexable (ie is a member
of the index's opclass), or allowed the index AM to derive an indexable
clause from it, so I'm not sure what you envision would happen
additionally there.  If I understand what Julien is concerned about
--- and I may not --- it's that the set of indexable clauses *as a whole*
may have or lack properties of interest.  So I'm thinking the answer
involves some callback that can do something to the whole list, not
qual-at-a-time.  We've already got facilities for the latter case.

> But that kinda resembles stuff we already have - selectivity/cost. So
> why shouldn't this be considered as part of costing?

Yeah, I'm not entirely convinced that we need anything new here.
The cost estimate function can detect such situations, and so can
the index AM at scan start --- for example, btree checks for
contradictory quals at scan start.  There's a certain amount of
duplicative effort involved there perhaps, but you also have to
keep in mind that we don't know the values of run-time-determined
comparison values until scan start.  So if you want certainty rather
than just a cost estimate, you may have to do these sorts of checks
at scan start.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Julien Rouhaud
On Fri, Jun 28, 2019 at 10:16 PM Tom Lane <[hidden email]> wrote:

>
> Tomas Vondra <[hidden email]> writes:
> > On Fri, Jun 28, 2019 at 03:03:19PM -0400, Tom Lane wrote:
> >> I not only don't want that function in indxpath.c, I don't even want
> >> it to be known/called from there.  If we need the ability for the index
> >> AM to editorialize on the list of indexable quals (which I'm not very
> >> convinced of yet), let's make an AM interface function to do it.
>
> > Wouldn't it be better to have a function that inspects a single qual and
> > says whether it's "optimizable" or not? That could be part of the AM
> > implementation, and we'd call it and it'd be us messing with the list.
>
> Uh ... we already determined that the qual is indexable (ie is a member
> of the index's opclass), or allowed the index AM to derive an indexable
> clause from it, so I'm not sure what you envision would happen
> additionally there.  If I understand what Julien is concerned about
> --- and I may not --- it's that the set of indexable clauses *as a whole*
> may have or lack properties of interest.  So I'm thinking the answer
> involves some callback that can do something to the whole list, not
> qual-at-a-time.  We've already got facilities for the latter case.

Yes, the root issue here is that with gin it's entirely possible that
"WHERE sometable.col op value1" is way more efficient than "WHERE
sometable.col op value AND sometable.col op value2", where both qual
are determined indexable by the opclass.  The only way to avoid that
is indeed to inspect the whole list, as done in this poor POC.

This is a problem actually hit in production, and as far as I know
there's no easy way from the application POV to prevent unexpected
slowdown.  Maybe Marc will have more details about the actual problem
and how expensive such a case was compared to the normal ones.

> > But that kinda resembles stuff we already have - selectivity/cost. So
> > why shouldn't this be considered as part of costing?
>
> Yeah, I'm not entirely convinced that we need anything new here.
> The cost estimate function can detect such situations, and so can
> the index AM at scan start --- for example, btree checks for
> contradictory quals at scan start.  There's a certain amount of
> duplicative effort involved there perhaps, but you also have to
> keep in mind that we don't know the values of run-time-determined
> comparison values until scan start.  So if you want certainty rather
> than just a cost estimate, you may have to do these sorts of checks
> at scan start.

Ah, I didn't know about _bt_preprocess_keys().  I'm not familiar with
this code, so please bear with me.  IIUC the idea would be to add
additional logic in gingetbitmap() / ginNewScanKey() to drop some
quals at runtime.  But that would mean that additional logic would
also be required in BitmapHeapScan, or that all the returned bitmap
should be artificially marked as lossy to enforce a recheck?


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Nikita Glukhov
Hi!

On 29.06.2019 1:23, Julien Rouhaud wrote:
But that kinda resembles stuff we already have - selectivity/cost. So
why shouldn't this be considered as part of costing?
Yeah, I'm not entirely convinced that we need anything new here.
The cost estimate function can detect such situations, and so can
the index AM at scan start --- for example, btree checks for
contradictory quals at scan start.  There's a certain amount of
duplicative effort involved there perhaps, but you also have to
keep in mind that we don't know the values of run-time-determined
comparison values until scan start.  So if you want certainty rather
than just a cost estimate, you may have to do these sorts of checks
at scan start.
Ah, I didn't know about _bt_preprocess_keys().  I'm not familiar with
this code, so please bear with me.  IIUC the idea would be to add
additional logic in gingetbitmap() / ginNewScanKey() to drop some
quals at runtime.  But that would mean that additional logic would
also be required in BitmapHeapScan, or that all the returned bitmap
should be artificially marked as lossy to enforce a recheck?

We have a similar solution for this problem.  The idea is to avoid full index
scan inside GIN itself when we have some GIN entries, and forcibly recheck
all tuples if triconsistent() returns GIN_MAYBE for the keys that emitted no
GIN entries.

The attached patch in its current shape contain at least two ugly places:

1. We still need to initialize empty scan key to call triconsistent(), but 
   then we have to remove it from the list of scan keys.  Simple refactoring 
   of ginFillScanKey() can be helpful here.
 
2. We need to replace GIN_SEARCH_MODE_EVERYTHING with GIN_SEARCH_MODE_ALL
   if there are no GIN entries and some key requested GIN_SEARCH_MODE_ALL
   because we need to skip NULLs in GIN_SEARCH_MODE_ALL.  Simplest example here
   is "array @> '{}'": triconsistent() returns GIN_TRUE, recheck is not forced,
   and GIN_SEARCH_MODE_EVERYTHING returns NULLs that are not rechecked.  Maybe
   it would be better to introduce new GIN_SEARCH_MODE_EVERYTHING_NON_NULL.



Example:

CREATE TABLE test AS SELECT i::text AS t FROM generate_series(0, 999999) i;

CREATE INDEX ON test USING gin (t gin_trgm_ops);

-- master
EXPLAIN ANALYZE SELECT * FROM test WHERE LIKE '%1234%' AND t LIKE '%1%';
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=11777.99..16421.73 rows=7999 width=32) (actual time=65.431..65.857 rows=300 loops=1)
   Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
   Rows Removed by Index Recheck: 2
   Heap Blocks: exact=114
   ->  Bitmap Index Scan on test_t_idx  (cost=0.00..11775.99 rows=7999 width=0) (actual time=65.380..65.380 rows=302 loops=1)
         Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
 Planning Time: 0.151 ms
 Execution Time: 65.900 ms
(8 rows)


-- patched
EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
   Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
   Rows Removed by Index Recheck: 2
   Heap Blocks: exact=114
   ->  Bitmap Index Scan on test_t_idx  (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1)
         Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
 Planning Time: 0.080 ms
 Execution Time: 0.450 ms
(8 rows)

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-Avoid-full-GIN-index-scan-v3.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Tomas Vondra-4
In reply to this post by Tom Lane-2
On Fri, Jun 28, 2019 at 04:16:23PM -0400, Tom Lane wrote:

>Tomas Vondra <[hidden email]> writes:
>> On Fri, Jun 28, 2019 at 03:03:19PM -0400, Tom Lane wrote:
>>> I not only don't want that function in indxpath.c, I don't even want
>>> it to be known/called from there.  If we need the ability for the index
>>> AM to editorialize on the list of indexable quals (which I'm not very
>>> convinced of yet), let's make an AM interface function to do it.
>
>> Wouldn't it be better to have a function that inspects a single qual and
>> says whether it's "optimizable" or not? That could be part of the AM
>> implementation, and we'd call it and it'd be us messing with the list.
>
>Uh ... we already determined that the qual is indexable (ie is a member
>of the index's opclass), or allowed the index AM to derive an indexable
>clause from it, so I'm not sure what you envision would happen
>additionally there.  If I understand what Julien is concerned about
>--- and I may not --- it's that the set of indexable clauses *as a whole*
>may have or lack properties of interest.  So I'm thinking the answer
>involves some callback that can do something to the whole list, not
>qual-at-a-time.  We've already got facilities for the latter case.
>

I'm not sure I understand the problem either.

I don't think "indexable" is the thing we care about here - in Julien's
original example the qual with '%a%' is indexable. And we probably want
to keep it that way.

The problem is that evaluating some of the quals may be inefficient with
a given index - but only if there are other quals. In Julien's example
it makes sense to just drop the '%a%' qual, but only when there are some
quals that work with the trigram index. But if there are no such 'good'
quals, it may be better to keep al least the bad ones.

So I think you're right we need to look at the list as a whole.

>> But that kinda resembles stuff we already have - selectivity/cost. So
>> why shouldn't this be considered as part of costing?
>
>Yeah, I'm not entirely convinced that we need anything new here.
>The cost estimate function can detect such situations, and so can
>the index AM at scan start --- for example, btree checks for
>contradictory quals at scan start.  There's a certain amount of
>duplicative effort involved there perhaps, but you also have to
>keep in mind that we don't know the values of run-time-determined
>comparison values until scan start.  So if you want certainty rather
>than just a cost estimate, you may have to do these sorts of checks
>at scan start.
>

Right, that's why I suggested doing this as part of costing, but you're
right scan start would be another option. I assume it should affect cost
estimates in some way, so the cost function would be my first choice.

But does the cost function really has enough info to make such decision?
For example, ignoring quals is valid only if we recheck them later. For
GIN that's not an issue thanks to the bitmap index scan.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Julien Rouhaud
In reply to this post by Nikita Glukhov
On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov
<[hidden email]> wrote:>

> On 29.06.2019 1:23, Julien Rouhaud wrote:
>
> But that kinda resembles stuff we already have - selectivity/cost. So
> why shouldn't this be considered as part of costing?
>
> Yeah, I'm not entirely convinced that we need anything new here.
> The cost estimate function can detect such situations, and so can
> the index AM at scan start --- for example, btree checks for
> contradictory quals at scan start.  There's a certain amount of
> duplicative effort involved there perhaps, but you also have to
> keep in mind that we don't know the values of run-time-determined
> comparison values until scan start.  So if you want certainty rather
> than just a cost estimate, you may have to do these sorts of checks
> at scan start.
>
> Ah, I didn't know about _bt_preprocess_keys().  I'm not familiar with
> this code, so please bear with me.  IIUC the idea would be to add
> additional logic in gingetbitmap() / ginNewScanKey() to drop some
> quals at runtime.  But that would mean that additional logic would
> also be required in BitmapHeapScan, or that all the returned bitmap
> should be artificially marked as lossy to enforce a recheck?
>
> We have a similar solution for this problem.  The idea is to avoid full index
> scan inside GIN itself when we have some GIN entries, and forcibly recheck
> all tuples if triconsistent() returns GIN_MAYBE for the keys that emitted no
> GIN entries.

Thanks for looking at it.  That's I think a way better approach.

> The attached patch in its current shape contain at least two ugly places:
>
> 1. We still need to initialize empty scan key to call triconsistent(), but
>    then we have to remove it from the list of scan keys.  Simple refactoring
>    of ginFillScanKey() can be helpful here.
>
> 2. We need to replace GIN_SEARCH_MODE_EVERYTHING with GIN_SEARCH_MODE_ALL
>    if there are no GIN entries and some key requested GIN_SEARCH_MODE_ALL
>    because we need to skip NULLs in GIN_SEARCH_MODE_ALL.  Simplest example here
>    is "array @> '{}'": triconsistent() returns GIN_TRUE, recheck is not forced,
>    and GIN_SEARCH_MODE_EVERYTHING returns NULLs that are not rechecked.  Maybe
>    it would be better to introduce new GIN_SEARCH_MODE_EVERYTHING_NON_NULL.

Also

+       if (searchMode == GIN_SEARCH_MODE_ALL && nQueryValues <= 0)
+       {
+           /*
+            * Don't emit ALL key with no entries, check only whether
+            * unconditional recheck is needed.
+            */
+           GinScanKey  key = &so->keys[--so->nkeys];
+
+           hasSearchAllMode = true;
+           so->forcedRecheck = key->triConsistentFn(key) != GIN_TRUE;
+       }

Shouldn't you make sure that  the forcedRecheck flag can't reset?

> -- patched
> EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
>                                                       QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on test  (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
>    Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>    Rows Removed by Index Recheck: 2
>    Heap Blocks: exact=114
>    ->  Bitmap Index Scan on test_t_idx  (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1)
>          Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>  Planning Time: 0.080 ms
>  Execution Time: 0.450 ms
> (8 rows)

One thing that's bothering me is that the explain implies that the
LIKE '%i% was part of the index scan, while in reality it wasn't.  One
of the reason why I tried to modify the qual while generating the path
was to have the explain be clearer about what is really done.


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Tomas Vondra-4
On Sat, Jun 29, 2019 at 11:10:03AM +0200, Julien Rouhaud wrote:

>On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov
><[hidden email]> wrote:>
>> On 29.06.2019 1:23, Julien Rouhaud wrote:
>>
>> But that kinda resembles stuff we already have - selectivity/cost. So
>> why shouldn't this be considered as part of costing?
>>
>> Yeah, I'm not entirely convinced that we need anything new here.
>> The cost estimate function can detect such situations, and so can
>> the index AM at scan start --- for example, btree checks for
>> contradictory quals at scan start.  There's a certain amount of
>> duplicative effort involved there perhaps, but you also have to
>> keep in mind that we don't know the values of run-time-determined
>> comparison values until scan start.  So if you want certainty rather
>> than just a cost estimate, you may have to do these sorts of checks
>> at scan start.
>>
>> Ah, I didn't know about _bt_preprocess_keys().  I'm not familiar with
>> this code, so please bear with me.  IIUC the idea would be to add
>> additional logic in gingetbitmap() / ginNewScanKey() to drop some
>> quals at runtime.  But that would mean that additional logic would
>> also be required in BitmapHeapScan, or that all the returned bitmap
>> should be artificially marked as lossy to enforce a recheck?
>>
>> We have a similar solution for this problem.  The idea is to avoid full index
>> scan inside GIN itself when we have some GIN entries, and forcibly recheck
>> all tuples if triconsistent() returns GIN_MAYBE for the keys that emitted no
>> GIN entries.
>
>Thanks for looking at it.  That's I think a way better approach.
>
>> The attached patch in its current shape contain at least two ugly places:
>>
>> 1. We still need to initialize empty scan key to call triconsistent(), but
>>    then we have to remove it from the list of scan keys.  Simple refactoring
>>    of ginFillScanKey() can be helpful here.
>>
>> 2. We need to replace GIN_SEARCH_MODE_EVERYTHING with GIN_SEARCH_MODE_ALL
>>    if there are no GIN entries and some key requested GIN_SEARCH_MODE_ALL
>>    because we need to skip NULLs in GIN_SEARCH_MODE_ALL.  Simplest example here
>>    is "array @> '{}'": triconsistent() returns GIN_TRUE, recheck is not forced,
>>    and GIN_SEARCH_MODE_EVERYTHING returns NULLs that are not rechecked.  Maybe
>>    it would be better to introduce new GIN_SEARCH_MODE_EVERYTHING_NON_NULL.
>
>Also
>
>+       if (searchMode == GIN_SEARCH_MODE_ALL && nQueryValues <= 0)
>+       {
>+           /*
>+            * Don't emit ALL key with no entries, check only whether
>+            * unconditional recheck is needed.
>+            */
>+           GinScanKey  key = &so->keys[--so->nkeys];
>+
>+           hasSearchAllMode = true;
>+           so->forcedRecheck = key->triConsistentFn(key) != GIN_TRUE;
>+       }
>
>Shouldn't you make sure that  the forcedRecheck flag can't reset?
>
>> -- patched
>> EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
>>                                                       QUERY PLAN
>> -----------------------------------------------------------------------------------------------------------------------
>>  Bitmap Heap Scan on test  (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
>>    Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>>    Rows Removed by Index Recheck: 2
>>    Heap Blocks: exact=114
>>    ->  Bitmap Index Scan on test_t_idx  (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1)
>>          Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>>  Planning Time: 0.080 ms
>>  Execution Time: 0.450 ms
>> (8 rows)
>
>One thing that's bothering me is that the explain implies that the
>LIKE '%i% was part of the index scan, while in reality it wasn't.  One
>of the reason why I tried to modify the qual while generating the path
>was to have the explain be clearer about what is really done.

Yeah, I think that's a bit annoying - it'd be nice to make it clear
which quals were actually used to scan the index. It some cases it may
not be possible (e.g. in cases when the decision is done at runtime, not
while planning the query), but it'd be nice to show it when possible.

A related issue is that during costing is too late to modify cardinality
estimates, so the 'Bitmap Index Scan' will be expected to return fewer
rows than it actually returns (after ignoring the full-scan quals).
Ignoring redundant quals (the way btree does it at execution) does not
have such consequence, of course.

Which may be an issue, because we essentially want to modify the list of
quals to minimize the cost of

   bitmap index scan + recheck during bitmap heap scan

OTOH it's not a huge issue, because it won't affect the rest of the plan
(because that uses the bitmap heap scan estimates, and those are not
affected by this).


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Julien Rouhaud
On Sat, Jun 29, 2019 at 12:25 PM Tomas Vondra
<[hidden email]> wrote:

>
> On Sat, Jun 29, 2019 at 11:10:03AM +0200, Julien Rouhaud wrote:
> >On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov
> >> -- patched
> >> EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
> >>                                                       QUERY PLAN
> >> -----------------------------------------------------------------------------------------------------------------------
> >>  Bitmap Heap Scan on test  (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
> >>    Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
> >>    Rows Removed by Index Recheck: 2
> >>    Heap Blocks: exact=114
> >>    ->  Bitmap Index Scan on test_t_idx  (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1)
> >>          Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
> >>  Planning Time: 0.080 ms
> >>  Execution Time: 0.450 ms
> >> (8 rows)
> >
> >One thing that's bothering me is that the explain implies that the
> >LIKE '%i% was part of the index scan, while in reality it wasn't.  One
> >of the reason why I tried to modify the qual while generating the path
> >was to have the explain be clearer about what is really done.
>
> Yeah, I think that's a bit annoying - it'd be nice to make it clear
> which quals were actually used to scan the index. It some cases it may
> not be possible (e.g. in cases when the decision is done at runtime, not
> while planning the query), but it'd be nice to show it when possible.

Maybe we could somehow add some runtime information about ignored
quals, similar to the "never executed" information for loops?

> A related issue is that during costing is too late to modify cardinality
> estimates, so the 'Bitmap Index Scan' will be expected to return fewer
> rows than it actually returns (after ignoring the full-scan quals).
> Ignoring redundant quals (the way btree does it at execution) does not
> have such consequence, of course.
>
> Which may be an issue, because we essentially want to modify the list of
> quals to minimize the cost of
>
>    bitmap index scan + recheck during bitmap heap scan
>
> OTOH it's not a huge issue, because it won't affect the rest of the plan
> (because that uses the bitmap heap scan estimates, and those are not
> affected by this).

Doesn't this problem already exists, as the quals that we could drop
can't actually reduce the node's results?


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Tomas Vondra-4
On Sat, Jun 29, 2019 at 02:50:51PM +0200, Julien Rouhaud wrote:

>On Sat, Jun 29, 2019 at 12:25 PM Tomas Vondra
><[hidden email]> wrote:
>>
>> On Sat, Jun 29, 2019 at 11:10:03AM +0200, Julien Rouhaud wrote:
>> >On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov
>> >> -- patched
>> >> EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
>> >>                                                       QUERY PLAN
>> >> -----------------------------------------------------------------------------------------------------------------------
>> >>  Bitmap Heap Scan on test  (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
>> >>    Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>> >>    Rows Removed by Index Recheck: 2
>> >>    Heap Blocks: exact=114
>> >>    ->  Bitmap Index Scan on test_t_idx  (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1)
>> >>          Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>> >>  Planning Time: 0.080 ms
>> >>  Execution Time: 0.450 ms
>> >> (8 rows)
>> >
>> >One thing that's bothering me is that the explain implies that the
>> >LIKE '%i% was part of the index scan, while in reality it wasn't.  One
>> >of the reason why I tried to modify the qual while generating the path
>> >was to have the explain be clearer about what is really done.
>>
>> Yeah, I think that's a bit annoying - it'd be nice to make it clear
>> which quals were actually used to scan the index. It some cases it may
>> not be possible (e.g. in cases when the decision is done at runtime, not
>> while planning the query), but it'd be nice to show it when possible.
>
>Maybe we could somehow add some runtime information about ignored
>quals, similar to the "never executed" information for loops?
>

Maybe. I suppose it depends on when exactly we make the decision about
which quals to ignore.

>> A related issue is that during costing is too late to modify cardinality
>> estimates, so the 'Bitmap Index Scan' will be expected to return fewer
>> rows than it actually returns (after ignoring the full-scan quals).
>> Ignoring redundant quals (the way btree does it at execution) does not
>> have such consequence, of course.
>>
>> Which may be an issue, because we essentially want to modify the list of
>> quals to minimize the cost of
>>
>>    bitmap index scan + recheck during bitmap heap scan
>>
>> OTOH it's not a huge issue, because it won't affect the rest of the plan
>> (because that uses the bitmap heap scan estimates, and those are not
>> affected by this).
>
>Doesn't this problem already exists, as the quals that we could drop
>can't actually reduce the node's results?

How could it not reduce the node's results, if you ignore some quals
that are not redundant? My understanding is we have a plan like this:

    Bitmap Heap Scan
      -> Bitmap Index Scan

and by ignoring some quals at the index scan level, we trade the (high)
cost of evaluating the qual there for a plain recheck at the bitmap heap
scan. But it means the index scan may produce more rows, so it's only a
win if the "extra rechecks" are cheaper than the (removed) full scan.

So the full scan might actually reduce the number of rows from the index
scan, but clearly whatever we do the results from the bitmap heap scan
must remain the same.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Julien Rouhaud
On Sat, Jun 29, 2019 at 3:11 PM Tomas Vondra
<[hidden email]> wrote:

>
> On Sat, Jun 29, 2019 at 02:50:51PM +0200, Julien Rouhaud wrote:
> >On Sat, Jun 29, 2019 at 12:25 PM Tomas Vondra
> >> A related issue is that during costing is too late to modify cardinality
> >> estimates, so the 'Bitmap Index Scan' will be expected to return fewer
> >> rows than it actually returns (after ignoring the full-scan quals).
> >> Ignoring redundant quals (the way btree does it at execution) does not
> >> have such consequence, of course.
> >>
> >> Which may be an issue, because we essentially want to modify the list of
> >> quals to minimize the cost of
> >>
> >>    bitmap index scan + recheck during bitmap heap scan
> >>
> >> OTOH it's not a huge issue, because it won't affect the rest of the plan
> >> (because that uses the bitmap heap scan estimates, and those are not
> >> affected by this).
> >
> >Doesn't this problem already exists, as the quals that we could drop
> >can't actually reduce the node's results?
>
> How could it not reduce the node's results, if you ignore some quals
> that are not redundant? My understanding is we have a plan like this:
>
>     Bitmap Heap Scan
>       -> Bitmap Index Scan
>
> and by ignoring some quals at the index scan level, we trade the (high)
> cost of evaluating the qual there for a plain recheck at the bitmap heap
> scan. But it means the index scan may produce more rows, so it's only a
> win if the "extra rechecks" are cheaper than the (removed) full scan.

Sorry, by node I meant the BitmapIndexScan.  AIUI, if you have for
instance WHERE val LIKE '%abcde%' AND val LIKE '%z%' and a trgm index,
the BitmapIndexScan will have to through the whole index and discard
rows based on the only opclass-optimizable qual (LIKE '%abcde%'),
letting the recheck do the proper filtering for the other qual.  So
whether you have the LIKE '%z%' or not in the index scan, the
BitmapIndexScan will return the same number of rows, the only
difference being that in one case you'll have to scan the whole index,
while in the other case you won't.

> clearly whatever we do the results from the bitmap heap scan
> must remain the same.

Of course.


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Tomas Vondra-4
On Sat, Jun 29, 2019 at 03:28:11PM +0200, Julien Rouhaud wrote:

>On Sat, Jun 29, 2019 at 3:11 PM Tomas Vondra
><[hidden email]> wrote:
>>
>> On Sat, Jun 29, 2019 at 02:50:51PM +0200, Julien Rouhaud wrote:
>> >On Sat, Jun 29, 2019 at 12:25 PM Tomas Vondra
>> >> A related issue is that during costing is too late to modify cardinality
>> >> estimates, so the 'Bitmap Index Scan' will be expected to return fewer
>> >> rows than it actually returns (after ignoring the full-scan quals).
>> >> Ignoring redundant quals (the way btree does it at execution) does not
>> >> have such consequence, of course.
>> >>
>> >> Which may be an issue, because we essentially want to modify the list of
>> >> quals to minimize the cost of
>> >>
>> >>    bitmap index scan + recheck during bitmap heap scan
>> >>
>> >> OTOH it's not a huge issue, because it won't affect the rest of the plan
>> >> (because that uses the bitmap heap scan estimates, and those are not
>> >> affected by this).
>> >
>> >Doesn't this problem already exists, as the quals that we could drop
>> >can't actually reduce the node's results?
>>
>> How could it not reduce the node's results, if you ignore some quals
>> that are not redundant? My understanding is we have a plan like this:
>>
>>     Bitmap Heap Scan
>>       -> Bitmap Index Scan
>>
>> and by ignoring some quals at the index scan level, we trade the (high)
>> cost of evaluating the qual there for a plain recheck at the bitmap heap
>> scan. But it means the index scan may produce more rows, so it's only a
>> win if the "extra rechecks" are cheaper than the (removed) full scan.
>
>Sorry, by node I meant the BitmapIndexScan.  AIUI, if you have for
>instance WHERE val LIKE '%abcde%' AND val LIKE '%z%' and a trgm index,
>the BitmapIndexScan will have to through the whole index and discard
>rows based on the only opclass-optimizable qual (LIKE '%abcde%'),
>letting the recheck do the proper filtering for the other qual.  So
>whether you have the LIKE '%z%' or not in the index scan, the
>BitmapIndexScan will return the same number of rows, the only
>difference being that in one case you'll have to scan the whole index,
>while in the other case you won't.
>

Oh! I thought 'full scan' means we have to scan all the keys in the GIN
index, but we can still eliminate some of the keys (for example for the
trigrams we might check if the trigram contains the short string). But
clearly I was mistaken and it does not work like that ...


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Alexander Korotkov
In reply to this post by Nikita Glukhov
Hi!

On Sat, Jun 29, 2019 at 1:52 AM Nikita Glukhov <[hidden email]> wrote:

> We have a similar solution for this problem.  The idea is to avoid full index
> scan inside GIN itself when we have some GIN entries, and forcibly recheck
> all tuples if triconsistent() returns GIN_MAYBE for the keys that emitted no
> GIN entries.
>
> The attached patch in its current shape contain at least two ugly places:
>
> 1. We still need to initialize empty scan key to call triconsistent(), but
>    then we have to remove it from the list of scan keys.  Simple refactoring
>    of ginFillScanKey() can be helpful here.
>
> 2. We need to replace GIN_SEARCH_MODE_EVERYTHING with GIN_SEARCH_MODE_ALL
>    if there are no GIN entries and some key requested GIN_SEARCH_MODE_ALL
>    because we need to skip NULLs in GIN_SEARCH_MODE_ALL.  Simplest example here
>    is "array @> '{}'": triconsistent() returns GIN_TRUE, recheck is not forced,
>    and GIN_SEARCH_MODE_EVERYTHING returns NULLs that are not rechecked.  Maybe
>    it would be better to introduce new GIN_SEARCH_MODE_EVERYTHING_NON_NULL.

Thank you for publishing this!

What would happen when two-columns index have GIN_SEARCH_MODE_DEFAULT
scan on first column and GIN_SEARCH_MODE_ALL on second?  I think even
if triconsistent() for second column returns GIN_TRUE, we still need
to recheck to verify second columns is not NULL.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Alexander Korotkov
In reply to this post by Tomas Vondra-4
On Sat, Jun 29, 2019 at 1:25 PM Tomas Vondra
<[hidden email]> wrote:
> A related issue is that during costing is too late to modify cardinality
> estimates, so the 'Bitmap Index Scan' will be expected to return fewer
> rows than it actually returns (after ignoring the full-scan quals).
> Ignoring redundant quals (the way btree does it at execution) does not
> have such consequence, of course.

Adjust cardinality estimates should be possible in gincostestimate(),
because we call extractquery() method there.  However, it seems to be
quite independent issue.  Number of rows returned by 'Bitmap Index
Scan' doesn't vary much whether we execute GIN_SEARCH_MODE_ALL or not.
The only difference is for multicolumn index, GIN_SEARCH_MODE_ALL
allows to exclude NULL on one column, when normal scan is performed on
another column.  And we can take it into account in gincostestimate().

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Alexander Korotkov
In reply to this post by Julien Rouhaud
On Sat, Jun 29, 2019 at 3:51 PM Julien Rouhaud <[hidden email]> wrote:

> On Sat, Jun 29, 2019 at 12:25 PM Tomas Vondra
> <[hidden email]> wrote:
> >
> > On Sat, Jun 29, 2019 at 11:10:03AM +0200, Julien Rouhaud wrote:
> > >On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov
> > >> -- patched
> > >> EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
> > >>                                                       QUERY PLAN
> > >> -----------------------------------------------------------------------------------------------------------------------
> > >>  Bitmap Heap Scan on test  (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
> > >>    Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
> > >>    Rows Removed by Index Recheck: 2
> > >>    Heap Blocks: exact=114
> > >>    ->  Bitmap Index Scan on test_t_idx  (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1)
> > >>          Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
> > >>  Planning Time: 0.080 ms
> > >>  Execution Time: 0.450 ms
> > >> (8 rows)
> > >
> > >One thing that's bothering me is that the explain implies that the
> > >LIKE '%i% was part of the index scan, while in reality it wasn't.  One
> > >of the reason why I tried to modify the qual while generating the path
> > >was to have the explain be clearer about what is really done.
> >
> > Yeah, I think that's a bit annoying - it'd be nice to make it clear
> > which quals were actually used to scan the index. It some cases it may
> > not be possible (e.g. in cases when the decision is done at runtime, not
> > while planning the query), but it'd be nice to show it when possible.
>
> Maybe we could somehow add some runtime information about ignored
> quals, similar to the "never executed" information for loops?

+1,
This sounds reasonable for me.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Avoid full GIN index scan when possible

Marc Cousin-2
In reply to this post by Julien Rouhaud
On 29/06/2019 00:23, Julien Rouhaud wrote:

> On Fri, Jun 28, 2019 at 10:16 PM Tom Lane <[hidden email]> wrote:
>>
>> Tomas Vondra <[hidden email]> writes:
>>> On Fri, Jun 28, 2019 at 03:03:19PM -0400, Tom Lane wrote:
>>>> I not only don't want that function in indxpath.c, I don't even want
>>>> it to be known/called from there.  If we need the ability for the index
>>>> AM to editorialize on the list of indexable quals (which I'm not very
>>>> convinced of yet), let's make an AM interface function to do it.
>>
>>> Wouldn't it be better to have a function that inspects a single qual and
>>> says whether it's "optimizable" or not? That could be part of the AM
>>> implementation, and we'd call it and it'd be us messing with the list.
>>
>> Uh ... we already determined that the qual is indexable (ie is a member
>> of the index's opclass), or allowed the index AM to derive an indexable
>> clause from it, so I'm not sure what you envision would happen
>> additionally there.  If I understand what Julien is concerned about
>> --- and I may not --- it's that the set of indexable clauses *as a whole*
>> may have or lack properties of interest.  So I'm thinking the answer
>> involves some callback that can do something to the whole list, not
>> qual-at-a-time.  We've already got facilities for the latter case.
>
> Yes, the root issue here is that with gin it's entirely possible that
> "WHERE sometable.col op value1" is way more efficient than "WHERE
> sometable.col op value AND sometable.col op value2", where both qual
> are determined indexable by the opclass.  The only way to avoid that
> is indeed to inspect the whole list, as done in this poor POC.
>
> This is a problem actually hit in production, and as far as I know
> there's no easy way from the application POV to prevent unexpected
> slowdown.  Maybe Marc will have more details about the actual problem
> and how expensive such a case was compared to the normal ones.
Sorry for the delay...

Yes, quite easily, here is what we had (it's just a bit simplified, we have other criterions but I think it shows the problem):

rh2=> explain analyze select * from account_employee where typeahead like '%albert%';
                                                                       QUERY PLAN                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on account_employee  (cost=53.69..136.27 rows=734 width=666) (actual time=15.562..35.044 rows=8957 loops=1)
   Recheck Cond: (typeahead ~~ '%albert%'::text)
   Rows Removed by Index Recheck: 46
   Heap Blocks: exact=8919
   ->  Bitmap Index Scan on account_employee_site_typeahead_gin_idx  (cost=0.00..53.51 rows=734 width=0) (actual time=14.135..14.135 rows=9011 loops=1)
         Index Cond: (typeahead ~~ '%albert%'::text)
 Planning time: 0.224 ms
 Execution time: 35.389 ms
(8 rows)

rh2=> explain analyze select * from account_employee where typeahead like '%albert%' and typeahead like '%lo%';
                                                                           QUERY PLAN                                                                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on account_employee  (cost=28358.38..28366.09 rows=67 width=666) (actual time=18210.109..18227.134 rows=1172 loops=1)
   Recheck Cond: ((typeahead ~~ '%albert%'::text) AND (typeahead ~~ '%lo%'::text))
   Rows Removed by Index Recheck: 7831
   Heap Blocks: exact=8919
   ->  Bitmap Index Scan on account_employee_site_typeahead_gin_idx  (cost=0.00..28358.37 rows=67 width=0) (actual time=18204.756..18204.756 rows=9011 loops=1)
         Index Cond: ((typeahead ~~ '%albert%'::text) AND (typeahead ~~ '%lo%'::text))
 Planning time: 0.288 ms
 Execution time: 18230.182 ms
(8 rows)


We noticed this because the application timed out for users searching someone whose name was 2 characters ( it happens :) ).

We reject such filters when it's the only criterion, as we know it's going to be slow, but ignoring it as a supplementary filter would be a bit weird.

Of course there is the possibility of filtering with two stages with a CTE, but that's not as great as having PostgreSQL doing it itself.


By the way, while preparing this, I noticed that it seems that during this kind of index scan, the interrupt signal is masked
for a very long time. Control-C takes a very long while to cancel the query. But it's an entirely different problem :)

Regards


signature.asc (499 bytes) Download Attachment
123