[HACKERS] subquery returning array

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[HACKERS] subquery returning array

Zeljko Vrba
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi! I have posted this as a user comment in section 9.17:

- ----
= ANY(array expression) doesn't work in the obvious way when the array
expression is a subselect. For example:

select * from stat3 where stat3.id = any ('{4,5,6,7}');

works (and returns the expected tuples). However,

select * from stat3 where stat3.id = any (select stat3 from
helix_request where id=11);

DOESN'T work and complains with an error: operator does not exist:
integer = integer[]. The inner select returns EXACTLY ONE value, namely
the same array as the literal in the first example.

The solution is:

select * from stat3 where (select stat3.id = any (stat3) from
helix_request where id=11);

I thank to Chris Kings-Lynne ("KL") for helping me out with this over IRC.
- ----

KL suggested to mail this question to the hackers list. The problem with
this solution is that postgresql uses sequential scan for the proposed
solution:

 Seq Scan on stat3  (cost=0.00..40018.94 rows=3321 width=32) (actual
time=0.112..75.911 rows=4 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using helix_request_pkey on helix_request
(cost=0.00..6.01 rows=1 width=32) (actual time=0.007..0.008 rows=1
loops=6756)
           Index Cond: (id = 11)
 Total runtime: 76.040 ms
(6 rows)

If I write select * from stat3 where id in (4,5,6,7) and index scan is used:
 Index Scan using stat3_pkey, stat3_pkey, stat3_pkey, stat3_pkey on
stat3  (cost=0.00..23.94 rows=4 width=32) (actual time=0.066..0.091
rows=4 loops=1)
   Index Cond: ((id = 4) OR (id = 5) OR (id = 6) OR (id = 7))
 Total runtime: 0.164 ms

I'm going to have MANY queries of this kind and having sequential scan
for each such query will lead to quadratic performance - unacceptable
for the amount of data I will have (millions of rows).

Is there yet another way of making WHERE field = ANY (subselect
returning an array) work? Or make postgres to use index?

OK, I know the suggestion from the manual: usually it's bad database
design searching through arrays and a separate table is better. however,
to convert {4,5,6,7} into a table would require a table with two
columns, e.g. (1,4), (1,5), (1,6), (1,7) where the first column would be
used to identify elements in the single array). this additional column
is a waste of space.

KL has mentioned a package for converting an array into a tuple. Where
can I find it? Also, what is the limit on the number of elements in the
IN (...) condition before the database resorts to sequential scan?

PS: Please reply also via e-mail as I'm not a regular subscriber of this
list.

Thanks.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.7 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iQCVAwUBQpF0U2nN2B8CwPRmAQKN8gP+JCzLiX5b48kMYmHRwTSFZWN5Jydfw0iH
MABuYj2mKCY9Dgmd4pLK8Xlxhf/tEYzd3N2lcPFYf1vIXCSpbFasRrO3hJ4WjRLr
MZ6MLXCn59Y8wtd8Iz8uug47XuYVGSreZESKA9tRfl+u8t16nPC9nEMyDNDFIRI9
ZOnBMCnQzrY=
=5qmB
-----END PGP SIGNATURE-----

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] subquery returning array

Richard Huxton
Zeljko Vrba wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi! I have posted this as a user comment in section 9.17:
>
> - ----
> = ANY(array expression) doesn't work in the obvious way when the array
> expression is a subselect. For example:
>
> select * from stat3 where stat3.id = any ('{4,5,6,7}');
>
> works (and returns the expected tuples). However,
>
> select * from stat3 where stat3.id = any (select stat3 from
> helix_request where id=11);
>
> DOESN'T work and complains with an error: operator does not exist:
> integer = integer[]. The inner select returns EXACTLY ONE value, namely
> the same array as the literal in the first example.

No, it doesn't - hence the error.

What it returns is a SET containing ONE TUPLE which contains ONE ARRAY.
Here the "any" ranges over the SET not the array, so it tries to compare
the outer integer with the inner array of integers.

> The solution is:
>
> select * from stat3 where (select stat3.id = any (stat3) from
> helix_request where id=11);
>
> I thank to Chris Kings-Lynne ("KL") for helping me out with this over IRC.
> - ----
>
> KL suggested to mail this question to the hackers list. The problem with
> this solution is that postgresql uses sequential scan for the proposed
> solution:
[snip]

> I'm going to have MANY queries of this kind and having sequential scan
> for each such query will lead to quadratic performance - unacceptable
> for the amount of data I will have (millions of rows).
>
> Is there yet another way of making WHERE field = ANY (subselect
> returning an array) work? Or make postgres to use index?
>
> OK, I know the suggestion from the manual: usually it's bad database
> design searching through arrays and a separate table is better. however,
> to convert {4,5,6,7} into a table would require a table with two
> columns, e.g. (1,4), (1,5), (1,6), (1,7) where the first column would be
> used to identify elements in the single array). this additional column
> is a waste of space.

You're trading space for time, and deliberately spoiling your design to
do so. If you're going to want to access each value separately, design
your system to reflect that fact.

Have you actually tested your example with a proper two-column table? It
might be that weaknesses in PostgreSQL force you to compromise your
design, but I wouldn't start from that point. Try a proper relational
design, with a bit of tuning and see if that will work for you.

If you really want to use arrays, perhaps see if the contrib/intarray
module can help you.

--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [hidden email] so that your
      message can get through to the mailing list cleanly
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] subquery returning array

Greg Stark-3
In reply to this post by Zeljko Vrba

Zeljko Vrba <[hidden email]> writes:

> Is there yet another way of making WHERE field = ANY (subselect
> returning an array) work? Or make postgres to use index?

You could use the int_array_enum() function from the contrib/int_agg module.

> Also, what is the limit on the number of elements in the IN (...) condition
> before the database resorts to sequential scan?

It depends on the table. If, for example, there's little free space in your
table and the records are very narrow then sequential scans will be especially
efficient since each i/o will read in many records. An index scan of any
significant size would likely have to read nearly every page in multiple
times.

The problem in your case is that Postgres has no idea how large an array it's
going to find when it's doing the planning. I'm not sure what the fallback
logic is, apparently it's assuming a fairly low selectivity for the =ANY
constraint.

Actually I think there was some discussion a while back about making =ANY a
little less pessimistic about the selectivity. Perhaps 8.1 will be better for
this. If you're still in development with a long timeframe you could try a CVS
build to see if that's the case.


--
greg


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster