Finding nearest numeric value

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Finding nearest numeric value

Poul Møller Hansen
Does anyone know how to find the row with the nearest numeric value, not
necessarily an exact match ?

Thanks,
 Poul


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [hidden email] so that your
       message can get through to the mailing list cleanly
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Sean Davis
On 8/17/05 8:50 AM, "Poul Møller Hansen" <[hidden email]> wrote:

> Does anyone know how to find the row with the nearest numeric value, not
> necessarily an exact match ?

To find the nearest value in number_column to some CONSTANT (where you
replace constant with a number), try:

select *,(number_column - CONSTANT)^2 as d from tablename order by d limit
1;

Does that do it for you?

Sean


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Poul Møller Hansen

>To find the nearest value in number_column to some CONSTANT (where you
>replace constant with a number), try:
>
>select *,(number_column - CONSTANT)^2 as d from tablename order by d limit
>1;
>
>Does that do it for you?
>
>Sean
>  
>
It does ideed, not that I understood how, but I will find out.
Thank you very much.

Poul


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Csaba Nagy
In reply to this post by Sean Davis
[snip]
> To find the nearest value in number_column to some CONSTANT (where you
> replace constant with a number), try:
>
> select *,(number_column - CONSTANT)^2 as d from tablename order by d limit
> 1;
>
This will scan the whole table and sort the results... and then pick
just one of it. Watch this:

db=> prepare test_01(bigint) as select *, (pk_col - $1) ^ 2 as d from
big_table order by d limit 1;
PREPARE
eb=> explain execute test_01(27163619);
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=31239164.71..31239164.72 rows=1 width=59)
   ->  Sort  (cost=31239164.71..31505657.00 rows=106596914 width=59)
         Sort Key: (((pk_col - $1))::double precision ^ 2::double
precision)
         ->  Seq Scan on big_table  (cost=0.00..3149688.00
rows=106596914 width=59)
(4 rows)
 

The names were changed, this is a production DB, but the idea is:
big_table has around 100 million rows, and pk_col is the primary key on
it. Running the above query would take forever.

If you don't have an index on the numeric column, or if the table is
small, this might be your best choice... but if your table is big, and
you have an index on the numeric column, you should use something along:

select * number_col from big_table where number_col < CONSTANT order by
number_col desc limit 1

select * number_col from big_table where number_col > CONSTANT order by
number_col limit 1

You execute the 2 queries, which are very fast even for big tables if
you have an index on number_col, and then choose the row with the
smallest difference (you do this in your client program).

HTH,
Csaba.


> Does that do it for you?
>
> Sean
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Pete-110
In reply to this post by Sean Davis
Sean Davis wrote:

> On 8/17/05 8:50 AM, "Poul Møller Hansen" <[hidden email]> wrote:
>
>
>>Does anyone know how to find the row with the nearest numeric value, not
>>necessarily an exact match ?
>
>
> To find the nearest value in number_column to some CONSTANT (where you
> replace constant with a number), try:
>
> select *,(number_column - CONSTANT)^2 as d from tablename order by d limit
> 1;
>

Save yourself some cycles - use abs() instead of exponentiation.


--
Peter Fein                 [hidden email]                 773-575-0694

Basically, if you're not a utopianist, you're a schmuck. -J. Feldman

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Richard Huxton
In reply to this post by Poul Møller Hansen
Poul Møller Hansen wrote:
> Does anyone know how to find the row with the nearest numeric value, not
> necessarily an exact match ?

While the other answers all do their job, and in one go too, I'd be
surprised if you found anything faster than:

SELECT myval FROM mytable WHERE myval > 1234 ORDER BY myval LIMIT 1
UNION ALL
SELECT myval FROM mytable WHERE myval < 1234 ORDER BY myval DESC LIMIT 1

That gives you (up to) two values to look at, but should use any index
you have on myval.

You can always sort the results by abs(myval) then if you don't want to
handle two values in the application layer.

--
   Richard Huxton
   Archonet Ltd


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Sean Davis
In reply to this post by Poul Møller Hansen
On 8/17/05 10:01 AM, "Poul Møller Hansen" <[hidden email]> wrote:

>
>> To find the nearest value in number_column to some CONSTANT (where you
>> replace constant with a number), try:
>>
>> select *,(number_column - CONSTANT)^2 as d from tablename order by d limit
>> 1;
>>
>> Does that do it for you?
>>
>> Sean
>>  
>>
> It does ideed, not that I understood how, but I will find out.
> Thank you very much.

Just a word (or several) of explanation, then....

To compute the distance between two points on a line, you can compute the
absolute value of the difference (4-2 is the same distance as 2-4, while the
latter is negative) or you can square the difference (just to make it
positive).  You could use absolute value in the above query if you like--I
don't know which is faster, but they will give the same result.

As for the query structure, you can select calculations of columns as well
as the columns themselves.  The "as d" part just gives the calculation a
nice name to use in the rest of the query and in the resulting output.

Sean


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Csaba Nagy
In reply to this post by Richard Huxton
The only problem is that you can't use the order by/limit syntax inside
the union queries I guess, cause the query you proposed is giving a
syntax error. I also thought first to do it like this, but it won't
work. If it would, then you could wrap the thing in another query which
orders by the difference and limits to the first one ;-)

Cheers,
Csaba.

On Wed, 2005-08-17 at 17:10, Richard Huxton wrote:

> Poul Møller Hansen wrote:
> > Does anyone know how to find the row with the nearest numeric value, not
> > necessarily an exact match ?
>
> While the other answers all do their job, and in one go too, I'd be
> surprised if you found anything faster than:
>
> SELECT myval FROM mytable WHERE myval > 1234 ORDER BY myval LIMIT 1
> UNION ALL
> SELECT myval FROM mytable WHERE myval < 1234 ORDER BY myval DESC LIMIT 1
>
> That gives you (up to) two values to look at, but should use any index
> you have on myval.
>
> You can always sort the results by abs(myval) then if you don't want to
> handle two values in the application layer.


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Bruno Wolff III
On Wed, Aug 17, 2005 at 17:35:37 +0200,
  Csaba Nagy <[hidden email]> wrote:
> The only problem is that you can't use the order by/limit syntax inside
> the union queries I guess, cause the query you proposed is giving a
> syntax error. I also thought first to do it like this, but it won't
> work. If it would, then you could wrap the thing in another query which
> orders by the difference and limits to the first one ;-)

You probably can just add parenthesis. I think that the second ORDER BY
and LIMIT may be being applied to the UNION results which would be a
problem. Putting the second subquery in parens will take care of this if
that is the problem.

>
> On Wed, 2005-08-17 at 17:10, Richard Huxton wrote:
> > Poul Møller Hansen wrote:
> > > Does anyone know how to find the row with the nearest numeric value, not
> > > necessarily an exact match ?
> >
> > While the other answers all do their job, and in one go too, I'd be
> > surprised if you found anything faster than:
> >
> > SELECT myval FROM mytable WHERE myval > 1234 ORDER BY myval LIMIT 1
> > UNION ALL
> > SELECT myval FROM mytable WHERE myval < 1234 ORDER BY myval DESC LIMIT 1
> >
> > That gives you (up to) two values to look at, but should use any index
> > you have on myval.
> >
> > You can always sort the results by abs(myval) then if you don't want to
> > handle two values in the application layer.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Tom Lane-2
In reply to this post by Csaba Nagy
Csaba Nagy <[hidden email]> writes:
> The only problem is that you can't use the order by/limit syntax inside
> the union queries I guess, cause the query you proposed is giving a
> syntax error.

Parentheses are your friend ;-)

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [hidden email] so that your
       message can get through to the mailing list cleanly
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Csaba Nagy
In reply to this post by Bruno Wolff III
Yep, you're right. The following works and uses the index on pk_col:

prepare test_01 (bigint) as
select * from
  (
    (SELECT * FROM big_table WHERE pk_col > $1 ORDER BY pk_col LIMIT 1)
    UNION ALL
    (SELECT * FROM big_table WHERE pk_col < $1 ORDER BY pk_col DESC
LIMIT 1)
  ) as nearest
order by abs(pk_col - $1)
limit 1;


db=> explain execute test_01(12321);
                                                                   
QUERY
PLAN                                                                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2.12..2.12 rows=1 width=112)
   ->  Sort  (cost=2.12..2.13 rows=2 width=112)
         Sort Key: abs((pk_col - $1))
         ->  Subquery Scan nearest  (cost=0.00..2.11 rows=2 width=112)
               ->  Append  (cost=0.00..2.08 rows=2 width=59)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1.04
rows=1 width=59)
                           ->  Limit  (cost=0.00..1.03 rows=1 width=59)
                                 ->  Index Scan using idx_pk_col on
big_table  (cost=0.00..36639172.72 rows=35532914 width=59)
                                       Index Cond: (pk_col > $1)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1.04
rows=1 width=59)
                           ->  Limit  (cost=0.00..1.03 rows=1 width=59)
                                 ->  Index Scan Backward using
idx_pk_col on big_table  (cost=0.00..36639172.72 rows=35532914 width=59)
                                       Index Cond: (pk_col < $1)
(13 rows)
 

Cheers,
Csaba.

On Wed, 2005-08-17 at 17:57, Bruno Wolff III wrote:

> On Wed, Aug 17, 2005 at 17:35:37 +0200,
>   Csaba Nagy <[hidden email]> wrote:
> > The only problem is that you can't use the order by/limit syntax inside
> > the union queries I guess, cause the query you proposed is giving a
> > syntax error. I also thought first to do it like this, but it won't
> > work. If it would, then you could wrap the thing in another query which
> > orders by the difference and limits to the first one ;-)
>
> You probably can just add parenthesis. I think that the second ORDER BY
> and LIMIT may be being applied to the UNION results which would be a
> problem. Putting the second subquery in parens will take care of this if
> that is the problem.
>
> >
> > On Wed, 2005-08-17 at 17:10, Richard Huxton wrote:
> > > Poul Møller Hansen wrote:
> > > > Does anyone know how to find the row with the nearest numeric value, not
> > > > necessarily an exact match ?
> > >
> > > While the other answers all do their job, and in one go too, I'd be
> > > surprised if you found anything faster than:
> > >
> > > SELECT myval FROM mytable WHERE myval > 1234 ORDER BY myval LIMIT 1
> > > UNION ALL
> > > SELECT myval FROM mytable WHERE myval < 1234 ORDER BY myval DESC LIMIT 1
> > >
> > > That gives you (up to) two values to look at, but should use any index
> > > you have on myval.
> > >
> > > You can always sort the results by abs(myval) then if you don't want to
> > > handle two values in the application layer.


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Ron Mayer
In reply to this post by Richard Huxton
Richard Huxton wrote:
>
> While the other answers all do their job, and in one go too, I'd be
> surprised if you found anything faster than:
>
> SELECT myval FROM mytable WHERE myval > 1234 ORDER BY myval LIMIT 1

Really?   Aren't most things with ORDER BY O(n*log(n))?

Or is the optimizer smart enough to find an index on myval
and stop after the first one (assuming the index returned
things sequentially.

If not, it seems this could do things in O(n) time:
   select min(abs(value - CONSTANT)) from tablename
followed by
   select * from tablename where abs(value - CONSTANT) = [result]
though I'm sure someone could roll that up into a single statement.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Pete-110
In reply to this post by Richard Huxton
Richard Huxton wrote:

> Poul Møller Hansen wrote:
>
>> Does anyone know how to find the row with the nearest numeric value,
>> not necessarily an exact match ?
>
>
> While the other answers all do their job, and in one go too, I'd be
> surprised if you found anything faster than:
>
> SELECT myval FROM mytable WHERE myval > 1234 ORDER BY myval LIMIT 1
> UNION ALL
> SELECT myval FROM mytable WHERE myval < 1234 ORDER BY myval DESC LIMIT 1
>
> That gives you (up to) two values to look at, but should use any index
> you have on myval.
>
> You can always sort the results by abs(myval) then if you don't want to
> handle two values in the application layer.
>

Ahh, should that be >= and <= ? ;)

--
Peter Fein                 [hidden email]                 773-575-0694

Basically, if you're not a utopianist, you're a schmuck. -J. Feldman

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Poul Møller Hansen
In reply to this post by Sean Davis
>>>To find the nearest value in number_column to some CONSTANT (where you
>>>replace constant with a number), try:
>>>
>>>select *,(number_column - CONSTANT)^2 as d from tablename order by d limit
>>>1;
>>>
>>>Does that do it for you?
>>>
>>>Sean
>>>
>>>
>>
>>It does ideed, not that I understood how, but I will find out.
>>Thank you very much.
>
>
> Just a word (or several) of explanation, then....
>
> To compute the distance between two points on a line, you can compute the
> absolute value of the difference (4-2 is the same distance as 2-4, while the
> latter is negative) or you can square the difference (just to make it
> positive).  You could use absolute value in the above query if you like--I
> don't know which is faster, but they will give the same result.
>
> As for the query structure, you can select calculations of columns as well
> as the columns themselves.  The "as d" part just gives the calculation a
> nice name to use in the rest of the query and in the resulting output.
>
> Sean
>

Thanks for the explanation, guess I was fast giving up understanding the
query as it is actually quite simple :)

Of course there are the performance issues as argued by others, but the
table do only contain around 800 rows, so this method is adequate.

Thank you all for the inputs.


Poul

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [hidden email] so that your
       message can get through to the mailing list cleanly
Reply | Threaded
Open this post in threaded view
|

Re: Finding nearest numeric value

Bruno Wolff III
In reply to this post by Ron Mayer
On Wed, Aug 17, 2005 at 11:57:52 -0700,
  Ron Mayer <[hidden email]> wrote:
> Richard Huxton wrote:
> >
> >While the other answers all do their job, and in one go too, I'd be
> >surprised if you found anything faster than:
> >
> >SELECT myval FROM mytable WHERE myval > 1234 ORDER BY myval LIMIT 1
>
> Really?   Aren't most things with ORDER BY O(n*log(n))?

No. Index lookups are O(log(n)). And you need to do only a constant number
of index lookups (2 or 4 depending on whether the values are unique).

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq