How to always run UPDATE FROM despite missing records in the source table?

Alexander Farber
Good afternoon

I have prepared a simplified test case for my question: https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0

In PostgreSQL 10.6 there are 2 tables:

CREATE TABLE users (
        uid           SERIAL PRIMARY KEY,
        created       timestamptz NOT NULL,
        visited       timestamptz NOT NULL,
        ip            inet        NOT NULL,
        lat           double precision,
        lng           double precision
);

CREATE TABLE geoip (
        block   inet    PRIMARY KEY,
        lat     double precision,
        lng     double precision
);

CREATE INDEX ON geoip USING SPGIST (block);

which are filled with the following test data:

INSERT INTO users (created, visited, ip) VALUES
  (now(), now(), '1.2.3.4'::inet),
  (now(), now(), '1.2.3.5'::inet),
  (now(), now(), '1.2.3.6'::inet);

INSERT INTO geoip (block, lat, lng) VALUES
 ('1.2.3.0/24', -33.4940, 143.2104),
 ('10.0.0.0/8', 34.6617, 133.9350);

Then in a stored function I run the following UPDATE command -

UPDATE users u SET
    visited = now(),
    ip      = '10.10.10.10'::inet,
    lat     = i.lat,
    lng     = i.lng
FROM geoip i
WHERE u.uid = 1 AND '10.10.10.10'::inet <<= i.block;

(the 1 and the ip address are actually in_uid and in_ip parameters in my stored function).

The above query works well and updates all 4 fields in the users table.

However, the following query does not work as intended and does not update any fields, because no matching block is found in the geoip table:

UPDATE users u SET
    visited = now(),                  -- HOW TO ALWAYS UPDATE THIS FIELD?
    ip      = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
    lat     = i.lat,
    lng     = i.lng
FROM geoip i
WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;

The fields visited and ip, however, should always be updated, regardless of whether a matching block was found.

This is a kind of LEFT JOIN, but for UPDATE. How can I achieve this, please?

The only workaround that I could think of is -

UPDATE users SET
    visited = now(),
    ip      = '20.20.20.20'::inet,
    lat     = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
    lng     = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
WHERE uid = 2;

But that would run the same subquery twice (correct?), and my geoip table is already slow with 3073410 records (which is why I am trying to cache its lat and lng values in the users table on each user login event).

Regards
Alex
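
One possible way to avoid the duplicated lookup, sketched here rather than taken from the thread: since PostgreSQL 9.5 the SET clause accepts a column list assigned from a single sub-select, so lat and lng can be filled by one subquery. The sketch assumes that at most one geoip block contains the address (the sub-select must not return more than one row, otherwise PostgreSQL raises an error). If no block matches, both columns are set to NULL and the row is still updated.

```sql
-- Sketch: a single sub-select fills both columns.
-- Assumption: at most one geoip block contains the address.
-- When no block matches, lat and lng become NULL, but visited and ip
-- are still updated because the sub-select is not part of the WHERE clause.
UPDATE users SET
    visited    = now(),
    ip         = '20.20.20.20'::inet,
    (lat, lng) = (SELECT i.lat, i.lng
                    FROM geoip i
                   WHERE '20.20.20.20'::inet <<= i.block)
WHERE uid = 2;
```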

RE: How to always run UPDATE FROM despite missing records in the source table?

Kevin Brannen

From: Alexander Farber <[hidden email]>

> The only workaround that I could think of is -
>
> UPDATE users SET
>     visited = now(),
>     ip      = '20.20.20.20'::inet,
>     lat     = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
>     lng     = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
> WHERE uid = 2;
>
> But that would run the same subquery twice (correct?) and my geoip table is already slow with 3073410 records (and that is why I am trying to cache its lat and lng values in the users table on each user login event)

Have you considered using a WITH clause to get the data so the query is only run once?
See section 7.8.2 at https://www.postgresql.org/docs/9.6/queries-with.html

Kevin
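
A minimal sketch of what the WITH suggestion could look like. The CTE name loc is made up for illustration, and the LIMIT 1 is an assumption that any single matching block is acceptable. Because the CTE is referenced twice, PostgreSQL should compute it only once, so geoip is searched a single time. When no block matches, the scalar sub-selects return NULL and the row is still updated.

```sql
-- Sketch: look up the block once in a CTE, then reuse its result.
-- "loc" is a hypothetical name; LIMIT 1 guards against overlapping blocks.
WITH loc AS (
    SELECT lat, lng
      FROM geoip
     WHERE '20.20.20.20'::inet <<= block
     LIMIT 1
)
UPDATE users SET
    visited = now(),
    ip      = '20.20.20.20'::inet,
    lat     = (SELECT lat FROM loc),  -- NULL when loc is empty
    lng     = (SELECT lng FROM loc)   -- NULL when loc is empty
WHERE uid = 2;
```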


Re: How to always run UPDATE FROM despite missing records in the source table?

Adrian Klaver-4
In reply to this post by Alexander Farber
On 1/11/19 4:50 AM, Alexander Farber wrote:

> Good afternoon
>
> I have prepared a simplified test case for my question:
> https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0
>
> In PostgreSQL 10.6 there are 2 tables:
>
> CREATE TABLE users (
>          uid SERIAL PRIMARY KEY,
>          created       timestamptz NOT NULL,
>          visited       timestamptz NOT NULL,
>          ip            inet        NOT NULL,
>          lat           double precision,
>          lng           double precision
>    );
>
>    CREATE TABLE geoip (
>          block   inet    PRIMARY KEY,
>          lat     double precision,
>          lng     double precision
> );
>
> CREATE INDEX ON geoip USING SPGIST (block);
>
> which are filled with the following test data:
>
> INSERT INTO users (created, visited, ip) VALUES
>    (now(), now(), '1.2.3.4'::inet),
>    (now(), now(), '1.2.3.5'::inet),
>    (now(), now(), '1.2.3.6'::inet);
>
> INSERT INTO geoip (block, lat, lng) VALUES
>   ('1.2.3.0/24', -33.4940, 143.2104),
>   ('10.0.0.0/8', 34.6617, 133.9350);
>
> Then in a stored function I run the following UPDATE command -
>
> UPDATE users u SET
>      visited = now(),
>      ip      = '10.10.10.10'::inet,
>      lat     = i.lat,
>      lng     = i.lng
> FROM geoip i
> WHERE u.uid = 1 AND '10.10.10.10'::inet <<= i.block;
>
> (the 1 and the ip address are actually in_uid and in_ip parameters in my
> stored function).
>
> The above query works well and updates all 4 fields in the users table.
>
> However the following query does not work as intended and does not
> update any fields, because there is no matching block in the geoip table
> found:
>
> UPDATE users u SET
>      visited = now(),                  -- HOW TO ALWAYS UPDATE THIS FIELD?
>      ip      = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
>      lat     = i.lat,
>      lng     = i.lng
> FROM geoip i
> WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;
>
> The field visited and ip however should be always updated - regardless
> if the block was found or not.
>
> Kind of LEFT JOIN, but for UPDATE - how to achieve this please?
>
> The only workaround that I could think of is -
>
> UPDATE users SET
>      visited = now(),
>      ip      = '20.20.20.20'::inet,
>      lat     = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
>      lng     = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
> WHERE uid = 2;
>
> But that would run the same subquery twice (correct?) and my geoip table
> is already slow with 3073410 records (and that is why I am trying to
> cache its lat and lng values in the users table on each user login event)

Why not put a test for the block in the function and then use different
UPDATE's depending on the result?

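Pseudo code (PL/pgSQL, with in_uid and in_ip being the parameters of your function):

IF EXISTS (SELECT 1 FROM geoip WHERE in_ip <<= block) THEN
    -- a matching block exists: update all 4 fields
    UPDATE users u SET
        visited = now(),
        ip      = in_ip,
        lat     = i.lat,
        lng     = i.lng
    FROM geoip i
    WHERE u.uid = in_uid AND in_ip <<= i.block;
ELSE
    -- no matching block: update only visited and ip
    UPDATE users u SET
        visited = now(),
        ip      = in_ip
    WHERE u.uid = in_uid;
END IF;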

>
> Regards
> Alex


--
Adrian Klaver
[hidden email]


Re: How to always run UPDATE FROM despite missing records in the source table?

Andrew Gierth
In reply to this post by Alexander Farber
>>>>> "Alexander" == Alexander Farber <[hidden email]> writes:

 Alexander> However the following query does not work as intended and
 Alexander> does not update any fields, because there is no matching
 Alexander> block in the geoip table found:

 Alexander> UPDATE users u SET
 Alexander>     visited = now(),                  -- HOW TO ALWAYS UPDATE THIS FIELD?
 Alexander>     ip      = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
 Alexander>     lat     = i.lat,
 Alexander>     lng     = i.lng
 Alexander> FROM geoip i
 Alexander> WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;

 Alexander> The field visited and ip however should be always updated -
 Alexander> regardless if the block was found or not.

 Alexander> Kind of LEFT JOIN, but for UPDATE - how to achieve this please?

It can be done like this (this assumes you want to preserve the previous
values of u.lat/u.lng if the block was not found; if you want to set
them to null instead, then remove the coalesce() calls):

UPDATE users u
   SET visited = now(),
       ip = v.ip,
       lat = coalesce(i.lat, u.lat),
       lng = coalesce(i.lng, u.lng)
  FROM (VALUES ('20.20.20.20'::inet)) v(ip)
       LEFT JOIN geoip i ON (v.ip <<= i.block)
 WHERE u.uid = 2;

 Alexander> But that would run the same subquery twice (correct?) and my
 Alexander> geoip table is already slow with 3073410 records

Slow even with a gist or spgist index? What does the explain analyze
look like?

(You could also try using the ip4r module; I've not done any serious
benchmarking to see if it's faster than the built-in index types, though
it has some theoretical advantages due to not being restricted to CIDR
ranges. In pg versions before the built-in inet type got a gist index
method, ip4r was _the_ way to do ip block lookups for geoip etc.)

--
Andrew (irc:RhodiumToad)
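
For reference, a sketch of how the plan asked about above could be obtained, using the thread's own tables. The thing to look for in the output is whether the lookup uses the index on geoip.block or falls back to a sequential scan. Note that EXPLAIN ANALYZE really executes the statement, so the UPDATE variant is wrapped in a transaction that is rolled back.

```sql
-- Plan and timing for the bare geoip lookup
EXPLAIN ANALYZE
SELECT lat, lng
  FROM geoip
 WHERE '20.20.20.20'::inet <<= block;

-- Plan and timing for the full UPDATE; rolled back so that
-- the test data stays unchanged
BEGIN;
EXPLAIN ANALYZE
UPDATE users u
   SET visited = now(),
       ip = v.ip,
       lat = coalesce(i.lat, u.lat),
       lng = coalesce(i.lng, u.lng)
  FROM (VALUES ('20.20.20.20'::inet)) v(ip)
       LEFT JOIN geoip i ON (v.ip <<= i.block)
 WHERE u.uid = 2;
ROLLBACK;
```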


Re: How to always run UPDATE FROM despite missing records in the source table?

Alexander Farber
In reply to this post by Adrian Klaver-4
Thank you Adrian -

On Fri, Jan 11, 2019 at 4:55 PM Adrian Klaver <[hidden email]> wrote:
> On 1/11/19 4:50 AM, Alexander Farber wrote:
> > https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0
>
> Why not put a test for the block in the function and then use different
> UPDATE's depending on the result?

I didn't want to use IF and switch that statement to PL/pgSQL from pure SQL, so Andrew's answer

UPDATE users u SET
    visited = now(),
    ip = v.ip,
    lat = i.lat,
    lng = i.lng
FROM (VALUES ('20.20.20.20'::inet)) v(ip)
    LEFT JOIN geoip i ON (v.ip <<= i.block)
WHERE u.uid = 2;

suits me better, even though I wonder what the (VALUES ('20.20.20.20'::inet)) v(ip) construct is there. Is it some temporary table which is then LEFT JOINed to the geoip table?

Also, Andrew, you were right: with the spgist index my queries against geoip are fast enough. I was looking at the wrong spot in my EXPLAIN ANALYZE output (the average values are slow, I am going to cache them soon).

Regards
Alex
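
A sketch of how the statement above might be wrapped in a pure SQL function, as preferred in this message. The function name update_user_geo is hypothetical; in_uid and in_ip are the parameter names mentioned in the original question. As in the statement above, lat and lng are set to NULL when no block matches.

```sql
-- Sketch only: "update_user_geo" is an illustrative name.
-- Named parameters can be referenced directly inside LANGUAGE sql bodies.
CREATE OR REPLACE FUNCTION update_user_geo(in_uid integer, in_ip inet)
RETURNS void AS
$func$
    UPDATE users u SET
        visited = now(),
        ip      = v.ip,
        lat     = i.lat,   -- NULL when no block contains in_ip
        lng     = i.lng
    FROM (VALUES (in_ip)) v(ip)
        LEFT JOIN geoip i ON (v.ip <<= i.block)
    WHERE u.uid = in_uid;
$func$ LANGUAGE sql;

-- Example call with the thread's test values
SELECT update_user_geo(2, '20.20.20.20'::inet);
```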

Re: How to always run UPDATE FROM despite missing records in the source table?

Andrew Gierth
>>>>> "Alexander" == Alexander Farber <[hidden email]> writes:

 Alexander> even though I wonder what is the (VALUES
 Alexander> ('20.20.20.20'::inet)) v(ip) construct there, some temporary
 Alexander> table which is then LEFT JOINed to the geoip table?

The SQL spec calls it a <table value constructor>. The VALUES clause
behaves like a SELECT that returns a fixed number of rows (1 or more)
whose columns contain the results of the specified expressions. The
v(ip) part is just a table and column alias (I omitted the optional AS
keyword out of long habit) to name the constructed table.

--
Andrew (irc:RhodiumToad)
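
To make the explanation above concrete, here is a small sketch that runs the constructed table on its own and then LEFT JOINs it to geoip. It uses a two-row VALUES list purely for illustration. With the test data from the first message, the 10.10.10.10 row should pair with the 10.0.0.0/8 block, while the 20.20.20.20 row should come back with NULLs in the geoip columns, which is what lets the UPDATE proceed without a match.

```sql
-- The VALUES clause used as a one-row, one-column table named v(ip)
SELECT v.ip
  FROM (VALUES ('20.20.20.20'::inet)) AS v(ip);

-- Two constructed rows LEFT JOINed to geoip: an address without a
-- containing block is kept, with NULL in block, lat and lng
SELECT v.ip, i.block, i.lat, i.lng
  FROM (VALUES ('10.10.10.10'::inet),
               ('20.20.20.20'::inet)) AS v(ip)
       LEFT JOIN geoip i ON (v.ip <<= i.block);
```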