Want records to be UNIQUE. When searching for dupes, stop on first matching record.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Want records to be UNIQUE. When searching for dupes, stop on first matching record.

Pól Ua Laoínecháin
Hi all,

I was fiddling round with a 10M record table recently, trying to get a
unique (simulated) combination for two INTEGERs user_id and article_id.

I eventually came up with a solution (is it the best way of doing
this? - secondary question):

INSERT INTO test_article (user_id, article_id)
SELECT * FROM
(
  WITH x AS
  (
    SELECT generate_series(1, 500) AS bill
  ),
  y AS
  (
    SELECT generate_series(1, 20000) AS fred
  )
  SELECT * FROM x
  CROSS JOIN y
) AS z
ORDER BY bill, fred;

But, while I was experimenting with this, I had to test many times to
see if I was, in fact, inserting UNIQUE records - it took me a while
to come up with the SQL above!

It took around 30s to search through my table each time I was checking
(thank God for SSDs :-) ). I used this construct to check for dupes
(is there a better one?):

SELECT (user_id, article_id)::text, count(*)
FROM test_article
WHERE 1 = (SELECT 1)
GROUP BY user_id, article_id
HAVING count(*) > 1

But what I really want (what I really, really want - apologies to the
Spice Girls) to know is, is there a query which will run and stop on
the first match? Using ANY, ALL or EXISTS or some construct like that?

TIA and rgs,

Pól...


Reply | Threaded
Open this post in threaded view
|

Re: Want records to be UNIQUE. When searching for dupes, stop on first matching record.

David G Johnston
On Thu, Apr 18, 2019 at 1:15 PM Pól Ua Laoínecháin <[hidden email]> wrote:
SELECT (user_id, article_id)::text, count(*)
FROM test_article
WHERE 1 = (SELECT 1)
GROUP BY user_id, article_id
HAVING count(*) > 1

But what I really want (what I really, really want - apologies to the
Spice Girls) to know is, is there a query which will run and stop on
the first match? Using ANY, ALL or EXISTS or some construct like that?


Create a unique index over user_id, article_id on the test_article table.  The system will never allow a duplicate to be inserted in the first place.

That said your query is overly complicated...
SELECT user_id, article_id
FROM test_article
GROUP BY user_id, article_id
HAVING count(*) > 1

A where clause is optional and the composite and count in the select list aren't really needed unless you have some other unstated requirement for them.

David J.
Reply | Threaded
Open this post in threaded view
|

Re: Want records to be UNIQUE. When searching for dupes, stop on first matching record.

Pól Ua Laoínecháin
Hi,

Le jeu. 18 avr. 2019 à 22:41, David G. Johnston
<[hidden email]> a écrit :

> Create a unique index over user_id, article_id on the test_article table.
> The system will never allow a duplicate to be inserted in the first place.

I know this and, indeed, I could have done it. But, the real thrust of
my question is how do I "drop out" of the query after finding the
first matching record. Putting indexes to the side - it's more of an
SQL question than just adding an index.

> That said your query is overly complicated...
> SELECT user_id, article_id
> FROM test_article
> GROUP BY user_id, article_id
> HAVING count(*) > 1

Sorry  - the "WHERE 1 = (SELECT 1)" was just something I was testing
to get plans to be more complicated.
Forget that bit. Should have proofed better! :-(


> A where clause is optional and the composite and count in the select list aren't really needed
> unless you have some other unstated requirement for them.

What I want to know is as above - once I find one match - that's it,
all bets are off, there's no point in looking any further. A dupe
means that my previous attempt to insert UNIQUE records failed - I'd
like to be told of that sooner rather than later.


Thanks for your input!

Rgs,

Pól...

> David J.


Reply | Threaded
Open this post in threaded view
|

Re: Want records to be UNIQUE. When searching for dupes, stop on first matching record.

Peter Geoghegan-4
In reply to this post by Pól Ua Laoínecháin
On Thu, Apr 18, 2019 at 1:15 PM Pól Ua Laoínecháin <[hidden email]> wrote:
> But what I really want (what I really, really want - apologies to the
> Spice Girls) to know is, is there a query which will run and stop on
> the first match? Using ANY, ALL or EXISTS or some construct like that?

It sounds like you're looking for a loose index scan:

https://wiki.postgresql.org/wiki/Loose_indexscan

Unfortunately, Postgres doesn't have this capability just yet, so
you'll have to simulate it in the manner described by the Wiki page.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Want records to be UNIQUE. When searching for dupes, stop on first matching record.

Pól Ua Laoínecháin
> It sounds like you're looking for a loose index scan:
> https://wiki.postgresql.org/wiki/Loose_indexscan

Hi and thanks for your input,

but I don't think that the loose index scan is what I want.

I want a query that breaks off immediately it finds a dupe and won't
continue any futher and that doesn't use LIMIT!

Maybe there's no such thing available - I'm just asking on the
offchance that it does exist.


Pól...


Reply | Threaded
Open this post in threaded view
|

SELECTing for group membership of many groups?

Pól Ua Laoínecháin
Hi all,

maybe this should be easy, but I'm bashing my head up against a brick wall!

DDL and DML for sample data at bottom of post.

Simplified - I have 3 tables, the_user (user is a keyword in some
systems), groupe (use the French to avoid conflicts with SQL keyword
GROUP!) and user_group.

Now, what I want is a simple list of all users who are in groups 5, 6
and 7 - in this case, there is only one - that's user 3.

I naively tried this:

SELECT * FROM the_user u
JOIN user_group ug ON
  u.id = ug.user_id
AND ug.group_id = 5
AND ug.group_id = 6
AND ug.group_id = 7;

But that won't work, because no single user_group record can have  a
group_id of 5, 6 **and** 7!

Then, there's this:

SELECT * FROM the_user u
INNER JOIN user_group g1 ON g1.user_id = u.id
INNER JOIN user_group g2 ON g2.user_id = u.id
INNER JOIN user_group g3 ON g3.user_id = id
WHERE g1.group_id = 5
  AND g2.group_id = 6
  AND g3.group_id = 7;


id name user_id group_id user_id group_id user_id group_id
3 user3          3            5          3           6          3           7

Now, this picks  out the_user.id = 3 which is correct, but I just want
the answer to be (3, 'user3') and not the group_ids which I've put
into the query.

There's an sql fiddle available here for those who are interested:
https://www.db-fiddle.com/f/gHaajr7txvTojV7WAjJbYb/2

TIA and rgs,

Pól...

============= DDL and DML ====================

CREATE TABLE the_user
(
  id INT,
  name VARCHAR (10)
);

CREATE TABLE groupe
(
  id INT,
  name VARCHAR (10)
);

CREATE TABLE user_group
(
  user_id INT,
  group_id INT
);

INSERT INTO the_user VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');

INSERT INTO groupe VALUES (5, 'group1'), (6, 'group2'), (7, 'group3');

INSERT INTO user_group VALUES (1, 6), (1, 7), (2, 5), (2, 7), (3, 5),
(3, 6), (3, 7);


Reply | Threaded
Open this post in threaded view
|

RE: SELECTing for group membership of many groups?

David Raymond
"Now, this picks  out the_user.id = 3 which is correct, but I just want
the answer to be (3, 'user3') and not the group_ids which I've put
into the query."

If the query is returning the correct records then you've got the hard part done. Just alter which fields you get back from it then.

Instead of: "select * from..."
Go with either: "select u.* from..."
Or: "select u.id, u.name from..."


-----Original Message-----
From: Pól Ua Laoínecháin [mailto:[hidden email]]
Sent: Tuesday, April 23, 2019 12:14 PM
Cc: pgsql-novice
Subject: SELECTing for group membership of many groups?

Hi all,

maybe this should be easy, but I'm bashing my head up against a brick wall!

DDL and DML for sample data at bottom of post.

Simplified - I have 3 tables, the_user (user is a keyword in some
systems), groupe (use the French to avoid conflicts with SQL keyword
GROUP!) and user_group.

Now, what I want is a simple list of all users who are in groups 5, 6
and 7 - in this case, there is only one - that's user 3.

I naively tried this:

SELECT * FROM the_user u
JOIN user_group ug ON
  u.id = ug.user_id
AND ug.group_id = 5
AND ug.group_id = 6
AND ug.group_id = 7;

But that won't work, because no single user_group record can have  a
group_id of 5, 6 **and** 7!

Then, there's this:

SELECT * FROM the_user u
INNER JOIN user_group g1 ON g1.user_id = u.id
INNER JOIN user_group g2 ON g2.user_id = u.id
INNER JOIN user_group g3 ON g3.user_id = id
WHERE g1.group_id = 5
  AND g2.group_id = 6
  AND g3.group_id = 7;


id name user_id group_id user_id group_id user_id group_id
3 user3          3            5          3           6          3           7

Now, this picks  out the_user.id = 3 which is correct, but I just want
the answer to be (3, 'user3') and not the group_ids which I've put
into the query.

There's an sql fiddle available here for those who are interested:
https://www.db-fiddle.com/f/gHaajr7txvTojV7WAjJbYb/2

TIA and rgs,

Pól...

============= DDL and DML ====================

CREATE TABLE the_user
(
  id INT,
  name VARCHAR (10)
);

CREATE TABLE groupe
(
  id INT,
  name VARCHAR (10)
);

CREATE TABLE user_group
(
  user_id INT,
  group_id INT
);

INSERT INTO the_user VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');

INSERT INTO groupe VALUES (5, 'group1'), (6, 'group2'), (7, 'group3');

INSERT INTO user_group VALUES (1, 6), (1, 7), (2, 5), (2, 7), (3, 5),
(3, 6), (3, 7);


Reply | Threaded
Open this post in threaded view
|

Re: SELECTing for group membership of many groups?

Mark Wallace
In reply to this post by Pól Ua Laoínecháin
I would do the heart of the solution like this:

SELECT ug.user_id
  FROM user_group ug
  WHERE ug.group_id = 5

INTERSECT

SELECT ug.user_id
  FROM user_group ug
  WHERE ug.group_id = 6

INTERSECT

SELECT ug.user_id
  FROM user_group ug
  WHERE ug.group_id = 7

I’m sure there are syntax errors, but what I’m trying to convey is the concept.

The result is only the user_id’s from the rows of user_group that match all three group_id’s.

Then, in syntax I’m not showing, take those remaining user_id’s (in your example, only one of them), and join to the_user to get the user name.

Mark



> On Apr 23, 2019, at 12:13, Pól Ua Laoínecháin <[hidden email]> wrote:
>
> Hi all,
>
> maybe this should be easy, but I'm bashing my head up against a brick wall!
>
> DDL and DML for sample data at bottom of post.
>
> Simplified - I have 3 tables, the_user (user is a keyword in some
> systems), groupe (use the French to avoid conflicts with SQL keyword
> GROUP!) and user_group.
>
> Now, what I want is a simple list of all users who are in groups 5, 6
> and 7 - in this case, there is only one - that's user 3.
>
> I naively tried this:
>
> SELECT * FROM the_user u
> JOIN user_group ug ON
>  u.id = ug.user_id
> AND ug.group_id = 5
> AND ug.group_id = 6
> AND ug.group_id = 7;
>
> But that won't work, because no single user_group record can have  a
> group_id of 5, 6 **and** 7!
>
> Then, there's this:
>
> SELECT * FROM the_user u
> INNER JOIN user_group g1 ON g1.user_id = u.id
> INNER JOIN user_group g2 ON g2.user_id = u.id
> INNER JOIN user_group g3 ON g3.user_id = id
> WHERE g1.group_id = 5
>  AND g2.group_id = 6
>  AND g3.group_id = 7;
>
>
> id name user_id group_id user_id group_id user_id group_id
> 3 user3          3            5          3           6          3           7
>
> Now, this picks  out the_user.id = 3 which is correct, but I just want
> the answer to be (3, 'user3') and not the group_ids which I've put
> into the query.
>
> There's an sql fiddle available here for those who are interested:
> https://www.db-fiddle.com/f/gHaajr7txvTojV7WAjJbYb/2
>
> TIA and rgs,
>
> Pól...
>
> ============= DDL and DML ====================
>
> CREATE TABLE the_user
> (
>  id INT,
>  name VARCHAR (10)
> );
>
> CREATE TABLE groupe
> (
>  id INT,
>  name VARCHAR (10)
> );
>
> CREATE TABLE user_group
> (
>  user_id INT,
>  group_id INT
> );
>
> INSERT INTO the_user VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');
>
> INSERT INTO groupe VALUES (5, 'group1'), (6, 'group2'), (7, 'group3');
>
> INSERT INTO user_group VALUES (1, 6), (1, 7), (2, 5), (2, 7), (3, 5),
> (3, 6), (3, 7);
>
>