Logical replication without a Primary Key

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Logical replication without a Primary Key

Joshua D. Drake
-Hackers,

In the docs it says:

"
If the table does not have any suitable key, then it can be set to
replica identity“full”, which means the entire row becomes the key.

"

How does that work? Is it using one of the hidden columns on a row?


Thanks,


JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
*****     Unless otherwise stated, opinions are my own.   *****


Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Peter Eisentraut-6
On 12/6/17 19:03, Joshua D. Drake wrote:

> -Hackers,
>
> In the docs it says:
>
> "
> If the table does not have any suitable key, then it can be set to
> replica identity“full”, which means the entire row becomes the key.
>
> "
>
> How does that work? Is it using one of the hidden columns on a row?

It means that for example if an update record is produced, the entire
row is included in the record as the key.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Joshua D. Drake
On 12/07/2017 05:30 AM, Peter Eisentraut wrote:
>
>> How does that work? Is it using one of the hidden columns on a row?
> It means that for example if an update record is produced, the entire
> row is included in the record as the key.

Thanks Peter, Craig also responded,

The confusion I have is what if we have two rows that are identical and
now that I think about it we would just update both rows, yes? That
would make sense because it would produce two updated rows.

Thanks,

JD



>

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
*****     Unless otherwise stated, opinions are my own.   *****


Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

David G Johnston
In reply to this post by Peter Eisentraut-6
On Thursday, December 7, 2017, Peter Eisentraut <[hidden email]> wrote:
On 12/6/17 19:03, Joshua D. Drake wrote:

>
> How does that work? Is it using one of the hidden columns on a row?

It means that for example if an update record is produced, the entire
row is included in the record as the key.


IOW, IIUC, whether defined or not the user data portion of the table must, as a whole, provide a natural unique key if you are going to use logical replication.  If two records only differ in their OID (or maybe ctid?) value you will have problem.

David J.
Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Craig Ringer-3
In reply to this post by Joshua D. Drake
On 7 December 2017 at 22:32, Joshua D. Drake <[hidden email]> wrote:

The confusion I have is what if we have two rows that are identical and now that I think about it we would just update both rows, yes? That would make sense because it would produce two updated rows.


I expect so, but honestly, maybe it's easier to just add a pg_regress test to check and demonstrate the behaviour, or do a manual test?

(a pg_regress test would make sense to add anyway, though, and might be easier)

The only ways you could update only one of two identical rows would be if you did so by ctid (since if it were a table with oids, the rows wouldn't be identical anymore). It'd make no sense to send the ctid to the downstream since it'd bear no relationship to the downstream table. I expect it'd probably update both rows. If it doesn't, it probably should.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Petr Jelinek-4
In reply to this post by Joshua D. Drake
On 07/12/17 15:32, Joshua D. Drake wrote:

> On 12/07/2017 05:30 AM, Peter Eisentraut wrote:
>>
>>> How does that work? Is it using one of the hidden columns on a row?
>> It means that for example if an update record is produced, the entire
>> row is included in the record as the key.
>
> Thanks Peter, Craig also responded,
>
> The confusion I have is what if we have two rows that are identical and
> now that I think about it we would just update both rows, yes? That
> would make sense because it would produce two updated rows.
>

No it won't, it will update only one row, it does not try to find
multiple matching rows.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Robert Haas
On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek
<[hidden email]> wrote:
> No it won't, it will update only one row, it does not try to find
> multiple matching rows.

Good, because that's exactly what it should do.  I mean, if you have
on the master two tuples that are identical, and you update one of
them, then the replica had better update exactly one of them as well.
Since they are identical, it doesn't matter *which* one gets updated
on the replica, but if you update *both* of them on the replica, then
things are out of sync.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Joshua D. Drake
On 12/07/2017 10:49 AM, Robert Haas wrote:

> On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek
> <[hidden email]> wrote:
>> No it won't, it will update only one row, it does not try to find
>> multiple matching rows.
> Good, because that's exactly what it should do.  I mean, if you have
> on the master two tuples that are identical, and you update one of
> them, then the replica had better update exactly one of them as well.
> Since they are identical, it doesn't matter *which* one gets updated
> on the replica, but if you update *both* of them on the replica, then
> things are out of sync.

Well I think that is a problem actually. If I have:

A    B   C
foo,bar,baz
foo,bar,baz

And then I say:

UPDATE test set A = 1 where C = baz

I have updated two rows because there is no primary key to identify the
differences. Both of those rows should be updated and thus replicated
otherwise, logical replication (of this specific table) provides
inaccurate data on the subscriber.

Thanks,

JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
*****     Unless otherwise stated, opinions are my own.   *****


Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Chapman Flack
On 12/07/2017 02:38 PM, Joshua D. Drake wrote:

> A    B   C
> foo,bar,baz
> foo,bar,baz
>
> And then I say:
>
> UPDATE test set A = 1 where C = baz
>
> I have updated two rows because there is no primary key to identify the
> differences. Both of those rows should be updated and thus replicated

Would the subscriber see two records reporting update of a
foo,bar,baz row to 1, so it would do that to (arbitrarily)
one of them the first time, and (necessarily) the other, the
second time?

Or is that not the way it would work?

-Chap

Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Robert Haas
On Thu, Dec 7, 2017 at 2:53 PM, Chapman Flack <[hidden email]> wrote:

> On 12/07/2017 02:38 PM, Joshua D. Drake wrote:
>> A    B   C
>> foo,bar,baz
>> foo,bar,baz
>>
>> And then I say:
>>
>> UPDATE test set A = 1 where C = baz
>>
>> I have updated two rows because there is no primary key to identify the
>> differences. Both of those rows should be updated and thus replicated
>
> Would the subscriber see two records reporting update of a
> foo,bar,baz row to 1, so it would do that to (arbitrarily)
> one of them the first time, and (necessarily) the other, the
> second time?

Exactly.

(I think.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Andres Freund
In reply to this post by Joshua D. Drake
On 2017-12-07 11:38:51 -0800, Joshua D. Drake wrote:

> On 12/07/2017 10:49 AM, Robert Haas wrote:
> > On Thu, Dec 7, 2017 at 9:43 AM, Petr Jelinek
> > <[hidden email]> wrote:
> > > No it won't, it will update only one row, it does not try to find
> > > multiple matching rows.
> > Good, because that's exactly what it should do.  I mean, if you have
> > on the master two tuples that are identical, and you update one of
> > them, then the replica had better update exactly one of them as well.
> > Since they are identical, it doesn't matter *which* one gets updated
> > on the replica, but if you update *both* of them on the replica, then
> > things are out of sync.
>
> Well I think that is a problem actually. If I have:
>
> A    B   C
> foo,bar,baz
> foo,bar,baz
>
> And then I say:
>
> UPDATE test set A = 1 where C = baz
>
> I have updated two rows because there is no primary key to identify the
> differences. Both of those rows should be updated and thus replicated
> otherwise, logical replication (of this specific table) provides inaccurate
> data on the subscriber.

Not a problem. If you updated both rows, then there's two cases:
a) the update actually changed the column values. In which case the first per-row
   change that's replicated updates the first row, but the second one won't
   again find it as matching in all columns.
b) the update didn't actually change anything. In which case the same
   row gets updated twice, but because the column values didn't change,
   that doesn't matter.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Logical replication without a Primary Key

Petr Jelinek-4
In reply to this post by Robert Haas
On 07/12/17 21:19, Robert Haas wrote:

> On Thu, Dec 7, 2017 at 2:53 PM, Chapman Flack <[hidden email]> wrote:
>> On 12/07/2017 02:38 PM, Joshua D. Drake wrote:
>>> A    B   C
>>> foo,bar,baz
>>> foo,bar,baz
>>>
>>> And then I say:
>>>
>>> UPDATE test set A = 1 where C = baz
>>>
>>> I have updated two rows because there is no primary key to identify the
>>> differences. Both of those rows should be updated and thus replicated
>>
>> Would the subscriber see two records reporting update of a
>> foo,bar,baz row to 1, so it would do that to (arbitrarily)
>> one of them the first time, and (necessarily) the other, the
>> second time?
>
> Exactly.
>
> (I think.)
>

Yes, that how it was designed to work.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Previous Thread Next Thread