User defined data types in Logical Replication

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

User defined data types in Logical Replication

Huong Dangminh
Hi,

We are getting the bellow error while trying use Logical Replication
with user defined data types in a C program (when call elog function).

 ERROR:  XX000: cache lookup failed for type XXXXX

# XXXXX is remote type's oid

It occurs in worker.c:slot_store_error_callback function when remotetypoid not exist in local pg_type.

I have tried to write a patch (attached).
I think it is not kindly to change typename to the OID's one,
But I could not find the easy way to get typename from OID in the remote host.

---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

logicalrep_typmap.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: User defined data types in Logical Replication

Huong Dangminh
Hi,

> We are getting the bellow error while trying use Logical Replication with
> user defined data types in a C program (when call elog function).
>
>  ERROR:  XX000: cache lookup failed for type XXXXX
>

Sorry for continuously disturbing in this topic, but am I missing something here?

I mean that in case of type's OID in PUBLICATION host does not exists in SUBSCRIPTION host's pg_type,
it could returns unintended error (the XX000 above) when elog or ereport is executed.

For more details, it happen in slot_store_error_callback when it try to call format_type_be(localtypoid) for errcontext.
slot_store_error_callback is set in slot_store_cstrings, slot_modify_cstrings function and it also be unset here, so the effect here is small but it happens.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/



Reply | Threaded
Open this post in threaded view
|

Re: User defined data types in Logical Replication

Masahiko Sawada
On Wed, Nov 15, 2017 at 7:55 PM, Huong Dangminh
<[hidden email]> wrote:
> Hi,
>
>> We are getting the bellow error while trying use Logical Replication with
>> user defined data types in a C program (when call elog function).
>>
>>  ERROR:  XX000: cache lookup failed for type XXXXX
>>
>
> Sorry for continuously disturbing in this topic, but am I missing something here?

No, but I'd suggest to provide a procedure for reproducing if
possible, which will be helpful for investigation.

> I mean that in case of type's OID in PUBLICATION host does not exists in SUBSCRIPTION host's pg_type,
> it could returns unintended error (the XX000 above) when elog or ereport is executed.
>
> For more details, it happen in slot_store_error_callback when it try to call format_type_be(localtypoid) for errcontext.
> slot_store_error_callback is set in slot_store_cstrings, slot_modify_cstrings function and it also be unset here, so the effect here is small but it happens.
>

I think I found out the cause of this issue, and this is a bug. This
can be reproduced, for example, if the input function of the data type
calls elog() during applying on the environment where OIDs of the data
type on publisher and subscriber are different. The cause of this
issue is that we call format_type_be() with remotetypoid. If the OIDs
of data type on publisher and subscriber are different we search it
from syscache by the OID that doesn't exist on subscriber.

On detail of your patch, I don't think this direction is good. Since
the subscriber already has a LogicalRepTyp cache entry for the type we
can report the error message using the data type name. So I think this
issue can be fixed by using the remote type name got from the cache.

Also I'm confused about the message of errcontext; currently we store
the local data type OID corresponding to the remote data type name
into the cache, and then we search the local data type name by the
local data type OID stored in the cache. So  it means the both the
local data type OID and the remote data type OID always imply the same
data type. We use the both data type OIDs for log message in
slot_store_error_callback, but I think what the function want to do is
to show the different type names if the table definitions on both
server are different (e.g. sending jsonb column data to text column
data). I think we should use the type of the local relation attribute
rather than remote's one.

Attached draft patch fixed this issue, at least on my environment.
Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

fix_slot_store_error_callback.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Re: User defined data types in Logical Replication

Huong Dangminh
In reply to this post by Huong Dangminh
Sawada-san,

Thanks for your response.
# And sorry again because I could not reply to your gmail
# address from my environment due to security restriction.

> >> We are getting the bellow error while trying use Logical Replication
> >> with user defined data types in a C program (when call elog function).
> >>
> >>  ERROR:  XX000: cache lookup failed for type XXXXX
> >>
> >
> > Sorry for continuously disturbing in this topic, but am I missing something
> here?
>
> No, but I'd suggest to provide a procedure for reproducing if possible,
> which will be helpful for investigation.

Sorry, I will be careful next time.

> > I mean that in case of type's OID in PUBLICATION host does not exists
> > in SUBSCRIPTION host's pg_type, it could returns unintended error (the
> XX000 above) when elog or ereport is executed.
> >
> > For more details, it happen in slot_store_error_callback when it try to
> call format_type_be(localtypoid) for errcontext.
> > slot_store_error_callback is set in slot_store_cstrings,
> slot_modify_cstrings function and it also be unset here, so the effect here
> is small but it happens.
> >
>
> I think I found out the cause of this issue, and this is a bug. This can
> be reproduced, for example, if the input function of the data type calls
> elog() during applying on the environment where OIDs of the data type on
> publisher and subscriber are different. The cause of this issue is that
> we call format_type_be() with remotetypoid. If the OIDs of data type on
> publisher and subscriber are different we search it from syscache by the
> OID that doesn't exist on subscriber.

Yes, I also think that.

> On detail of your patch, I don't think this direction is good. Since the
> subscriber already has a LogicalRepTyp cache entry for the type we can report
> the error message using the data type name. So I think this issue can be
> fixed by using the remote type name got from the cache.

Thanks,
I did not realize the LogicalRepRelMapEntry, remote type name is already here.

> Also I'm confused about the message of errcontext; currently we store the
> local data type OID corresponding to the remote data type name into the
> cache, and then we search the local data type name by the local data type
> OID stored in the cache. So  it means the both the local data type OID and
> the remote data type OID always imply the same data type. We use the both
> data type OIDs for log message in slot_store_error_callback, but I think
> what the function want to do is to show the different type names if the
> table definitions on both server are different (e.g. sending jsonb column
> data to text column data). I think we should use the type of the local relation
> attribute rather than remote's one.
>
> Attached draft patch fixed this issue, at least on my environment.

It works good for me.

> Please review it.

I will review it soon.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Reply | Threaded
Open this post in threaded view
|

RE: Re: User defined data types in Logical Replication

Huong Dangminh
Sorry for not replying sooner.

> > Attached draft patch fixed this issue, at least on my environment.
>
> It works good for me.
>
> > Please review it.
>
> I will review it soon.

There is one more case that user-defined data type is not supported in Logical Replication.
That is when remote data type's name does not exist in SUBSCRIBE.

In relation.c:logicalrep_typmap_gettypname
We search OID in syscache by remote's data type name and mapping it, if it does not exist in syscache
We will be faced with the bellow error.

        if (!OidIsValid(entry->typoid))
                ereport(ERROR,
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                                 errmsg("data type \"%s.%s\" required for logical replication does not exist",
                                                entry->nspname, entry->typname)));

I think, it is not necessary to check typoid here in order to avoid above case, is that right?
I attached a patch based on Sawada-san's patch with a bit of messages modified and remove the above check.
Could somebody check it for me or should I add it into CF?


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/


fix_slot_store_error_callback_V2.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Re: User defined data types in Logical Replication

Huong Dangminh
Hi,

> From: Huong Dangminh [mailto:[hidden email]]
> I attached a patch based on Sawada-san's patch with a bit of messages
> modified and remove the above check.
> Could somebody check it for me or should I add it into CF?

Sorry, I have added this thread to the next CF.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Reply | Threaded
Open this post in threaded view
|

Re: Re: User defined data types in Logical Replication

Masahiko Sawada
In reply to this post by Huong Dangminh
On Wed, Nov 22, 2017 at 12:25 AM, Huong Dangminh
<[hidden email]> wrote:
> Thanks for your response.
> # And sorry again because I could not reply to your gmail
> # address from my environment due to security restriction.

It's okay. I can understand your environment :-)

> Sorry for not replying sooner.
>
>> > Attached draft patch fixed this issue, at least on my environment.
>>
>> It works good for me.
>>
>> > Please review it.
>>
>> I will review it soon.
>
> There is one more case that user-defined data type is not supported in Logical Replication.
> That is when remote data type's name does not exist in SUBSCRIBE.
>
> In relation.c:logicalrep_typmap_gettypname
> We search OID in syscache by remote's data type name and mapping it, if it does not exist in syscache
> We will be faced with the bellow error.
>
>         if (!OidIsValid(entry->typoid))
>                 ereport(ERROR,
>                                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                  errmsg("data type \"%s.%s\" required for logical replication does not exist",
>                                                 entry->nspname, entry->typname)));
>
> I think, it is not necessary to check typoid here in order to avoid above case, is that right?

I think it's not right. We should end up with an error in the case
where the same type name doesn't exist on subscriber. With your
proposed patch, logicalrep_typmap_gettypname() can return an invalid
string (entry->typname) in that case, which can be a cause of SEGV.

> Sorry, I have added this thread to the next CF.

Thank you for adding it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: User defined data types in Logical Replication

bocap

Hi Sawada-san,

Sorry for my late response.

On 2017/12/05 0:11, Masahiko Sawada wrote:
There is one more case that user-defined data type is not supported in Logical Replication.
That is when remote data type's name does not exist in SUBSCRIBE.

In relation.c:logicalrep_typmap_gettypname
We search OID in syscache by remote's data type name and mapping it, if it does not exist in syscache
We will be faced with the bellow error.

        if (!OidIsValid(entry->typoid))
                ereport(ERROR,
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                                 errmsg("data type \"%s.%s\" required for logical replication does not exist",
                                                entry->nspname, entry->typname)));

I think, it is not necessary to check typoid here in order to avoid above case, is that right?
I think it's not right. We should end up with an error in the case where the same type name doesn't exist on subscriber. With your proposed patch, logicalrep_typmap_gettypname() can return an invalid string (entry->typname) in that case, which can be a cause of SEGV.
Thanks, I think you are right.
# I thought that entry->typname was received from walsender, and it is already be qualified in logicalrep_write_typ.
# But we also need check it in subscriber, because it could be lost info in transmission.
Also we do not need to fix for the case above too, because user can change type name by ALTER TYPE statement.
I attached the patch, which based on your patch with a bit of modified messages.

--- 
Thanks and best regards, 
Dang Minh Huong 



fix_slot_store_error_callback_V3.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: User defined data types in Logical Replication

Masahiko Sawada
On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <[hidden email]> wrote:

> Hi Sawada-san,
>
> Sorry for my late response.
>
> On 2017/12/05 0:11, Masahiko Sawada wrote:
>
> There is one more case that user-defined data type is not supported in
> Logical Replication.
> That is when remote data type's name does not exist in SUBSCRIBE.
>
> In relation.c:logicalrep_typmap_gettypname
> We search OID in syscache by remote's data type name and mapping it, if it
> does not exist in syscache
> We will be faced with the bellow error.
>
>         if (!OidIsValid(entry->typoid))
>                 ereport(ERROR,
>
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                  errmsg("data type \"%s.%s\" required for
> logical replication does not exist",
>                                                 entry->nspname,
> entry->typname)));
>
> I think, it is not necessary to check typoid here in order to avoid above
> case, is that right?
>
> I think it's not right. We should end up with an error in the case where the
> same type name doesn't exist on subscriber. With your proposed patch,
> logicalrep_typmap_gettypname() can return an invalid string (entry->typname)
> in that case, which can be a cause of SEGV.
>
> Thanks, I think you are right.
> # I thought that entry->typname was received from walsender, and it is
> already be qualified in logicalrep_write_typ.
> # But we also need check it in subscriber, because it could be lost info in
> transmission.

Oops, the last sentence of my previous mail was wrong.
logicalrep_typmap_gettypname() doesn't return an invalid string since
entry->typname is initialized with a type name got from wal sender.

After more thought, we might not need to raise an error even if there
is not the same data type on both publisher and subscriber. Because
data is sent after converted to the text representation and is
converted to a data type according to the local table definition
subscribers don't always need to have the same data type. If it's
right, slot_store_error_callback() doesn't need to find a
corresponding local data type OID but just finds the corresponding
type name by remote data type OID. What do you think?

--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
DLIST_STATIC_INIT(lsn_mapping);

 typedef struct SlotErrCallbackArg
 {
-    LogicalRepRelation *rel;
-    int            attnum;
+    LogicalRepRelMapEntry *rel;
+    int            remote_attnum;
+    int            local_attnum;
 } SlotErrCallbackArg;

Since LogicalRepRelMapEntry has a map of local attributes to remote
ones we don't need to have two attribute numbers.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: User defined data types in Logical Replication

Masahiko Sawada
On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <[hidden email]> wrote:

> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <[hidden email]> wrote:
>> Hi Sawada-san,
>>
>> Sorry for my late response.
>>
>> On 2017/12/05 0:11, Masahiko Sawada wrote:
>>
>> There is one more case that user-defined data type is not supported in
>> Logical Replication.
>> That is when remote data type's name does not exist in SUBSCRIBE.
>>
>> In relation.c:logicalrep_typmap_gettypname
>> We search OID in syscache by remote's data type name and mapping it, if it
>> does not exist in syscache
>> We will be faced with the bellow error.
>>
>>         if (!OidIsValid(entry->typoid))
>>                 ereport(ERROR,
>>
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>                                  errmsg("data type \"%s.%s\" required for
>> logical replication does not exist",
>>                                                 entry->nspname,
>> entry->typname)));
>>
>> I think, it is not necessary to check typoid here in order to avoid above
>> case, is that right?
>>
>> I think it's not right. We should end up with an error in the case where the
>> same type name doesn't exist on subscriber. With your proposed patch,
>> logicalrep_typmap_gettypname() can return an invalid string (entry->typname)
>> in that case, which can be a cause of SEGV.
>>
>> Thanks, I think you are right.
>> # I thought that entry->typname was received from walsender, and it is
>> already be qualified in logicalrep_write_typ.
>> # But we also need check it in subscriber, because it could be lost info in
>> transmission.
>
> Oops, the last sentence of my previous mail was wrong.
> logicalrep_typmap_gettypname() doesn't return an invalid string since
> entry->typname is initialized with a type name got from wal sender.
>
> After more thought, we might not need to raise an error even if there
> is not the same data type on both publisher and subscriber. Because
> data is sent after converted to the text representation and is
> converted to a data type according to the local table definition
> subscribers don't always need to have the same data type. If it's
> right, slot_store_error_callback() doesn't need to find a
> corresponding local data type OID but just finds the corresponding
> type name by remote data type OID. What do you think?
>
> --- a/src/backend/replication/logical/worker.c
> +++ b/src/backend/replication/logical/worker.c
> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
> DLIST_STATIC_INIT(lsn_mapping);
>
>  typedef struct SlotErrCallbackArg
>  {
> -    LogicalRepRelation *rel;
> -    int            attnum;
> +    LogicalRepRelMapEntry *rel;
> +    int            remote_attnum;
> +    int            local_attnum;
>  } SlotErrCallbackArg;
>
> Since LogicalRepRelMapEntry has a map of local attributes to remote
> ones we don't need to have two attribute numbers.
>
Attached the patch incorporated what I have on mind. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

fix_slot_store_error_callback_v4.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: User defined data types in Logical Replication

Huong Dangminh
Hi Sawada-san,

> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <[hidden email]>
> wrote:
> > On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <[hidden email]>
> wrote:
> >> Hi Sawada-san,
> >>
> >> Sorry for my late response.
> >>
> >> On 2017/12/05 0:11, Masahiko Sawada wrote:
> >>
> >> There is one more case that user-defined data type is not supported
> >> in Logical Replication.
> >> That is when remote data type's name does not exist in SUBSCRIBE.
> >>
> >> In relation.c:logicalrep_typmap_gettypname
> >> We search OID in syscache by remote's data type name and mapping it,
> >> if it does not exist in syscache We will be faced with the bellow
> >> error.
> >>
> >>         if (!OidIsValid(entry->typoid))
> >>                 ereport(ERROR,
> >>
> >> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >>                                  errmsg("data type \"%s.%s\" required
> >> for logical replication does not exist",
> >>                                                 entry->nspname,
> >> entry->typname)));
> >>
> >> I think, it is not necessary to check typoid here in order to avoid
> >> above case, is that right?
> >>
> >> I think it's not right. We should end up with an error in the case
> >> where the same type name doesn't exist on subscriber. With your
> >> proposed patch,
> >> logicalrep_typmap_gettypname() can return an invalid string
> >> (entry->typname) in that case, which can be a cause of SEGV.
> >>
> >> Thanks, I think you are right.
> >> # I thought that entry->typname was received from walsender, and it
> >> is already be qualified in logicalrep_write_typ.
> >> # But we also need check it in subscriber, because it could be lost
> >> info in transmission.
> >
> > Oops, the last sentence of my previous mail was wrong.
> > logicalrep_typmap_gettypname() doesn't return an invalid string since
> > entry->typname is initialized with a type name got from wal sender.
> >
> > After more thought, we might not need to raise an error even if there
> > is not the same data type on both publisher and subscriber. Because
> > data is sent after converted to the text representation and is
> > converted to a data type according to the local table definition
> > subscribers don't always need to have the same data type. If it's
> > right, slot_store_error_callback() doesn't need to find a
> > corresponding local data type OID but just finds the corresponding
> > type name by remote data type OID. What do you think?
> >
> > --- a/src/backend/replication/logical/worker.c
> > +++ b/src/backend/replication/logical/worker.c
> > @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
> > DLIST_STATIC_INIT(lsn_mapping);
> >
> >  typedef struct SlotErrCallbackArg
> >  {
> > -    LogicalRepRelation *rel;
> > -    int            attnum;
> > +    LogicalRepRelMapEntry *rel;
> > +    int            remote_attnum;
> > +    int            local_attnum;
> >  } SlotErrCallbackArg;
> >
> > Since LogicalRepRelMapEntry has a map of local attributes to remote
> > ones we don't need to have two attribute numbers.
> >

Sorry for the late reply.

> Attached the patch incorporated what I have on mind. Please review it.

Thanks for the patch, I will do it at this weekend.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/
Reply | Threaded
Open this post in threaded view
|

Re: User defined data types in Logical Replication

bocap
On 2017/12/08 13:18, Huong Dangminh wrote:

> Hi Sawada-san,
>
>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <[hidden email]>
>> wrote:
>>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <[hidden email]>
>> wrote:
>>>> Hi Sawada-san,
>>>>
>>>> Sorry for my late response.
>>>>
>>>> On 2017/12/05 0:11, Masahiko Sawada wrote:
>>>>
>>>> There is one more case that user-defined data type is not supported
>>>> in Logical Replication.
>>>> That is when remote data type's name does not exist in SUBSCRIBE.
>>>>
>>>> In relation.c:logicalrep_typmap_gettypname
>>>> We search OID in syscache by remote's data type name and mapping it,
>>>> if it does not exist in syscache We will be faced with the bellow
>>>> error.
>>>>
>>>>          if (!OidIsValid(entry->typoid))
>>>>                  ereport(ERROR,
>>>>
>>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>                                   errmsg("data type \"%s.%s\" required
>>>> for logical replication does not exist",
>>>>                                                  entry->nspname,
>>>> entry->typname)));
>>>>
>>>> I think, it is not necessary to check typoid here in order to avoid
>>>> above case, is that right?
>>>>
>>>> I think it's not right. We should end up with an error in the case
>>>> where the same type name doesn't exist on subscriber. With your
>>>> proposed patch,
>>>> logicalrep_typmap_gettypname() can return an invalid string
>>>> (entry->typname) in that case, which can be a cause of SEGV.
>>>>
>>>> Thanks, I think you are right.
>>>> # I thought that entry->typname was received from walsender, and it
>>>> is already be qualified in logicalrep_write_typ.
>>>> # But we also need check it in subscriber, because it could be lost
>>>> info in transmission.
>>> Oops, the last sentence of my previous mail was wrong.
>>> logicalrep_typmap_gettypname() doesn't return an invalid string since
>>> entry->typname is initialized with a type name got from wal sender.
Yeah, so we do not need to check the existing of publisher's type name
in subscriber.
>>> After more thought, we might not need to raise an error even if there
>>> is not the same data type on both publisher and subscriber. Because
>>> data is sent after converted to the text representation and is
>>> converted to a data type according to the local table definition
>>> subscribers don't always need to have the same data type. If it's
>>> right, slot_store_error_callback() doesn't need to find a
>>> corresponding local data type OID but just finds the corresponding
>>> type name by remote data type OID. What do you think?
I totally agree. It will make logical replication more flexible with
data type.

>>> --- a/src/backend/replication/logical/worker.c
>>> +++ b/src/backend/replication/logical/worker.c
>>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
>>> DLIST_STATIC_INIT(lsn_mapping);
>>>
>>>   typedef struct SlotErrCallbackArg
>>>   {
>>> -    LogicalRepRelation *rel;
>>> -    int            attnum;
>>> +    LogicalRepRelMapEntry *rel;
>>> +    int            remote_attnum;
>>> +    int            local_attnum;
>>>   } SlotErrCallbackArg;
>>>
>>> Since LogicalRepRelMapEntry has a map of local attributes to remote
>>> ones we don't need to have two attribute numbers.
>>>
> Sorry for the late reply.
>
>> Attached the patch incorporated what I have on mind. Please review it.
> Thanks for the patch, I will do it at this weekend.
Your patch is fine for me.
But logicalrep_typmap_getid will be unused.
I attached the patch with removing of it.


---
Thanks and best regards,
Dang Minh Huong

fix_slot_store_error_callback_v5.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: User defined data types in Logical Replication

Masahiko Sawada
On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong <[hidden email]> wrote:

> On 2017/12/08 13:18, Huong Dangminh wrote:
>
>> Hi Sawada-san,
>>
>>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <[hidden email]>
>>> wrote:
>>>>
>>>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <[hidden email]>
>>>
>>> wrote:
>>>>>
>>>>> Hi Sawada-san,
>>>>>
>>>>> Sorry for my late response.
>>>>>
>>>>> On 2017/12/05 0:11, Masahiko Sawada wrote:
>>>>>
>>>>> There is one more case that user-defined data type is not supported
>>>>> in Logical Replication.
>>>>> That is when remote data type's name does not exist in SUBSCRIBE.
>>>>>
>>>>> In relation.c:logicalrep_typmap_gettypname
>>>>> We search OID in syscache by remote's data type name and mapping it,
>>>>> if it does not exist in syscache We will be faced with the bellow
>>>>> error.
>>>>>
>>>>>          if (!OidIsValid(entry->typoid))
>>>>>                  ereport(ERROR,
>>>>>
>>>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>                                   errmsg("data type \"%s.%s\" required
>>>>> for logical replication does not exist",
>>>>>                                                  entry->nspname,
>>>>> entry->typname)));
>>>>>
>>>>> I think, it is not necessary to check typoid here in order to avoid
>>>>> above case, is that right?
>>>>>
>>>>> I think it's not right. We should end up with an error in the case
>>>>> where the same type name doesn't exist on subscriber. With your
>>>>> proposed patch,
>>>>> logicalrep_typmap_gettypname() can return an invalid string
>>>>> (entry->typname) in that case, which can be a cause of SEGV.
>>>>>
>>>>> Thanks, I think you are right.
>>>>> # I thought that entry->typname was received from walsender, and it
>>>>> is already be qualified in logicalrep_write_typ.
>>>>> # But we also need check it in subscriber, because it could be lost
>>>>> info in transmission.
>>>>
>>>> Oops, the last sentence of my previous mail was wrong.
>>>> logicalrep_typmap_gettypname() doesn't return an invalid string since
>>>> entry->typname is initialized with a type name got from wal sender.
>
> Yeah, so we do not need to check the existing of publisher's type name in
> subscriber.
>>>>
>>>> After more thought, we might not need to raise an error even if there
>>>> is not the same data type on both publisher and subscriber. Because
>>>> data is sent after converted to the text representation and is
>>>> converted to a data type according to the local table definition
>>>> subscribers don't always need to have the same data type. If it's
>>>> right, slot_store_error_callback() doesn't need to find a
>>>> corresponding local data type OID but just finds the corresponding
>>>> type name by remote data type OID. What do you think?
>
> I totally agree. It will make logical replication more flexible with data
> type.
>>>>
>>>> --- a/src/backend/replication/logical/worker.c
>>>> +++ b/src/backend/replication/logical/worker.c
>>>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
>>>> DLIST_STATIC_INIT(lsn_mapping);
>>>>
>>>>   typedef struct SlotErrCallbackArg
>>>>   {
>>>> -    LogicalRepRelation *rel;
>>>> -    int            attnum;
>>>> +    LogicalRepRelMapEntry *rel;
>>>> +    int            remote_attnum;
>>>> +    int            local_attnum;
>>>>   } SlotErrCallbackArg;
>>>>
>>>> Since LogicalRepRelMapEntry has a map of local attributes to remote
>>>> ones we don't need to have two attribute numbers.
>>>>
>> Sorry for the late reply.
>>
>>> Attached the patch incorporated what I have on mind. Please review it.
>>
>> Thanks for the patch, I will do it at this weekend.
>
> Your patch is fine for me.
> But logicalrep_typmap_getid will be unused.

Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow
to replicate data to an another data type of column, what is the data
type mapping on subscriber for? It seems enough to have remote data
type oid, namespace and name.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Previous Thread Next Thread