Enumize logical replication message actions

Enumize logical replication message actions

Ashutosh Bapat-2
Hi All,
The logical replication protocol uses single-byte characters to identify
different chunks of logical replication messages. The code uses
character literals for these, as bare constants. That's true for almost
all the code that deals with the wire protocol, which makes it difficult
to identify the code that deals with a particular message. Take, for
example, the code that deals with message type 'B'. 'B' has a different
meaning in different protocols, so it is difficult and time-consuming to
tell one usage from another and to find all the places that deal with one
usage. Here's a patch simplifying that for top-level logical replication
messages.

I think I have covered the places that need change. But I might have
missed something, given that these literals are used at several other
places (a problem this patch tries to fix :)).

Initially I had used #define for the same, but Peter E suggested using
Enums so that switch cases can detect any remaining items along with
stronger type checks.
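
To illustrate the point about switch statements, here is a minimal standalone sketch, not the patch itself. Only LOGICAL_REP_MSG_BEGIN and LOGICAL_REP_MSG_UPDATE are names that appear in this thread; LOGICAL_REP_MSG_COMMIT and the helper logicalrep_msg_name() are assumptions made up for the example. With gcc or clang, a switch over an enum that has no default: label lets -Wswitch (part of -Wall) report any member that has no case.

```c
#include <assert.h>

/* Illustrative subset of message types; COMMIT is an assumed name. */
typedef enum LogicalRepMsgType
{
    LOGICAL_REP_MSG_BEGIN = 'B',
    LOGICAL_REP_MSG_COMMIT = 'C',
    LOGICAL_REP_MSG_UPDATE = 'U'
} LogicalRepMsgType;

/*
 * Hypothetical helper. There is deliberately no default: label, so the
 * compiler can warn when a new enum member is added without a case here.
 */
static const char *
logicalrep_msg_name(LogicalRepMsgType type)
{
    switch (type)
    {
        case LOGICAL_REP_MSG_BEGIN:
            return "BEGIN";
        case LOGICAL_REP_MSG_COMMIT:
            return "COMMIT";
        case LOGICAL_REP_MSG_UPDATE:
            return "UPDATE";
    }

    /* Reached only for a byte that is not a known message type. */
    return "unknown";
}
```

With plain #define constants the same switch compiles, but the compiler has no list of members to compare the cases against.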

Pavan suggested offlist creating a wrapper
pg_send_logical_rep_message() on top of pq_sendbyte(), and similarly for
pq_getmsgbyte(). I wanted to see first whether this change is acceptable.
If so, I will change that as well. Comments/suggestions welcome.

--
Best Wishes,
Ashutosh Bapat

0001-Enumize-top-level-logical-replication-actions.patch (11K) Download Attachment

Re: Enumize logical replication message actions

Li Japin

> On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat <[hidden email]> wrote:
>
> Hi All,
> Logical replication protocol uses single byte character to identify
> different chunks of logical replication messages. The code uses
> character literals for the same. These literals are used as bare
> constants in code as well. That's true for almost all the code that
> deals with wire protocol. With that it becomes difficult to identify
> the code which deals with a particular message. For example code that
> deals with message type 'B'. In various protocol 'B' has different
> meaning and it gets difficult and time consuming to differentiate one
> usage from other and find all places which deal with one usage. Here's
> a patch simplifying that for top level logical replication messages.
>
> I think I have covered the places that need change. But I might have
> missed something, given that these literals are used at several other
> places (a problem this patch tries to fix :)).
>
> Initially I had used #define for the same, but Peter E suggested using
> Enums so that switch cases can detect any remaining items along with
> stronger type checks.
>
> Pavan suggested offlist to create a wrapper
> pg_send_logical_rep_message() on top of pg_sendbyte(), similarly for
> pg_getmsgbyte(). I wanted to see if this change is acceptable. If so,
> I will change that as well. Comments/suggestions welcome.
>
> --
> Best Wishes,
> Ashutosh Bapat
> <0001-Enumize-top-level-logical-replication-actions.patch>

What about 'N' for new tuples, 'O' for "old tuple follows", and 'K' for "old key follows"?
Those are also logical replication protocol messages, I think.

--
Best regards
Japin Li


Re: Enumize logical replication message actions

Kyotaro Horiguchi-4
At Fri, 16 Oct 2020 08:08:40 +0000, Li Japin <[hidden email]> wrote in

>
> > On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat <[hidden email]> wrote:
> >
> > Hi All,
> > Logical replication protocol uses single byte character to identify
> > different chunks of logical replication messages. The code uses
> > character literals for the same. These literals are used as bare
> > constants in code as well. That's true for almost all the code that
> > deals with wire protocol. With that it becomes difficult to identify
> > the code which deals with a particular message. For example code that
> > deals with message type 'B'. In various protocol 'B' has different
> > meaning and it gets difficult and time consuming to differentiate one
> > usage from other and find all places which deal with one usage. Here's
> > a patch simplifying that for top level logical replication messages.
> >
> > I think I have covered the places that need change. But I might have
> > missed something, given that these literals are used at several other
> > places (a problem this patch tries to fix :)).
> >
> > Initially I had used #define for the same, but Peter E suggested using
> > Enums so that switch cases can detect any remaining items along with
> > stronger type checks.
> >
> > Pavan suggested offlist to create a wrapper
> > pg_send_logical_rep_message() on top of pg_sendbyte(), similarly for
> > pg_getmsgbyte(). I wanted to see if this change is acceptable. If so,
> > I will change that as well. Comments/suggestions welcome.
> >
> > --
> > Best Wishes,
> > Ashutosh Bapat
> > <0001-Enumize-top-level-logical-replication-actions.patch>
>
> What about 'N' for new tuples, 'O' for old tuple follows, 'K' for old key follows?
> Those are also logical replication protocol messages, I think.

They are flags stored inside a message, so they can be seen as different
from the message type letters.

Anyway, if the values were chosen to carry some meaning, I'm not sure
whether turning them into enums is a good thing.  In other words, 'U'
conveys almost the same amount of information as LOGICAL_REP_MSG_UPDATE
in the context of the logical replication protocol.

We have the same code pattern in PostgresMain, and perhaps we aren't
going to change those into enums.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Enumize logical replication message actions

Ashutosh Bapat-3


On Fri, 16 Oct 2020 at 14:06, Kyotaro Horiguchi <[hidden email]> wrote:
> At Fri, 16 Oct 2020 08:08:40 +0000, Li Japin <[hidden email]> wrote in
> >
> > > On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat <[hidden email]> wrote:
> >
> > What about 'N' for new tuples, 'O' for old tuple follows, 'K' for old key follows?
> > Those are also logical replication protocol messages, I think.
>
> They are flags stored in a message so they can be seen as different
> from the message type letters.

I think converting those into macros/enums will help, but for now I have tackled only the top-level message types.

> Anyway if the values are determined after some meaning, I'm not sure
> enumerize them is good thing or not.  In other words, 'U' conveys
> almost same amount of information with LOGICAL_REP_MSG_UPDATE in the
> context of logical replication protocol.
>
> We have the same code pattern in PostgresMain and perhaps we don't
> going to change them into enums.

That's exactly the problem I am trying to solve. Take 'B', for example, as I mentioned before. That character literal appears in 64 different places in the master branch. Which of those relate to a "BEGIN" message in the logical replication protocol is not clear unless I thumb through each of them. In PostgresMain it's used to indicate a BIND message. Which of those 64 instances also use 'B' to mean a bind message? Using enums or macros makes it clear: just look up LOGICAL_REP_MSG_BEGIN. Converting every 'B' to its respective macro will help but might be problematic for back-patching, so that's arguable. But doing it in something as new as logical replication will be helpful, before it gets too late to change.

Further, the logical replication protocol uses the same literal, e.g. 'O', to mean origin in some places and old tuple in others. While the comments there help, it's not easy to locate all the code that deals with one meaning or the other. This change will help with that. That's another reason to start with logical replication.
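
As a rough standalone sketch of the 'O' case (not code from the patch): the names LOGICAL_REP_MSG_ORIGIN, LogicalRepTupleKind, LOGICAL_REP_TUPLE_* and both helper functions below are hypothetical, chosen only to show how one wire byte can carry two searchable names.

```c
#include <assert.h>

/* Top-level message types; ORIGIN is an assumed name. */
typedef enum LogicalRepMsgType
{
    LOGICAL_REP_MSG_BEGIN = 'B',
    LOGICAL_REP_MSG_ORIGIN = 'O'
} LogicalRepMsgType;

/* Hypothetical enum for the flags stored inside a message. */
typedef enum LogicalRepTupleKind
{
    LOGICAL_REP_TUPLE_NEW = 'N',
    LOGICAL_REP_TUPLE_OLD = 'O',
    LOGICAL_REP_TUPLE_KEY = 'K'
} LogicalRepTupleKind;

/* Same byte on the wire, but each call site names the meaning it expects. */
static int
is_origin_message(int byte)
{
    return byte == LOGICAL_REP_MSG_ORIGIN;
}

static int
is_old_tuple_flag(int byte)
{
    return byte == LOGICAL_REP_TUPLE_OLD;
}
```

Searching for LOGICAL_REP_TUPLE_OLD then finds only the old-tuple code, which a search for 'O' cannot do.
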
--
Best Wishes,
Ashutosh

Re: Enumize logical replication message actions

akapila
In reply to this post by Ashutosh Bapat-2
On Fri, Oct 16, 2020 at 12:55 PM Ashutosh Bapat
<[hidden email]> wrote:

>
> Hi All,
>  Logical replication protocol uses single byte character to identify
> different chunks of logical replication messages. The code uses
> character literals for the same. These literals are used as bare
> constants in code as well. That's true for almost all the code that
> deals with wire protocol. With that it becomes difficult to identify
> the code which deals with a particular message. For example code that
> deals with message type 'B'. In various protocol 'B' has different
> meaning and it gets difficult and time consuming to differentiate one
> usage from other and find all places which deal with one usage. Here's
> a patch simplifying that for top level logical replication messages.
>

+1. I think this will make the code easier to read and understand. I
think it would be good to do this in some other parts as well but
starting with logical replication is a good idea as that area is still
evolving.

--
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

Andres Freund
In reply to this post by Ashutosh Bapat-2
Hi,

On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:
> Here's a patch simplifying that for top level logical replication
> messages.

I think that's a good plan. One big benefit for me is that it's much
easier to search for an enum than for a single letter
constant. Including searching for all the places that deal with any sort
of logical rep message type.


>  void
>  logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
>  {
> - pq_sendbyte(out, 'B'); /* BEGIN */
> + pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN); /* BEGIN */

I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.


Greetings,

Andres Freund



Re: Enumize logical replication message actions

Ashutosh Bapat-3
Thanks Andres for your review. Thanks Li, Horiguchi-san and Amit for your comments.

On Tue, 20 Oct 2020 at 04:57, Andres Freund <[hidden email]> wrote:
> Hi,
>
> On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:
> > Here's a patch simplifying that for top level logical replication
> > messages.
>
> I think that's a good plan. One big benefit for me is that it's much
> easier to search for an enum than for a single letter
> constant. Including searching for all the places that deal with any sort
> of logical rep message type.
>
> >  void
> >  logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
> >  {
> > -     pq_sendbyte(out, 'B');          /* BEGIN */
> > +     pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);                /* BEGIN */
>
> I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.

Yes. Fixed all places.

I have attached two patches. 0001 is the previous 0001 patch with your comments addressed.

0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and receive a logical replication message type, respectively. These wrappers add more protection to make sure that the enum definitions fit in one byte. It also removes the default case from apply_dispatch() so that we can detect any LogicalRepMsgType not handled by that function.

These two patches are intended to be committed together as a single commit. For now the second one is separate so that it's easy to remove the changes if they are not acceptable.

--
Best Wishes,
Ashutosh

0001-Enumize-top-level-logical-replication-actions.patch (10K) Download Attachment
0002-Functions-to-send-and-receive-LogicalRepMsgType.patch (9K) Download Attachment

Re: Enumize logical replication message actions

Kyotaro Horiguchi-4
At Thu, 22 Oct 2020 12:13:40 +0530, Ashutosh Bapat <[hidden email]> wrote in

> Thanks Andres for your review. Thanks Li, Horiguchi-san and Amit for your
> comments.
>
> On Tue, 20 Oct 2020 at 04:57, Andres Freund <[hidden email]> wrote:
>
> > Hi,
> >
> > On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:
> > > Here's a patch simplifying that for top level logical replication
> > > messages.
> >
> > I think that's a good plan. One big benefit for me is that it's much
> > easier to search for an enum than for a single letter
> > constant. Including searching for all the places that deal with any sort
> > of logical rep message type.
>
>
> >
> > >  void
> > >  logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
> > >  {
> > > -     pq_sendbyte(out, 'B');          /* BEGIN */
> > > +     pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);                /* BEGIN */
> >
> > I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.
> >
>
> Yes. Fixed all places.
>
> I have attached two patches - 0001 which is previous 0001 patch with your
> comments addressed.

We shouldn't have the default: in the switch() block in
apply_dispatch(). That prevents compilers from checking
completeness. The content of the default: should be moved out to after
the switch() block.

apply_dispatch()
{
    switch (action)
    {
        ....
        case LOGICAL_REP_MSG_STREAM_COMMIT:
            apply_handle_stream_commit(s);
            return;
    }

    ereport(ERROR, ...);
}
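
A compilable version of that shape, as a sketch under stated assumptions: the handlers are empty stand-ins that only count calls, the value 'c' for STREAM_COMMIT is assumed for the example, and a return code of -1 takes the place of ereport(ERROR, ...), which is not available outside the server.

```c
#include <assert.h>

typedef enum LogicalRepMsgType
{
    LOGICAL_REP_MSG_BEGIN = 'B',
    LOGICAL_REP_MSG_STREAM_COMMIT = 'c'     /* value assumed for the example */
} LogicalRepMsgType;

/* Stand-in handlers that only count how often they were called. */
static int begin_calls = 0;
static int stream_commit_calls = 0;

static void apply_handle_begin(void) { begin_calls++; }
static void apply_handle_stream_commit(void) { stream_commit_calls++; }

/*
 * No default: label, so the compiler can still check that every enum
 * member has a case.  Every handled case returns, so the code after the
 * switch is reached only for unknown message types.
 */
static int
apply_dispatch(LogicalRepMsgType action)
{
    switch (action)
    {
        case LOGICAL_REP_MSG_BEGIN:
            apply_handle_begin();
            return 0;
        case LOGICAL_REP_MSG_STREAM_COMMIT:
            apply_handle_stream_commit();
            return 0;
    }

    return -1;      /* stands in for ereport(ERROR, ...) */
}
```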

> 0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
> receive a logical replication message type respectively. These wrappers add
> more protection to make sure that the enum definitions fit one byte. This
> also removes the default case from apply_dispatch() so that we can detect
> any LogicalRepMsgType not handled by that function.

pg_send_logicalrep_msg_type() looks like somewhat too much.  If we need
something like that, we shouldn't do this refactoring, I think.

pg_get_logicalrep_msg_type() seems to do the same check (the value is
compared against every keyword value) as apply_dispatch(). Why do we
need that function separately from apply_dispatch()?


> These two patches are intended to be committed together as a single commit.
> For now the second one is separate so that it's easy to remove the changes
> if they are not acceptable.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

Ashutosh Bapat-3


On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]> wrote:

> We shouldn't have the default: in the switch() block in
> apply_dispatch(). That prevents compilers from checking
> completeness. The content of the default: should be moved out to after
> the switch() block.
>
> apply_dispatch()
> {
>     switch (action)
>     {
>         ....
>         case LOGICAL_REP_MSG_STREAM_COMMIT:
>             apply_handle_stream_commit(s);
>             return;
>     }
>
>     ereport(ERROR, ...);
> }
>
> > 0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
> > receive a logical replication message type respectively. These wrappers add
> > more protection to make sure that the enum definitions fit one byte. This
> > also removes the default case from apply_dispatch() so that we can detect
> > any LogicalRepMsgType not handled by that function.
>
> pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> something like that we shouldn't do this refactoring, I think.

An enum is an integer, and we want to send a byte. The function asserts that the enum fits in a byte. If there were a way to declare byte-long enums I would use that, but I didn't find one.

> pg_get_logicalrep_msg_type() seems doing the same check (that the
> value is compared against every keyword value) with
> apply_dispatch(). Why do we need that function separately from
> apply_dispatch()?

The second patch removes the default case you quoted above. I think it's important to detect any unhandled case at compile time rather than at run time. But we need some way to detect whether the values we get from the wire are legitimate; pg_get_logicalrep_msg_type() does that. Further, that function can be used at places other than apply_dispatch() if required, without each of those places having its own validation.

--
Best Wishes,
Ashutosh

Re: Enumize logical replication message actions

Kyotaro Horiguchi-4
At Thu, 22 Oct 2020 16:37:18 +0530, Ashutosh Bapat <[hidden email]> wrote in

> On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]>
> wrote:
>
> >
> >
> > We shouldn't have the default: in the switch() block in
> > apply_dispatch(). That prevents compilers from checking
> > completeness. The content of the default: should be moved out to after
> > the switch() block.
> >
> > apply_dispatch()
> > {
> >     switch (action)
> >         {
> >            ....
> >             case LOGICAL_REP_MSG_STREAM_COMMIT:
> >                    apply_handle_stream_commit(s);
> >                    return;
> >     }
> >
> >     ereport(ERROR, ...);
> > }
> >
> > > 0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send
> > and
> > > receive a logical replication message type respectively. These wrappers
> > add
> > > more protection to make sure that the enum definitions fit one byte. This
> > > also removes the default case from apply_dispatch() so that we can detect
> > > any LogicalRepMsgType not handled by that function.
> >
> > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > something like that we shouldn't do this refactoring, I think.
> >
>
> Enum is an integer, and we want to send byte. The function asserts that the
> enum fits a byte. If there's a way to declare byte long enums I would use
> that. But I didn't find a way to do that.

Enums defined that way can contain two different symbols with the
same value. If we need to check that the values are actually in the range
of char, checking for duplicate values matters more, going by
likelihood.

AFAICS there are two instances of this kind of enum, CoercionMethod
and TypeCat. Neither of them is checked for width or duplicates
where it is used.

Even if we need that kind of check, it shouldn't be a wrapper
function that adds cost to non-assertion builds, but a replacement for
pq_sendbyte() that is used only under USE_ASSERT_CHECKING.
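
One possible shape for such a replacement, as a sketch only: send_byte() is a stand-in for pq_sendbyte() that returns the byte so the result can be inspected, send_msg_type() is a hypothetical macro name, and the standard assert() is used in place of the server's Assert().

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for pq_sendbyte(): returns the byte it would put on the wire. */
static uint8_t
send_byte(uint8_t b)
{
    return b;
}

#ifdef USE_ASSERT_CHECKING
/* Assertion builds: check the width first, then send. */
#define send_msg_type(t) \
    (assert((unsigned int) (t) <= 0xff), send_byte((uint8_t) (t)))
#else
/* Other builds: exactly the plain call, with no added cost. */
#define send_msg_type(t) send_byte((uint8_t) (t))
#endif
```

Note that the assertion variant evaluates its argument twice, so it should only be given side-effect-free expressions such as enum constants.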

> pg_get_logicalrep_msg_type() seems doing the same check (that the
> > value is compared against every keyword value) with
> > apply_dispatch(). Why do we need that function separately from
> > apply_dispatch()?
> >
> >
> The second patch removes the default case you quoted above. I think that's
> important to detect any unhandled case at compile time rather than at run
> time. But we need some way to detect whether the values we get from wire
> are legit. pg_get_logicalrep_msg_type() does that. Further that function
> can be used at places other than apply_dispatch() if required without each
> of those places having their own validation.

Even if that enum contains out-of-range values, that "command" is sent
truncated to uint8, and on the receiver side apply_dispatch()
doesn't recognize the command and raises an error.  That is equivalent
to what pq_send_logicalrep_msg_type() does. (It is also equivalent in
that symbols not exercised by the regression tests are not checked.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

Kyotaro Horiguchi-4
At Fri, 23 Oct 2020 10:08:44 +0900 (JST), Kyotaro Horiguchi <[hidden email]> wrote in

> At Thu, 22 Oct 2020 16:37:18 +0530, Ashutosh Bapat <[hidden email]> wrote in
> > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]>
> > wrote:
> > pg_get_logicalrep_msg_type() seems doing the same check (that the
> > > value is compared against every keyword value) with
> > > apply_dispatch(). Why do we need that function separately from
> > > apply_dispatch()?
> > >
> > >
> > The second patch removes the default case you quoted above. I think that's
> > important to detect any unhandled case at compile time rather than at run
> > time. But we need some way to detect whether the values we get from wire
> > are legit. pg_get_logicalrep_msg_type() does that. Further that function
> > can be used at places other than apply_dispatch() if required without each
> > of those places having their own validation.
>
> Even if that enum contains out-of-range values, that "command" is sent
> having truncated to uint8 and on the receiver side apply_dispatch()
> doesn't identify the command and raises an error.  That is equivalent
> to what pq_send_logicalrep_msg_type() does. (Also equivalent on the
> point that symbols that are not used in regression are not checked.)

Sorry, this is about pg_send_logicalrep_msg_type(), not
pg_get..(). And I forgot to mention pg_get_logicalrep_msg_type().

As for pg_get_logicalrep_msg_type(), it is just a repetition of what
apply_dispatch() does in its switch().

If I flatten the code, it looks like:

apply_dispatch(s)
{
  LogicalRepMsgType msgtype = pq_getmsgtype(s);
  bool pass = false;

  switch (msgtype)
  {
     case LOGICAL_REP_MSG_BEGIN:
     ...
     case LOGICAL_REP_MSG_STREAM_COMMIT:
       pass = true;
  }
  if (!pass)
     ereport(ERROR, (errmsg("invalid logical replication message type"..

  switch (msgtype)
  {
     case LOGICAL_REP_MSG_BEGIN:
        apply_handle_begin();
        break;
     ...
     case LOGICAL_REP_MSG_STREAM_COMMIT:
        apply_handle_stream_commit();
        break;
  }  
}    

Those two switch()es are apparently redundant. That code is exactly
equivalent to:

apply_dispatch(s)
{
  LogicalRepMsgType msgtype = pq_getmsgtype(s);

  switch (msgtype)
  {
     case LOGICAL_REP_MSG_BEGIN:
        apply_handle_begin();
!       return;
     ...
     case LOGICAL_REP_MSG_STREAM_COMMIT:
        apply_handle_stream_commit();
!       return;
  }

  ereport(ERROR, (errmsg("invalid logical replication message type"..
}    
     
which is smaller and faster.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

Álvaro Herrera
In reply to this post by Ashutosh Bapat-3
On 2020-Oct-22, Ashutosh Bapat wrote:

> On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]>
> wrote:

> > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > something like that we shouldn't do this refactoring, I think.
>
> Enum is an integer, and we want to send byte. The function asserts that the
> enum fits a byte. If there's a way to declare byte long enums I would use
> that. But I didn't find a way to do that.

I didn't look at the code, but maybe it's sufficient to add a
StaticAssert?



Re: Enumize logical replication message actions

Kyotaro Horiguchi-4
At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <[hidden email]> wrote in

> On 2020-Oct-22, Ashutosh Bapat wrote:
>
> > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]>
> > wrote:
>
> > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > something like that we shouldn't do this refactoring, I think.
> >
> > Enum is an integer, and we want to send byte. The function asserts that the
> > enum fits a byte. If there's a way to declare byte long enums I would use
> > that. But I didn't find a way to do that.
>
> I didn't look at the code, but maybe it's sufficient to add a
> StaticAssert?

That check needs to visit all the symbols in an enum and confirm that each
of them is in a certain range.

I thought of StaticAssert, but it cannot run code, and I don't know
of any syntax that loops through all the symbols in an enumeration, so I
think we would need to write a static assertion for every symbol in the
enumeration, which seems kind of stupid.

enum hoge
{
  a = '1',
  b = '2',
  c = '3'
};

StaticAssertDecl((unsigned int)(a | b | c ...) <= 0xff, "too large symbol value");

I couldn't come up with a way to apply a static assertion on each symbol
definition line.
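
For reference, one technique that gets close is an X-macro list, sketched below. This is not something proposed in the patch; it assumes a C11 compiler for _Static_assert (the server code would use StaticAssertDecl), and the macro names LOGICAL_REP_MSG_LIST, AS_ENUM_MEMBER, AS_RANGE_CHECK, AS_CASE and the function is_known_msg_type() are all hypothetical. The value 'c' for STREAM_COMMIT is likewise assumed.

```c
#include <assert.h>

/* Single list of symbols; each use below expands it with a different X. */
#define LOGICAL_REP_MSG_LIST(X) \
    X(LOGICAL_REP_MSG_BEGIN, 'B') \
    X(LOGICAL_REP_MSG_UPDATE, 'U') \
    X(LOGICAL_REP_MSG_STREAM_COMMIT, 'c')

/* Expansion 1: the enum members. */
#define AS_ENUM_MEMBER(name, value) name = (value),
typedef enum LogicalRepMsgType
{
    LOGICAL_REP_MSG_LIST(AS_ENUM_MEMBER)
} LogicalRepMsgType;

/* Expansion 2: one compile-time range check per symbol. */
#define AS_RANGE_CHECK(name, value) \
    _Static_assert((value) >= 0 && (value) <= 0xff, \
                   #name " does not fit in one byte");
LOGICAL_REP_MSG_LIST(AS_RANGE_CHECK)

/*
 * Expansion 3: one case label per symbol.  Two symbols with the same
 * value would produce duplicate case labels, which fails to compile.
 */
#define AS_CASE(name, value) case name:
static int
is_known_msg_type(int byte)
{
    switch (byte)
    {
        LOGICAL_REP_MSG_LIST(AS_CASE)
            return 1;
    }
    return 0;
}
```

The cost is that the enum definition becomes harder to read and to grep, which may outweigh the benefit for a list that rarely changes.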

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

Peter Smith
On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
<[hidden email]> wrote:

>
> At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <[hidden email]> wrote in
> > On 2020-Oct-22, Ashutosh Bapat wrote:
> >
> > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]>
> > > wrote:
> >
> > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > something like that we shouldn't do this refactoring, I think.
> > >
> > > Enum is an integer, and we want to send byte. The function asserts that the
> > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > that. But I didn't find a way to do that.

The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

The comment in the LogicalRepMsgType enum will sufficiently ensure
nobody is going to accidentally add any bad replication message codes.
And it's not like these are going to be changed often.

Why not simply downcast your enums when calling pq_sendbyte?
There are only a few of them.

e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

Kind Regards.
Peter Smith
Fujitsu Australia.



Re: Enumize logical replication message actions

Kyotaro Horiguchi-4
At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith <[hidden email]> wrote in

> On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
> <[hidden email]> wrote:
> >
> > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <[hidden email]> wrote in
> > > On 2020-Oct-22, Ashutosh Bapat wrote:
> > >
> > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]>
> > > > wrote:
> > >
> > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > > something like that we shouldn't do this refactoring, I think.
> > > >
> > > > Enum is an integer, and we want to send byte. The function asserts that the
> > > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > > that. But I didn't find a way to do that.
>
> The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

Ah, yes, it is what I meant. I didn't come up with the word "overkill".

> The comment in the LogicalRepMsgType enum will sufficiently ensure
> nobody is going to accidentally add any bad replication message codes.
> And it's not like these are going to be changed often.

Agreed.

> Why not simply downcast your enums when calling pq_sendbyte?
> There are only a few of them.
>
> e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

If you are worried about compiler warnings, that explicit cast is not
required. Even if the symbol is larger than 0xff, the upper bytes are
silently truncated off.
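
A small standalone sketch of that truncation, using made-up names (DemoMsgType, DEMO_MSG_OK, DEMO_MSG_TOO_WIDE, wire_byte()) rather than anything from the patch. Conversion to an unsigned 8-bit type is well defined in C and keeps only the low byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum with one member that does not fit in a byte. */
typedef enum DemoMsgType
{
    DEMO_MSG_OK = 0x41,         /* 'A' in ASCII */
    DEMO_MSG_TOO_WIDE = 0x141   /* the low byte is also 0x41 */
} DemoMsgType;

/* Implicit conversion to uint8_t, as happens when calling a byte sender. */
static uint8_t
wire_byte(DemoMsgType type)
{
    uint8_t     b = type;       /* value is reduced modulo 256 */

    return b;
}
```

Here both members end up as the same byte on the wire, so the receiver cannot tell them apart; that is the kind of mistake the enum's header comment is expected to prevent.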

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

Ashutosh Bapat-3
In reply to this post by Kyotaro Horiguchi-4


On Fri, 23 Oct 2020 at 06:50, Kyotaro Horiguchi <[hidden email]> wrote:

> Those two switch()es are apparently redundant. That code is exactly
> equivalent to:
>
> apply_dispatch(s)
> {
>   LogicalRepMsgType msgtype = pq_getmsgtype(s);
>
>   switch (msgtype)
>   {
>      case LOGICAL_REP_MSG_BEGIN:
>         apply_handle_begin();
> !       return;
>      ...
>      case LOGICAL_REP_MSG_STREAM_COMMIT:
>         apply_handle_stream_commit();
> !       return;
>   }
>
>   ereport(ERROR, (errmsg("invalid logical replication message type"..
> }
>
> which is smaller and fast.

Good idea. Implemented in the latest patch, posted with the next mail.

--
Best Wishes,
Ashutosh

Re: Enumize logical replication message actions

Ashutosh Bapat-3
In reply to this post by Kyotaro Horiguchi-4


On Fri, 23 Oct 2020 at 17:02, Kyotaro Horiguchi <[hidden email]> wrote:
> At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith <[hidden email]> wrote in
> > On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
> > <[hidden email]> wrote:
> > >
> > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <[hidden email]> wrote in
> > > > On 2020-Oct-22, Ashutosh Bapat wrote:
> > > >
> > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]>
> > > > > wrote:
> > > >
> > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > > > something like that we shouldn't do this refactoring, I think.
> > > > >
> > > > > Enum is an integer, and we want to send byte. The function asserts that the
> > > > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > > > that. But I didn't find a way to do that.
> >
> > The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.
>
> Ah, yes, it is what I meant. I didn't come up with the word "overkill".
>
> > The comment in the LogicalRepMsgType enum will sufficiently ensure
> > nobody is going to accidentally add any bad replication message codes.
> > And it's not like these are going to be changed often.
>
> Agreed.
>
> > Why not simply downcast your enums when calling pq_sendbyte?
> > There are only a few of them.
> >
> > e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);
>
> If you are worried about compiler warning, that explicit cast is not
> required. Even if the symbol is larger than 0xff, the upper bytes are
> silently truncated off.

I agree with Peter that the prologue of LogicalRepMsgType is enough.

I also agree with Kyotaro that the explicit cast is unnecessary.

All this together makes the second patch useless, so I removed it. Instead I used Kyotaro's idea from the previous mail.

PFA updated patch.

--
Best Wishes,
Ashutosh

0001-Enumize-top-level-logical-replication-actions.v2.patch (11K) Download Attachment

Re: Enumize logical replication message actions

akapila
In reply to this post by Kyotaro Horiguchi-4
On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
<[hidden email]> wrote:

>
> At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <[hidden email]> wrote in
> > On 2020-Oct-22, Ashutosh Bapat wrote:
> >
> > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]>
> > > wrote:
> >
> > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > something like that we shouldn't do this refactoring, I think.
> > >
> > > Enum is an integer, and we want to send byte. The function asserts that the
> > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > that. But I didn't find a way to do that.
> >
> > I didn't look at the code, but maybe it's sufficient to add a
> > StaticAssert?
>
> That check needs to visit all symbols in a enum and confirm that each
> of them is in a certain range.
>

Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
indicating this is a fake message and must be the last one) as the
last and just check that?

--
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

Ashutosh Bapat-3


On Fri, 23 Oct 2020 at 18:23, Amit Kapila <[hidden email]> wrote:
> On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
> <[hidden email]> wrote:
> >
> > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <[hidden email]> wrote in
> > > On 2020-Oct-22, Ashutosh Bapat wrote:
> > >
> > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <[hidden email]>
> > > > wrote:
> > >
> > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > > something like that we shouldn't do this refactoring, I think.
> > > >
> > > > Enum is an integer, and we want to send byte. The function asserts that the
> > > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > > that. But I didn't find a way to do that.
> > >
> > > I didn't look at the code, but maybe it's sufficient to add a
> > > StaticAssert?
> >
> > That check needs to visit all symbols in a enum and confirm that each
> > of them is in a certain range.
> >
>
> Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
> indicating this is a fake message and must be the last one) as the
> last and just check that?

I don't think that's required now that I have applied the suggestions from Kyotaro and Peter. Please check the latest patch.
Usually LAST is added to an enum when we need to cap the number of symbols or want to find the number of symbols. Neither is necessary here. Do you see any other use?
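
A short sketch of why a trailing sentinel behaves differently for character-coded enums; the names DemoMsgType and DEMO_MSG_* are made up for the example. In C, a member without an explicit value is one more than the member listed before it, which is not necessarily the largest value in the enum.

```c
#include <assert.h>

/* Hypothetical character-coded enum with a trailing sentinel. */
typedef enum DemoMsgType
{
    DEMO_MSG_UPDATE = 'U',
    DEMO_MSG_BEGIN = 'B',
    DEMO_MSG_LAST           /* implicit value: DEMO_MSG_BEGIN + 1 */
} DemoMsgType;
```

Here DEMO_MSG_LAST is smaller than DEMO_MSG_UPDATE, so a single range check on the sentinel would say nothing about the other members.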

--
Best Wishes,
Ashutosh