Is there a reason why pgoutput sends data in text format? Seems to me that sending data in binary would provide a considerable performance improvement.
Dave Cramer
On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> Is there a reason why pgoutput sends data in text format? Seems to
> me that sending data in binary would provide a considerable
> performance improvement.

Are you seeing something that suggests that the text output is taking
a lot of time or other resources?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, 3 Jun 2019 at 20:54, David Fetter <[hidden email]> wrote:
> On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> > Is there a reason why pgoutput sends data in text format? Seems to
> > me that sending data in binary would provide a considerable
> > performance improvement.
>
> Are you seeing something that suggests that the text output is taking
> a lot of time or other resources?

Actually it's on the other end that there is improvement. Parsing text
takes much longer for almost everything except, ironically, text.

To be more transparent, there is some desire to use pgoutput for something
other than logical replication. Change Data Capture clients such as
Debezium have a requirement for a stable plugin that is shipped with core,
as that is what is always available in cloud providers' offerings. There's
no reason that I am aware of that they cannot use pgoutput for this.

There's also no reason that I am aware of that binary outputs can't be
supported. The protocol would have to change slightly, and I am working on
a POC patch. The thing is, these clients aren't all written in C, so using
binary does provide a pretty substantial win on the decoding end.

Dave
Hi,
On 2019-06-04 15:47:04 -0400, Dave Cramer wrote:
> On Mon, 3 Jun 2019 at 20:54, David Fetter <[hidden email]> wrote:
> > On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote:
> > > Is there a reason why pgoutput sends data in text format? Seems to
> > > me that sending data in binary would provide a considerable
> > > performance improvement.
> >
> > Are you seeing something that suggests that the text output is taking
> > a lot of time or other resources?
>
> Actually it's on the other end that there is improvement. Parsing text
> takes much longer for almost everything except, ironically, text.

It's on both sides, I'd say. E.g. float (until v12), timestamp, and bytea
are all much more expensive to convert from binary to text.

> To be more transparent, there is some desire to use pgoutput for something
> other than logical replication. Change Data Capture clients such as
> Debezium have a requirement for a stable plugin that is shipped with core,
> as that is what is always available in cloud providers' offerings. There's
> no reason that I am aware of that they cannot use pgoutput for this.

Except that that's not pgoutput's purpose, and we shouldn't make it
meaningfully more complicated or slower to achieve this. I don't think
there's a conflict in this case, though.

> There's also no reason that I am aware of that binary outputs can't be
> supported.

Well, it *does* increase version dependencies, and it does make
replication more complicated, because type oids etc. cannot be relied on
to be the same on the source and target side.

> The protocol would have to change slightly, and I am working on
> a POC patch.

Hm, what would have to be changed protocol-wise? IIRC that'd just be a
different datum type? Or is that what you mean?

    pq_sendbyte(out, 't');        /* 'text' data follows */

IIRC there was code for the binary protocol in a predecessor of pgoutput.

I think if we were to add binary output - and I think we should - we
ought to only accept a patch if it's also used in core.

Greetings,

Andres Freund
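For illustration only, here is a minimal sketch of how that per-column
datum-kind byte could grow a binary variant alongside the existing 't'
(text) and 'n' (null) kinds. The function name, the 'b' kind byte, and the
length-prefixed layout are assumptions, not the actual POC patch:

    #include "postgres.h"
    #include "fmgr.h"
    #include "libpq/pqformat.h"
    #include "utils/lsyscache.h"

    /*
     * Hypothetical sketch: emit one column value, either with the type's
     * binary send function ('b', assumed new kind) or with its text output
     * function ('t', the existing format).
     */
    static void
    send_one_datum(StringInfo out, Oid typid, Datum val, bool send_binary)
    {
        if (send_binary)
        {
            Oid     typsend;
            bool    typisvarlena;
            bytea  *outputbytes;

            getTypeBinaryOutputInfo(typid, &typsend, &typisvarlena);
            outputbytes = OidSendFunctionCall(typsend, val);

            pq_sendbyte(out, 'b');  /* hypothetical: binary data follows */
            pq_sendint32(out, VARSIZE(outputbytes) - VARHDRSZ);
            pq_sendbytes(out, VARDATA(outputbytes),
                         VARSIZE(outputbytes) - VARHDRSZ);
        }
        else
        {
            Oid     typoutput;
            bool    typisvarlena;
            char   *outputstr;

            getTypeOutputInfo(typid, &typoutput, &typisvarlena);
            outputstr = OidOutputFunctionCall(typoutput, val);

            pq_sendbyte(out, 't');  /* 'text' data follows (existing kind) */
            pq_sendcountedtext(out, outputstr, strlen(outputstr), false);
        }
    }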
On Tue, 4 Jun 2019 at 16:30, Andres Freund <[hidden email]> wrote:
> Except that that's not pgoutput's purpose, and we shouldn't make it
> meaningfully more complicated or slower to achieve this. I don't think
> there's a conflict in this case, though.

Agreed; my intent was to slightly bend it to my will :)

> Well, it *does* increase version dependencies, and it does make
> replication more complicated, because type oids etc. cannot be relied on
> to be the same on the source and target side.

I was about to agree with this, but if the type oids change from source
to target you still can't decode the text version properly. Unless I
misunderstand something here?

> Hm, what would have to be changed protocol-wise? IIRC that'd just be a
> different datum type? Or is that what you mean?

I haven't really thought this through completely, but one place JDBC has
problems with binary is with timestamps with time zone, as we don't know
which time zone to use. Is it safe to assume everything is in UTC, since
the server stores in UTC? Then there are UDFs. My original thought was to
use options to send in the types that I wanted in binary; everything else
could be sent as text.

> IIRC there was code for the binary protocol in a predecessor of pgoutput.

Hmm, that might be a good place to start. I will do some digging through
git history.

> I think if we were to add binary output - and I think we should - we
> ought to only accept a patch if it's also used in core.

Certainly; not doing so would make my work completely irrelevant for my
purpose.

Thanks,

Dave
Hi,
On 2019-06-04 16:39:32 -0400, Dave Cramer wrote:
> On Tue, 4 Jun 2019 at 16:30, Andres Freund <[hidden email]> wrote:
> > > There's also no reason that I am aware of that binary outputs can't be
> > > supported.
> >
> > Well, it *does* increase version dependencies, and it does make
> > replication more complicated, because type oids etc. cannot be relied on
> > to be the same on the source and target side.
>
> I was about to agree with this, but if the type oids change from source
> to target you still can't decode the text version properly. Unless I
> misunderstand something here?

The text format doesn't care about oids. I don't see how it'd be a
problem? Note that some people *intentionally* use different types from
source to target system when logically replicating, so you can't rely on
the target table's types under any circumstance.

I think you really have to use the textual type name which we already
write out (cf. logicalrep_write_typ()) to look up the binary input
functions. And you can send only data as binary that's from builtin
types - otherwise there's no guarantee at all that the target system has
something compatible. And even if you just assumed that all extensions
etc. are present, you can't transport arrays / composite types in binary:
for hard-to-discern reasons we a) embed type oids in them and b) verify
them. And b) won't ever work for non-builtin types, because their oids
are assigned dynamically.

> > I think if we were to add binary output - and I think we should - we
> > ought to only accept a patch if it's also used in core.
>
> Certainly; not doing so would make my work completely irrelevant for my
> purpose.

What I mean is that the builtin logical replication would have to use
this on the receiving side too.

Greetings,

Andres Freund
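On the receiving side, a rough sketch of what is described above might
look like this: resolve the type locally by the name that
logicalrep_write_typ() already streams, then call that local type's binary
receive function. The helper name, the 'b' kind byte, and the surrounding
plumbing are assumptions, not the actual apply-worker code:

    #include "postgres.h"
    #include "fmgr.h"
    #include "lib/stringinfo.h"
    #include "utils/lsyscache.h"

    /*
     * Hypothetical sketch: given the *locally* resolved type oid (matched
     * by name, never by the remote oid), decode a binary column value.
     */
    static Datum
    receive_binary_datum(Oid localtypoid, StringInfo value)
    {
        Oid     typreceive;
        Oid     typioparam;

        /*
         * Look up the local type's receive function; this errors out for
         * types without one, which is another reason to fall back to text
         * in such cases.
         */
        getTypeBinaryInputInfo(localtypoid, &typreceive, &typioparam);

        return OidReceiveFunctionCall(typreceive, value, typioparam, -1);
    }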
On Tue, 4 Jun 2019 at 16:46, Andres Freund <[hidden email]> wrote:
> ... you can't transport arrays / composite types in binary ...

I figured arrays and UDTs would be problematic.

> What I mean is that the builtin logical replication would have to use
> this on the receiving side too.

Got it, thanks for validating that the idea isn't nuts. Now I *have* to
produce a POC.

Thanks,

Dave
On 6/4/19 4:39 PM, Dave Cramer wrote:
> I haven't really thought this through completely, but one place JDBC has
> problems with binary is with timestamps with time zone, as we don't know
> which time zone to use. Is it safe to assume everything is in UTC, since
> the server stores in UTC?

PL/Java, when converting to the Java 8 java.time types (because those are
sane), will turn a timestamp with time zone into an OffsetDateTime with
explicit offset zero (UTC), no matter what time zone may have been used
when the value was input (as you've observed, there's no way to recover
that). In the return direction, if given an OffsetDateTime with any
nonzero offset, it will adjust the value to UTC for postgres.

So, yes, say I.

Regards,
-Chap
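For consumers outside PL/Java, the binary wire value of timestamp with
time zone (assuming integer datetimes, the default) is an 8-byte integer
of microseconds since 2000-01-01 00:00:00 UTC in network byte order, so
treating it as UTC is safe. A rough client-side sketch; the function name
is illustrative and the caller is assumed to have byte-swapped already:

    #include <stdint.h>
    #include <time.h>

    #define POSTGRES_EPOCH_UNIX_SECS 946684800L  /* 2000-01-01 00:00 UTC */

    /* Convert a host-order binary timestamptz to a UTC Unix timespec. */
    static struct timespec
    timestamptz_binary_to_unix(int64_t pg_usecs)
    {
        struct timespec ts;
        int64_t secs  = pg_usecs / 1000000;
        int64_t usecs = pg_usecs % 1000000;

        /* Keep the fractional part non-negative for pre-2000 values. */
        if (usecs < 0)
        {
            secs -= 1;
            usecs += 1000000;
        }

        ts.tv_sec  = (time_t) (secs + POSTGRES_EPOCH_UNIX_SECS);
        ts.tv_nsec = (long) usecs * 1000;
        return ts;
    }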
On Tue, Jun 04, 2019 at 04:55:33PM -0400, Dave Cramer wrote:
> On Tue, 4 Jun 2019 at 16:46, Andres Freund <[hidden email]> wrote:
> > I think you really have to use the textual type name which we already
> > write out (cf. logicalrep_write_typ()) to look up the binary input
> > functions. And you can send only data as binary that's from builtin
> > types - otherwise there's no guarantee at all that the target system
> > has something compatible. [...]
>
> I figured arrays and UDTs would be problematic.

Would it make sense to work toward a binary format that's not
architecture-specific? I recall from COPY that our binary format is
not standardized across, for example, big- and little-endian machines.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi,
On 2019-06-05 00:05:02 +0200, David Fetter wrote:
> Would it make sense to work toward a binary format that's not
> architecture-specific? I recall from COPY that our binary format is
> not standardized across, for example, big- and little-endian machines.

I think you recall wrongly. It's obviously possible that we have bugs
around this, but output/input routines are supposed to handle an
endianness-independent format. That usually means that you have to do
endianness conversions, but that doesn't make it non-standardized.

- Andres
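In other words, the binary wire format is defined in network byte order,
so a consumer on a little-endian machine just does an explicit byte swap
and gets the same value either way. A small illustrative sketch (not from
libpq; the function name is made up) for a binary int4 column:

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    /* Decode a 4-byte integer from PostgreSQL's binary wire format. */
    static int32_t
    read_binary_int4(const char *wire_data)
    {
        uint32_t net;

        memcpy(&net, wire_data, sizeof(net));   /* avoid unaligned access */
        return (int32_t) ntohl(net);            /* network -> host order */
    }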
On Tue, 4 Jun 2019 at 18:08, Andres Freund <[hidden email]> wrote:

Hi,

Additionally, there are a number of drivers that already know how to
handle our binary types. I don't really think there's a win here. I also
want to keep the changes small.

Dave
Hi,
On 05/06/2019 00:08, Andres Freund wrote:
> On 2019-06-05 00:05:02 +0200, David Fetter wrote:
>> Would it make sense to work toward a binary format that's not
>> architecture-specific? I recall from COPY that our binary format is
>> not standardized across, for example, big- and little-endian machines.
>
> I think you recall wrongly. It's obviously possible that we have bugs
> around this, but output/input routines are supposed to handle an
> endianness-independent format. That usually means that you have to do
> endianness conversions, but that doesn't make it non-standardized.

Yeah, there are really three formats of data we have: the text protocol,
the binary network protocol, and the internal on-disk format. The internal
on-disk format will not work across big/little-endian machines, but the
binary network protocol will.

FWIW, I don't think the code for binary format was included in the
original logical replication patch (I really tried to keep it as minimal
as possible), but the code and protocol are pretty much ready for adding
that.

That said, pglogical has code which handles this (I guess that's what
Andres means by a predecessor of pgoutput), so if you look for example at
write_tuple/read_tuple/decide_datum_transfer in
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_proto_native.c
that can help give you some ideas on how to approach this.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions
https://www.2ndQuadrant.com/
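For a rough idea of the kind of check a decide_datum_transfer-style
function makes, here is a hedged sketch - an illustration in that spirit,
not the pglogical or pgoutput code - of a sender-side "can this column go
binary?" test:

    #include "postgres.h"
    #include "access/transam.h"     /* FirstNormalObjectId */

    /*
     * Sketch: only consider binary transfer when the subscriber opted in
     * (e.g. via an output-plugin option) and the type is a builtin.
     */
    static bool
    datum_can_go_binary(Oid atttypid, bool subscriber_wants_binary)
    {
        if (!subscriber_wants_binary)
            return false;

        /*
         * Non-builtin type oids are assigned dynamically, so there is no
         * guarantee the target side has a compatible binary representation;
         * fall back to text for those.  (A real implementation also has to
         * think about arrays and composites, which embed type oids.)
         */
        if (atttypid >= FirstNormalObjectId)
            return false;

        return true;
    }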
Hi,

On Wed, 5 Jun 2019 at 07:18, Petr Jelinek <[hidden email]> wrote:
> FWIW, I don't think the code for binary format was included in the
> original logical replication patch (I really tried to keep it as minimal
> as possible), but the code and protocol are pretty much ready for adding
> that.

Yes, I looked through the public history and could not find it. Thanks for
confirming.

> That said, pglogical has code which handles this ...

Thanks for the tip!

Dave Cramer
On Wed, 5 Jun 2019 at 07:21, Dave Cramer <[hidden email]> wrote:
Looking at:
https://github.com/postgres/postgres/blob/8255c7a5eeba8f1a38b7a431c04909bde4f5e67d/src/backend/replication/pgoutput/pgoutput.c#L163

this seems completely ignored. What was the intent?

Dave
Hi
On June 5, 2019 8:51:10 AM PDT, Dave Cramer <[hidden email]> wrote:
> Looking at:
> https://github.com/postgres/postgres/blob/8255c7a5eeba8f1a38b7a431c04909bde4f5e67d/src/backend/replication/pgoutput/pgoutput.c#L163
>
> this seems completely ignored. What was the intent?

That's about the output of the plugin, not the datatypes. And independent
of text/binary output, the protocol contains non-printable chars.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
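For context: that flag is set in the plugin's startup callback and only
declares whether the plugin's overall output stream is textual or binary;
it says nothing about how individual column values inside the replication
messages are encoded. Roughly (a sketch, not the actual pgoutput code):

    #include "postgres.h"
    #include "replication/logical.h"
    #include "replication/output_plugin.h"

    /* Sketch of an output plugin's startup callback. */
    static void
    my_plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
                      bool is_init)
    {
        /*
         * The replication protocol messages contain length words and other
         * non-printable bytes, so the plugin's output as a whole is binary,
         * regardless of whether each datum is sent as text or binary.
         */
        opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
    }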
Hi,

On Wed, 5 Jun 2019 at 12:01, Andres Freund <[hidden email]> wrote:
> That's about the output of the plugin, not the datatypes. And independent
> of text/binary output, the protocol contains non-printable chars.

So one of the things they would like added is to get NOT NULL information
in the schema record. This is so they can mark the field Optional in Java.
I presume this would also have some uses in other languages. As I
understand it, this would require a protocol bump. If this were to be
accepted, are there any outstanding asks that would be useful to add if we
were going to bump the protocol?

Dave
Hi,
On 2019-06-05 18:47:57 -0400, Dave Cramer wrote:
> So one of the things they would like added is to get NOT NULL information
> in the schema record. This is so they can mark the field Optional in Java.
> I presume this would also have some uses in other languages. As I
> understand it, this would require a protocol bump. If this were to be
> accepted, are there any outstanding asks that would be useful to add if we
> were going to bump the protocol?

I'm pretty strongly opposed to this. What's the limiting factor when
adding such information? I think clients that want something like this
ought to query the database for catalog information when getting schema
information.

Greetings,

Andres Freund
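As a hypothetical example of "query the catalog", a CDC client could fetch
per-column nullability over an ordinary connection rather than expecting
it in the replication stream. The connection string and table name below
are placeholders:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");
        PGresult   *res;
        const char *query =
            "SELECT a.attname, a.attnotnull "
            "FROM pg_catalog.pg_attribute a "
            "WHERE a.attrelid = 'public.my_table'::regclass "
            "  AND a.attnum > 0 AND NOT a.attisdropped";

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        res = PQexec(conn, query);
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            /* Print each column name and whether it is NOT NULL. */
            for (int i = 0; i < PQntuples(res); i++)
                printf("%s not null: %s\n",
                       PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
        }

        PQclear(res);
        PQfinish(conn);
        return 0;
    }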
Hi,

On Wed, 5 Jun 2019 at 18:50, Andres Freund <[hidden email]> wrote:
> I'm pretty strongly opposed to this. What's the limiting factor when
> adding such information? I think clients that want something like this
> ought to query the database for catalog information when getting schema
> information.

I'm not intimately familiar with their code. I will query them more about
the ask. I am curious why you are "strongly" opposed, however. We already
have the information; adding it doesn't seem onerous.

Dave
Hi,
On 2019-06-05 19:05:05 -0400, Dave Cramer wrote:
> I am curious why you are "strongly" opposed, however. We already have the
> information; adding it doesn't seem onerous.

(I thought I'd already replied with this.)

The problem is that I don't recognize a limiting principle: if we want
NOT NULL information for clients, why don't we include the underlying
types for arrays, and the fields in composite types? What about foreign
keys? And unique keys? And then we suddenly need tracking for all of
these, so that we don't keep sending out information we have already
sent - and in some of those cases there's no infrastructure for that.

I just don't quite buy that the output plugin built for pg's logical
replication needs is a good place to include a continually increasing
amount of metadata that logical replication doesn't need. That's going to
add overhead and make the code more complicated.

Greetings,

Andres Freund
On 06/07/19 19:27, Andres Freund wrote:
> The problem is that I don't recognize a limiting principle: if we want
> NOT NULL information for clients, why don't we include the underlying
> types for arrays, and the fields in composite types? What about foreign
> keys? And unique keys?

This reminds me of an idea I had for a future fe/be protocol version,
right after a talk by Alyssa Ritchie and Henrietta Dombrovskaya at the
last 2Q PGConf. [1]

It seems they had ended up designing a whole 'nother "protocol level"
involving queries wrapping their results as JSON and an app layer that
unwraps again, after trying a simpler first approach that was foiled by
the inability to see into arrays and anonymous record types in the
'describe' response.

I thought, in a new protocol rev, why not let the driver send additional
'describe' messages after the first one, to drill into the structure of
individual columns mentioned in the first response, before sending the
'execute' message? If it doesn't want the further detail, it doesn't have
to ask.

> And then we suddenly need tracking for all of these, so that we don't
> keep sending out information we have already sent

If it's up to the client driver, it can track what it needs or already
has.

I haven't looked too deeply into the replication protocol ... it happens
under a kind of copy-both, right? So maybe there's a way for the receiver
to send some inquiries back, perhaps in a windowed, full-duplex way where
it might have to buffer some incoming messages before getting the
response to an inquiry message it sent.

Would those be thinkable thoughts for a future protocol rev?

Regards,
-Chap

[1] https://www.2qpgconf.com/schedule/information-exchange-techniques-for-javapostgresql-applications/