Let's document a bytea bug

Anna Akenteva
Hello!

Some time ago I encountered a problem with the bytea type: we can't SELECT
bytea strings whose textual representation is too big to fit into StringInfoData.
As a side effect, pg_dump refuses to dump tables containing big bytea strings.

It's a bug, and it's pretty confusing, but so far there seems to be no clean way
to fix it. Here's a link to a recent discussion of the issue:
https://www.postgresql.org/message-id/flat/c8bdf802d41ec37003ec3b726db79428@...#c8bdf802d41ec37003ec3b726db79428@...

Since it won't be fixed anytime soon, I thought it would be worth documenting.
Attached is a patch for the documentation: I added some text to the
"Binary Data Types" section describing the issue and explaining how to deal
with it.

My patch in plain text (for convenience):

It is not recommended to use bytea strings whose textual representation
exceeds 1GB, as it may not be possible to SELECT them due to output size
limitations. Consequently, a table containing such big strings cannot be
properly processed by pg_dump, as pg_dump will try to SELECT these values
from the table and fail. The exact size limit advised for bytea strings
depends on their content, the external format and encoding you are using,
and the context in which they will be selected. The general rule is that
the tuple returned by a SELECT should not exceed 1GB. However, even if
SELECT does not work, you can still retrieve big bytea strings using COPY
in binary format.
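As a rough client-side pre-check, this limit can be estimated in advance. The sketch below assumes the default hex output format and PostgreSQL's 1GB allocation ceiling (MaxAllocSize); the exact 3-byte overhead is inferred from the error sizes seen later in this thread, not taken from the server source:

```python
# PostgreSQL refuses single allocations of 1GB or more (MaxAllocSize).
MAX_ALLOC_SIZE = 0x3FFFFFFF  # 1073741823 bytes

def hex_text_size(n_bytes: int) -> int:
    """Approximate size of the textual (hex) form of an n-byte bytea value:
    two hex digits per byte, plus a few bytes for the '\\x' prefix."""
    return 2 * n_bytes + 3

def selectable(n_bytes: int) -> bool:
    """True if a bytea value of this size should survive a plain SELECT."""
    return hex_text_size(n_bytes) < MAX_ALLOC_SIZE

print(selectable(500_000_000))  # True: a 500MB value still fits
print(selectable(600_000_000))  # False: a 600MB value does not
```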

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: long_bytea_doc_b06d8e5.patch (1K)

Re: Let's document a bytea bug

Andrey M. Borodin
Hi Anna!

> On 23 May 2018, at 20:33, Anna Akenteva <[hidden email]> wrote:
>
>
> Some time ago I've encountered a problem with the bytea type: we can't SELECT
> bytea strings whose textual representation is too big to fit into StringInfoData.
> And as a side effect, pg_dump refuses to dump tables with big bytea strings.
> [...]

Thanks for this message. It took me a while to figure out what the problem was.
+1 for documenting this, maybe even with the exact error, like:
[ 2020-07-30 01:20:32.248 MSK pg_dump - 10.3.3.30,XX000 ]:ERROR:  invalid memory alloc request size 1472599557
It's really, really scary. My first thought was that it was TOAST corruption.

Best regards, Andrey Borodin.


Re: Let's document a bytea bug

Bruce Momjian
On Fri, Jul 31, 2020 at 10:13:48AM +0500, Andrey M. Borodin wrote:

> Hi Anna!
>
> [...]
>
> Thanks for this message. It took me a while to find out what was the problem.
> +1 for documenting this, maybe even with exact error like
> [ 2020-07-30 01:20:32.248 MSK pg_dump - 10.3.3.30,XX000 ]:ERROR:  invalid memory alloc request size 1472599557
> It's really really scary. My first feeling was that it's TOAST corruption.
I still have Anna Akenteva's 2018 email in my mailbox because I wanted
to research this further.  Now that you have re-reported the problem, I
am on it!  ;-)

I looked for a clearer reproducible test case, and I have found this
one:

        $ awk 'BEGIN { printf "\\\\x"; for (i=0; i < 30000000; i++) \
                printf "7f7f7f7f7f7f7f7f7f7f"; print}' > /tmp/x

        $ psql test

        CREATE TABLE test(x BYTEA);

        COPY test FROM '/tmp/x';

        COPY test FROM '/tmp/x';

        CREATE TABLE big_data AS SELECT (string_agg(x,'')) AS x FROM test;

        SELECT length(x) FROM big_data;
          length
        -----------
         600000000

        SELECT octet_length(x) FROM big_data;
         octet_length
        --------------
             600000000

        SELECT x FROM big_data;
        ERROR:  invalid memory alloc request size 1200000003

        \q

        $ pg_dump -d test > /rtmp/a
        pg_dump: Dumping the contents of table "big_data" failed: PQgetResult() failed.
        pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000003
        pg_dump: The command was: COPY public.big_data (x) TO stdout;

So, the character and octet length is 600 million, but on output, that
will be expanded, and both SELECT and pg_dump fail.  I also can't see
how to improve the error message since it happens so low in the stack.
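The 1200000003 in the error lines up with simple arithmetic (a sketch; the exact 3-byte overhead is inferred from the error message rather than taken from the PostgreSQL source): two hex digits per stored byte, plus the `\x` prefix and a terminator:

```python
octet_len = 600_000_000              # octet_length(x) reported above
alloc_request = 2 * octet_len + 3    # two hex chars per byte, plus '\x' and a terminator
print(alloc_request)                 # 1200000003, the size in the ERROR

MAX_ALLOC_SIZE = 0x3FFFFFFF          # PostgreSQL's 1GB-minus-one allocation cap
print(alloc_request > MAX_ALLOC_SIZE)  # True, so the allocation is refused
```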

In reading the previous posts, I understand that expanding the length
limit would be very hard, and it is difficult to see how to know at input
time whether output will fail.  I am in favor of the 2018 proposed patch,
attached, perhaps with some small adjustments.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


Attachment: long_bytea_doc_b06d8e5.patch (1K)

Re: Let's document a bytea bug

Joe Conway
On 8/3/20 4:20 PM, Bruce Momjian wrote:

> On Fri, Jul 31, 2020 at 10:13:48AM +0500, Andrey M. Borodin wrote:
> [...]
>
> I still have Anna Akenteva's 2018 email in my mailbox because I wanted
> to research this further.  Now that you have re-reported the problem, I
> am on it!  ;-)
>
> I looked for a clearer reproducible test case, and I have found this
> one:

It is easier to reproduce than that:

select repeat('x',600000000)::bytea;
ERROR:  invalid memory alloc request size 1200000003

select octet_length(repeat('x',600000000)::bytea);
 octet_length
--------------
    600000000
(1 row)

CREATE TABLE big_data AS
 select repeat('x',600000000)::bytea;
SELECT 1

SELECT repeat FROM big_data;
ERROR:  invalid memory alloc request size 1200000003

\q

pg_dump -d postgres > /tmp/a
pg_dump: error: Dumping the contents of table "big_data" failed: PQgetResult() failed.
pg_dump: error: Error message from server: ERROR:  invalid memory alloc request size 1200000003
pg_dump: error: The command was: COPY public.big_data (repeat) TO stdout;


There are other cases that also hit the StringInfo limits (i.e. MaxAllocSize)
with the TEXT data type (and I guess others, like JSON or XML, too):

select repeat('x',600000000), repeat('y',600000000);
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 600000010 bytes by 600000000 more bytes.
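The DETAIL message is the same buffer ceiling at work (a sketch; 0x3FFFFFFF as MaxAllocSize and the row-overhead interpretation of the extra 10 bytes are assumptions): the first column already put roughly 600MB into the output buffer, and appending the second 600MB column would push it past 1GB:

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF   # 1073741823: the StringInfo buffer ceiling
in_buffer = 600_000_010       # bytes already buffered (first column plus some overhead)
to_append = 600_000_000       # bytes the second column needs
print(in_buffer + to_append > MAX_ALLOC_SIZE)  # True: "Cannot enlarge string buffer"
```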


Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: Let's document a bytea bug

Bruce Momjian
On Mon, Aug  3, 2020 at 04:44:38PM -0400, Joe Conway wrote:

> It is easier to reproduce than that:
>
> select repeat('x',600000000)::bytea;
> ERROR:  invalid memory alloc request size 1200000003
>
> select octet_length(repeat('x',600000000)::bytea);
>  octet_length
> --------------
>     600000000
> (1 row)
>
> CREATE TABLE big_data AS
>  select repeat('x',600000000)::bytea;
> SELECT 1

Oh, nice!  I played with repeat(), but couldn't get it to load, then
fail.  I didn't do the cast to bytea, which is the trick.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Let's document a bytea bug

Alvaro Herrera
In reply to this post by Bruce Momjian
On 2020-Aug-03, Bruce Momjian wrote:

> So, the character and octet length is 600 million, but on output, that
> will be expanded, and both SELECT and pg_dump fail.  I also can't see
> how to improve the error message since it happens so low in the stack.

I think you'll find commits fa2fa9955280 and b66adb7b0c83 amusing.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Let's document a bytea bug

Bruce Momjian
On Mon, Aug  3, 2020 at 05:39:47PM -0400, Alvaro Herrera wrote:
> On 2020-Aug-03, Bruce Momjian wrote:
>
> > So, the character and octet length is 600 million, but on output, that
> > will be expanded, and both SELECT and pg_dump fail.  I also can't see
> > how to improve the error message since it happens so low in the stack.
>
> I think you'll find commits fa2fa9955280 and b66adb7b0c83 amusing.

Yeah, I saw those, but they were dealing with tuple length limits, and I
also saw they were reverted, so this is a hard problem.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee