Invalid byte sequence errors on DB restore

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Invalid byte sequence errors on DB restore

Samuel Smith
My current DB backup routine is just to call pg_dump and pipe to gzip.
We recently started to get a failure to restore (which is basically just
using psql -f on the pg_dump file) with the following errors:

invalid byte sequence for encoding "UTF8": 0xa0
  and
invalid byte sequence for encoding "UTF8": 0xd7 0x20


This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Regards,
Samuel Smith


Reply | Threaded
Open this post in threaded view
|

Re: Invalid byte sequence errors on DB restore

Laurenz Albe
On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote:

> My current DB backup routine is just to call pg_dump and pipe to gzip.
> We recently started to get a failure to restore (which is basically just
> using psql -f on the pg_dump file) with the following errors:
>
> invalid byte sequence for encoding "UTF8": 0xa0
>   and
> invalid byte sequence for encoding "UTF8": 0xd7 0x20
>
>
> This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Simple.  Fix the offending string and upgrade.

Failure to enforce correct encoding is a bug in PostgreSQL, and a
number of such bugs have been fixed over the decades, so you might
be happier with a less ancient version.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Reply | Threaded
Open this post in threaded view
|

Re: Invalid byte sequence errors on DB restore

Adrian Klaver-4
In reply to this post by Samuel Smith
On 3/15/20 9:18 PM, Samuel Smith wrote:

> My current DB backup routine is just to call pg_dump and pipe to gzip.
> We recently started to get a failure to restore (which is basically just
> using psql -f on the pg_dump file) with the following errors:
>
> invalid byte sequence for encoding "UTF8": 0xa0
>   and
> invalid byte sequence for encoding "UTF8": 0xd7 0x20
>
>
> This is on a pg 9.2.24 instance. Any tips to troubleshoot?

What are the locale and encodings set to for the instance and databases
in it?

>
> Regards,
> Samuel Smith
>
>


--
Adrian Klaver
[hidden email]


Reply | Threaded
Open this post in threaded view
|

Re: Invalid byte sequence errors on DB restore

Samuel Smith
In reply to this post by Laurenz Albe
On 3/16/20 2:49 AM, Laurenz Albe wrote:

> On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote:
>> My current DB backup routine is just to call pg_dump and pipe to gzip.
>> We recently started to get a failure to restore (which is basically just
>> using psql -f on the pg_dump file) with the following errors:
>>
>> invalid byte sequence for encoding "UTF8": 0xa0
>>    and
>> invalid byte sequence for encoding "UTF8": 0xd7 0x20
>>
>>
>> This is on a pg 9.2.24 instance. Any tips to troubleshoot?
>
> Simple.  Fix the offending string and upgrade.
>
> Failure to enforce correct encoding is a bug in PostgreSQL, and a
> number of such bugs have been fixed over the decades, so you might
> be happier with a less ancient version.
>
> Yours,
> Laurenz Albe
>



Yes I thought I had seen some info on bug fixes in newer versions. I
will pass this on to the system admins/stakeholders.

Thanks,
Samuel Smith


Reply | Threaded
Open this post in threaded view
|

Re: Invalid byte sequence errors on DB restore

Samuel Smith
In reply to this post by Adrian Klaver-4
On 3/16/20 9:33 AM, Adrian Klaver wrote:

> On 3/15/20 9:18 PM, Samuel Smith wrote:
>> My current DB backup routine is just to call pg_dump and pipe to gzip.
>> We recently started to get a failure to restore (which is basically
>> just using psql -f on the pg_dump file) with the following errors:
>>
>> invalid byte sequence for encoding "UTF8": 0xa0
>>   and
>> invalid byte sequence for encoding "UTF8": 0xd7 0x20
>>
>>
>> This is on a pg 9.2.24 instance. Any tips to troubleshoot?
>
> What are the locale and encodings set to for the instance and databases
> in it?
>
>>
>> Regards,
>> Samuel Smith
>>
>>
>
>


The server is in UTF8. The file made with pg_dump used 'SQL_ASCII', but
setting it to UTF8 (via SET client_encoding ) did not help either.
Having the pg_dump encoding set to 'latin1' seems to allow the file
created it by it to be loaded via psql -f and everything seems to work.
Is there any bad side to setting the encoding on pg_dump to latin1?


For the record, the problem characters are:
https://www.htmlsymbols.xyz/unicode/U+00D7
and
https://www.htmlsymbols.xyz/unicode/U+00A0

But those characters were in many places and not all were issues. They
only fail depending on the characters that precede it which makes it
complicated.

Thanks,
Samuel Smith



Reply | Threaded
Open this post in threaded view
|

Re: Invalid byte sequence errors on DB restore

Samuel Smith
In reply to this post by Samuel Smith
On 3/15/20 11:18 PM, Samuel Smith wrote:

> My current DB backup routine is just to call pg_dump and pipe to gzip.
> We recently started to get a failure to restore (which is basically just
> using psql -f on the pg_dump file) with the following errors:
>
> invalid byte sequence for encoding "UTF8": 0xa0
>   and
> invalid byte sequence for encoding "UTF8": 0xd7 0x20
>
>
> This is on a pg 9.2.24 instance. Any tips to troubleshoot?
>
> Regards,
> Samuel Smith
>
>

Our issue actually turned out to be that a couple of our production
database had the encoding set to SQL_ASCII while all of our development
servers had UTF-8. This meant in some cases where we would restore a
production database into development (for testing or bug hunting), there
would be a failure to parse the backup file. A similar issue to this
blog post:
https://www.endpoint.com/blog/2017/07/21/postgres-migrating-sqlascii-to-utf-8

So our fix was to dump the affected production databases using the
LATIN1 encoding for pg_dump followed by destroying and recreating the
database and setting its new encoding to UTF-8. Then we could restore
the data using the pg_dump file with LATIN1 encoding.

Regards,
Samuel Smith