pgagent unicode support

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

pgagent unicode support

Sergey Burladyan
Currently pgagent doesn't handle unicode correctly.

CharToWString function corrupt multibyte characters because it processes
string one byte at a time:
 148         std::string s = std::string(cstr);
 149         std::wstring wsTmp(s.begin(), s.end());

WStringToChar function does not take into account that there can be
_multi_byte character on wcstombs output and create buffer with
size = wcslen:
 157         int wstr_length = wcslen(wchar_str);
 158         char *dst = new char[wstr_length + 10];

Also pgagent do not setup locale with setlocale(), without it all
wcs/mbs functions cannot handle multibyte strings.

For example:

=== step code ===
select 'это проверка кириллицы в теле запроса pgagent'
=================

=== postgres log ===
2021-02-05 23:19:05 UTC [15600-1] postgres@postgres ERROR:  unterminated quoted string at or near "'" at character 8
2021-02-05 23:19:05 UTC [15600-2] postgres@postgres STATEMENT:  select '
====================

Please see attached patch.
I only test it on GNU/Linux and can't test it on Windows, sorry.

--
Sergey Burladyan


Fix-multibyte-strings-handling.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Dave Page-7
Hi

On Sat, Feb 6, 2021 at 5:00 AM Sergey Burladyan <[hidden email]> wrote:
Currently pgagent doesn't handle unicode correctly.

CharToWString function corrupt multibyte characters because it processes
string one byte at a time:
 148         std::string s = std::string(cstr);
 149         std::wstring wsTmp(s.begin(), s.end());

WStringToChar function does not take into account that there can be
_multi_byte character on wcstombs output and create buffer with
size = wcslen:
 157         int wstr_length = wcslen(wchar_str);
 158         char *dst = new char[wstr_length + 10];

Also pgagent do not setup locale with setlocale(), without it all
wcs/mbs functions cannot handle multibyte strings.

For example:

=== step code ===
select 'это проверка кириллицы в теле запроса pgagent'
=================

=== postgres log ===
2021-02-05 23:19:05 UTC [15600-1] postgres@postgres ERROR:  unterminated quoted string at or near "'" at character 8
2021-02-05 23:19:05 UTC [15600-2] postgres@postgres STATEMENT:  select '
====================

Please see attached patch.
I only test it on GNU/Linux and can't test it on Windows, sorry.

Thanks for the patch! Neel/Ashesh; can you take a look please? It looks OK to me, but then I'm not overly familiar with multibyte string handling. What, if anything, needs to be done on Windows?
 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Neel Patel
Thanks Sergey for the patch.

Sure Dave. 
There is some compilation warning in linux, I will fix those and test pgAgent in windows and update the thread.

On Mon, Feb 8, 2021 at 2:55 PM Dave Page <[hidden email]> wrote:
Hi

On Sat, Feb 6, 2021 at 5:00 AM Sergey Burladyan <[hidden email]> wrote:
Currently pgagent doesn't handle unicode correctly.

CharToWString function corrupt multibyte characters because it processes
string one byte at a time:
 148         std::string s = std::string(cstr);
 149         std::wstring wsTmp(s.begin(), s.end());

WStringToChar function does not take into account that there can be
_multi_byte character on wcstombs output and create buffer with
size = wcslen:
 157         int wstr_length = wcslen(wchar_str);
 158         char *dst = new char[wstr_length + 10];

Also pgagent do not setup locale with setlocale(), without it all
wcs/mbs functions cannot handle multibyte strings.

For example:

=== step code ===
select 'это проверка кириллицы в теле запроса pgagent'
=================

=== postgres log ===
2021-02-05 23:19:05 UTC [15600-1] postgres@postgres ERROR:  unterminated quoted string at or near "'" at character 8
2021-02-05 23:19:05 UTC [15600-2] postgres@postgres STATEMENT:  select '
====================

Please see attached patch.
I only test it on GNU/Linux and can't test it on Windows, sorry.

Thanks for the patch! Neel/Ashesh; can you take a look please? It looks OK to me, but then I'm not overly familiar with multibyte string handling. What, if anything, needs to be done on Windows?
 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Neel Patel
Hi Sergey,

Thank you for the patch. It looks good to me except below.

We have modified the patch as we fixed the memory leak ( review comment given by Ashesh ) and also fixed the compilation warnings.
Can you please review and let us know ?

Thanks,
Neel Patel

On Mon, Feb 15, 2021 at 6:15 PM Neel Patel <[hidden email]> wrote:
Thanks Sergey for the patch.

Sure Dave. 
There is some compilation warning in linux, I will fix those and test pgAgent in windows and update the thread.

On Mon, Feb 8, 2021 at 2:55 PM Dave Page <[hidden email]> wrote:
Hi

On Sat, Feb 6, 2021 at 5:00 AM Sergey Burladyan <[hidden email]> wrote:
Currently pgagent doesn't handle unicode correctly.

CharToWString function corrupt multibyte characters because it processes
string one byte at a time:
 148         std::string s = std::string(cstr);
 149         std::wstring wsTmp(s.begin(), s.end());

WStringToChar function does not take into account that there can be
_multi_byte character on wcstombs output and create buffer with
size = wcslen:
 157         int wstr_length = wcslen(wchar_str);
 158         char *dst = new char[wstr_length + 10];

Also pgagent do not setup locale with setlocale(), without it all
wcs/mbs functions cannot handle multibyte strings.

For example:

=== step code ===
select 'это проверка кириллицы в теле запроса pgagent'
=================

=== postgres log ===
2021-02-05 23:19:05 UTC [15600-1] postgres@postgres ERROR:  unterminated quoted string at or near "'" at character 8
2021-02-05 23:19:05 UTC [15600-2] postgres@postgres STATEMENT:  select '
====================

Please see attached patch.
I only test it on GNU/Linux and can't test it on Windows, sorry.

Thanks for the patch! Neel/Ashesh; can you take a look please? It looks OK to me, but then I'm not overly familiar with multibyte string handling. What, if anything, needs to be done on Windows?
 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com


pgagent_unicode.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Neel Patel
Hi Sergey,

Do you have any comments for this updated patch ? Let us know ASAP so that we can commit it.

Thanks,
Neel Patel

On Thu, Feb 18, 2021 at 5:12 PM Neel Patel <[hidden email]> wrote:
Hi Sergey,

Thank you for the patch. It looks good to me except below.

We have modified the patch as we fixed the memory leak ( review comment given by Ashesh ) and also fixed the compilation warnings.
Can you please review and let us know ?

Thanks,
Neel Patel

On Mon, Feb 15, 2021 at 6:15 PM Neel Patel <[hidden email]> wrote:
Thanks Sergey for the patch.

Sure Dave. 
There is some compilation warning in linux, I will fix those and test pgAgent in windows and update the thread.

On Mon, Feb 8, 2021 at 2:55 PM Dave Page <[hidden email]> wrote:
Hi

On Sat, Feb 6, 2021 at 5:00 AM Sergey Burladyan <[hidden email]> wrote:
Currently pgagent doesn't handle unicode correctly.

CharToWString function corrupt multibyte characters because it processes
string one byte at a time:
 148         std::string s = std::string(cstr);
 149         std::wstring wsTmp(s.begin(), s.end());

WStringToChar function does not take into account that there can be
_multi_byte character on wcstombs output and create buffer with
size = wcslen:
 157         int wstr_length = wcslen(wchar_str);
 158         char *dst = new char[wstr_length + 10];

Also pgagent do not setup locale with setlocale(), without it all
wcs/mbs functions cannot handle multibyte strings.

For example:

=== step code ===
select 'это проверка кириллицы в теле запроса pgagent'
=================

=== postgres log ===
2021-02-05 23:19:05 UTC [15600-1] postgres@postgres ERROR:  unterminated quoted string at or near "'" at character 8
2021-02-05 23:19:05 UTC [15600-2] postgres@postgres STATEMENT:  select '
====================

Please see attached patch.
I only test it on GNU/Linux and can't test it on Windows, sorry.

Thanks for the patch! Neel/Ashesh; can you take a look please? It looks OK to me, but then I'm not overly familiar with multibyte string handling. What, if anything, needs to be done on Windows?
 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Sergey Burladyan
Neel Patel <[hidden email]> writes:

> Do you have any comments for this updated patch ? Let us know ASAP so that
> we can commit it.

Sorry about the delay.

It looks better than my patch, thanks!

There is only two comments, about wcstombs_s and about setlocale.

About win32 wcstombs_s:
+        wcstombs_s(&charsConverted, mbs, mb_len + 10, wchar_str, mb_len + 1);

https://docs.microsoft.com/ru-ru/cpp/c-runtime-library/reference/wcstombs-s-wcstombs-s-l?view=msvc-160
>> sizeInBytes
>> The size in bytes of the mbstr buffer.
>> count
>> The maximum number of bytes to store in the mbstr buffer, not including
>> the terminating null character

Maybe it should look something like this:
+        wcstombs_s(&charsConverted, mbs, mb_len + 1, wchar_str, mb_len);

About setlocale, I think you missed it in your patch:
diff --git a/unix.cpp b/unix.cpp
index 9a41e38..d4b0d3d 100644
--- a/unix.cpp
+++ b/unix.cpp
@@ -155,6 +155,8 @@ static void daemonize(void)
 
 int main(int argc, char **argv)
 {
+ setlocale(LC_ALL, "");
+
  std::wstring executable;
  executable.assign(CharToWString(argv[0]));

--
Sergey Burladyan


Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Sergey Burladyan
Sergey Burladyan <[hidden email]> writes:

> Maybe it should look something like this:
> +        wcstombs_s(&charsConverted, mbs, mb_len + 1, wchar_str, mb_len);

Ah, and we missed check for error.

Something like this, maybe:
+#ifdef __WIN32__
+        size_t charsConverted = 0;
+        if (wcstombs_s(&charsConverted, mbs, mb_len + 1, wchar_str, mb_len) != 0) {
+                delete [] mbs;
+                return NULL;
+        }
+#else

--
Sergey Burladyan


Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Neel Patel
Hi Sergey,

Thanks for the review. Please find the attached updated patch.
Do review it and let me know in case of any comments.

Thanks,
Neel Patel

On Wed, Feb 24, 2021 at 5:12 PM Sergey Burladyan <[hidden email]> wrote:
Sergey Burladyan <[hidden email]> writes:

> Maybe it should look something like this:
> +        wcstombs_s(&charsConverted, mbs, mb_len + 1, wchar_str, mb_len);

Ah, and we missed check for error.

Something like this, maybe:
+#ifdef __WIN32__
+        size_t charsConverted = 0;
+        if (wcstombs_s(&charsConverted, mbs, mb_len + 1, wchar_str, mb_len) != 0) {
+                delete [] mbs;
+                return NULL;
+        }
+#else

--
Sergey Burladyan

pgagent_unicode_v2.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Sergey Burladyan
Neel Patel <[hidden email]> writes:

> Thanks for the review. Please find the attached updated patch.
> Do review it and let me know in case of any comments.

Looks good, thanks!

--
Sergey Burladyan


Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Neel Patel
Hi Dave/Ashesh,

Do you have any further comments ?

Thanks,
Neel Patel

On Wed, Feb 24, 2021 at 8:14 PM Sergey Burladyan <[hidden email]> wrote:
Neel Patel <[hidden email]> writes:

> Thanks for the review. Please find the attached updated patch.
> Do review it and let me know in case of any comments.

Looks good, thanks!

--
Sergey Burladyan
Reply | Threaded
Open this post in threaded view
|

Re: pgagent unicode support

Dave Page-7


On Fri, Feb 26, 2021 at 6:36 AM Neel Patel <[hidden email]> wrote:
Hi Dave/Ashesh,

Do you have any further comments ?

Thanks Sergey, Neel. I have no further comments.
 

Thanks,
Neel Patel

On Wed, Feb 24, 2021 at 8:14 PM Sergey Burladyan <[hidden email]> wrote:
Neel Patel <[hidden email]> writes:

> Thanks for the review. Please find the attached updated patch.
> Do review it and let me know in case of any comments.

Looks good, thanks!

--
Sergey Burladyan


--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com