[PATCH] - Provide robust alternatives for replace_string

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] - Provide robust alternatives for replace_string

Georgios
Hi,

In our testing framework, backed by pg_regress, there exists the ability to use special strings
that can be replaced by environment based ones. Such an example is '@testtablespace@'. The
function used for this replacement is replace_string which inline replaces these occurrences in
original line. It is documented that the original line buffer should be large enough to accommodate.

However, it is rather possible and easy for subtle errors to occur, especially if there are multiple
occurrences to be replaced in long enough lines. Please find two distinct versions of a possible
solution. One, which is preferred, is using StringInfo though it requires for stringinfo.h to be included
in pg_regress.c. The other patch is more basic and avoids including stringinfo.h. As a reminder
stringinfo became available in the frontend in commit (26aaf97b683d)

Because the original replace_string() is exposed to other users, it is currently left intact.
Also if required, an error can be raised in the original function, in cases that the string is not
long enough to accommodate the replacements.

Worthwhile to mention that currently there are no such issues present in the test suits. It should
not hurt to do a bit better though.
 
//Asim and Georgios



0001-Use-stringInfo-instead-of-char-in-replace_string.patch (4K) Download Attachment
0001-Heap-allocated-string-version-of-replace_string.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] - Provide robust alternatives for replace_string

Alvaro Herrera-9
What happens if a replacement string happens to be split in the middle
by the fgets buffering?  I think it'll fail to be replaced.  This
applies to both versions.

In the stringinfo version it seemed to me that using pnstrdup is
possible to avoid copying trailing bytes.

If you're asking for opinion, mine is that StringInfo looks to be the
better approach, and also you don't need to keep API compatibility.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] - Provide robust alternatives for replace_string

Asim Praveen
Thank you Alvaro for reviewing the patch!

> On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <[hidden email]> wrote:
>
> What happens if a replacement string happens to be split in the middle
> by the fgets buffering?  I think it'll fail to be replaced.  This
> applies to both versions.

Can a string to be replaced be split across multiple lines in the source file?  If I understand correctly, fgets reads one line from input file at a time.  If I do not, in the worst case, we will get an un-replaced string in the output, such as “@abs_dir@“ and it should be easily detected by a failing diff.

> In the stringinfo version it seemed to me that using pnstrdup is
> possible to avoid copying trailing bytes.
>

That’s a good suggestion.  Using pnstrdup would look like this:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme

        while ((ptr = strstr(string->data, replace)) != NULL)
        {
-               char       *dup = pg_strdup(string->data);
+              char       *dup = pnstrdup(string->data, string->maxlen);
                size_t          pos = ptr - string->data;

                string->len = pos;

 
> If you're asking for opinion, mine is that StringInfo looks to be the
> better approach, and also you don't need to keep API compatibility.
>

Thank you.  We also prefer StringInfo solution.

Asim
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] - Provide robust alternatives for replace_string

Alvaro Herrera-9
On 2020-Aug-03, Asim Praveen wrote:

> Thank you Alvaro for reviewing the patch!
>
> > On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <[hidden email]> wrote:
> >
> > What happens if a replacement string happens to be split in the middle
> > by the fgets buffering?  I think it'll fail to be replaced.  This
> > applies to both versions.
>
> Can a string to be replaced be split across multiple lines in the source file?  If I understand correctly, fgets reads one line from input file at a time.  If I do not, in the worst case, we will get an un-replaced string in the output, such as “@abs_dir@“ and it should be easily detected by a failing diff.

I meant what if the line is longer than 1023 chars and the replace
marker starts at byte 1021, for example.  Then the first fgets would get
"@ab" and the second fgets would get "s_dir@" and none would see it as
replaceable.

> > In the stringinfo version it seemed to me that using pnstrdup is
> > possible to avoid copying trailing bytes.
>
> That’s a good suggestion.  Using pnstrdup would look like this:
>
> --- a/src/test/regress/pg_regress.c
> +++ b/src/test/regress/pg_regress.c
> @@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
>
>         while ((ptr = strstr(string->data, replace)) != NULL)
>         {
> -               char       *dup = pg_strdup(string->data);
> +              char       *dup = pnstrdup(string->data, string->maxlen);

I was thinking pnstrdup(string->data, ptr - string->data) to avoid
copying the chars beyond ptr.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] - Provide robust alternatives for replace_string

Asim Praveen


> On 03-Aug-2020, at 8:36 PM, Alvaro Herrera <[hidden email]> wrote:
>
> On 2020-Aug-03, Asim Praveen wrote:
>
>> Thank you Alvaro for reviewing the patch!
>>
>>> On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <[hidden email]> wrote:
>>>
>>> What happens if a replacement string happens to be split in the middle
>>> by the fgets buffering?  I think it'll fail to be replaced.  This
>>> applies to both versions.
>>
>> Can a string to be replaced be split across multiple lines in the source file?  If I understand correctly, fgets reads one line from input file at a time.  If I do not, in the worst case, we will get an un-replaced string in the output, such as “@abs_dir@“ and it should be easily detected by a failing diff.
>
> I meant what if the line is longer than 1023 chars and the replace
> marker starts at byte 1021, for example.  Then the first fgets would get
> "@ab" and the second fgets would get "s_dir@" and none would see it as
> replaceable.

Thanks for the patient explanation, I had missed the obvious.  To keep the code simple, I’m in favour of relying on the diff of a failing test to catch the split-replacement string problem.

>
>>> In the stringinfo version it seemed to me that using pnstrdup is
>>> possible to avoid copying trailing bytes.
>>
>> That’s a good suggestion.  Using pnstrdup would look like this:
>>
>> --- a/src/test/regress/pg_regress.c
>> +++ b/src/test/regress/pg_regress.c
>> @@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
>>
>>        while ((ptr = strstr(string->data, replace)) != NULL)
>>        {
>> -               char       *dup = pg_strdup(string->data);
>> +              char       *dup = pnstrdup(string->data, string->maxlen);
>
> I was thinking pnstrdup(string->data, ptr - string->data) to avoid
> copying the chars beyond ptr.
>

In fact, what we need in the dup are chars beyond ptr.  Copying of characters prefixing the string to be replaced can be avoided, like so:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,12 +465,12 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme

        while ((ptr = strstr(string->data, replace)) != NULL)
        {
-               char       *dup = pg_strdup(string->data);
+               char       *suffix = pnstrdup(ptr + strlen(replace), string->maxlen);
                size_t          pos = ptr - string->data;

                string->len = pos;
                appendStringInfoString(string, replacement);
-               appendStringInfoString(string, dup + pos + strlen(replace));
+               appendStringInfoString(string, suffix);

-               free(dup);
+               free(suffix);
        }
}


Asim
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] - Provide robust alternatives for replace_string

Asim Praveen
In reply to this post by Alvaro Herrera-9

> On 03-Aug-2020, at 8:36 PM, Alvaro Herrera <[hidden email]> wrote:
>
> On 2020-Aug-03, Asim Praveen wrote:
>
>> Thank you Alvaro for reviewing the patch!
>>
>>> On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <[hidden email]> wrote:
>>>
>>> What happens if a replacement string happens to be split in the middle
>>> by the fgets buffering?  I think it'll fail to be replaced.  This
>>> applies to both versions.
>>
>> Can a string to be replaced be split across multiple lines in the source file?  If I understand correctly, fgets reads one line from input file at a time.  If I do not, in the worst case, we will get an un-replaced string in the output, such as “@abs_dir@“ and it should be easily detected by a failing diff.
>
> I meant what if the line is longer than 1023 chars and the replace
> marker starts at byte 1021, for example.  Then the first fgets would get
> "@ab" and the second fgets would get "s_dir@" and none would see it as
> replaceable.
>

Please find attached a StringInfo based solution to this problem.  It uses fgetln instead of fgets such that a line is read in full, without ever splitting it.

Asim


0001-Use-a-stringInfo-instead-of-a-char-for-replace_strin.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] - Provide robust alternatives for replace_string

Alvaro Herrera-9
On 2020-Aug-05, Asim Praveen wrote:

> Please find attached a StringInfo based solution to this problem.  It
> uses fgetln instead of fgets such that a line is read in full, without
> ever splitting it.

never heard of fgetln, my system doesn't have a manpage for it, and we
don't use it anywhere AFAICS.  Are you planning to add something to
src/common for it?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] - Provide robust alternatives for replace_string

Asim Praveen


> On 05-Aug-2020, at 7:01 PM, Alvaro Herrera <[hidden email]> wrote:
>
> On 2020-Aug-05, Asim Praveen wrote:
>
>> Please find attached a StringInfo based solution to this problem.  It
>> uses fgetln instead of fgets such that a line is read in full, without
>> ever splitting it.
>
> never heard of fgetln, my system doesn't have a manpage for it, and we
> don't use it anywhere AFAICS.  Are you planning to add something to
> src/common for it?
>
Indeed!  I noticed fgetln on the man page of fgets and used it without checking.  And this happened on a MacOS system.

Please find a revised version that uses fgetc instead.

Asim


v2-0001-Use-a-stringInfo-instead-of-a-char-for-replace_st.patch (4K) Download Attachment