[PATCH] COPY command's data format option allows only lowercase csv, text or binary

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] COPY command's data format option allows only lowercase csv, text or binary

Bharath Rupireddy
Hi,

COPY command's FORMAT option allows only all lowercase csv, text or
binary, this is true because strcmp is being used while parsing these
values.

It would be nice if the uppercase or combination of lower and upper
case format options such as CSV, TEXT, BINARY, Csv, Text, Binary so
on. is also allowed.

To achieve this pg_strcasecmp() is used instead of strcmp.

Attached is a patch having above changes.

Request the community to review the patch, if it makes sense.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

v1-0001-COPY-command-s-data-format-option-allows-only-low.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

Tom Lane-2
Bharath Rupireddy <[hidden email]> writes:
> COPY command's FORMAT option allows only all lowercase csv, text or
> binary, this is true because strcmp is being used while parsing these
> values.

This is nonsense, actually:

regression=# create table foo (f1 int);
CREATE TABLE
regression=# copy foo from stdin (format CSV);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

As that shows, there's already a round of lowercasing done by the parser.
The only way that strcasecmp in copy.c would be useful is if you wanted to
accept things like
        copy foo from stdin (format "CSV");
I don't find that to be a terribly good idea.  The normal implication
of quoting is that it *prevents* case folding, so why should that
happen anyway?

More generally, though, why would we want to change this policy only
here?  I believe we're reasonably consistent about letting the parser
do any required down-casing and then just checking keyword matches
with strcmp.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

Robert Haas
On Wed, Jun 24, 2020 at 10:27 AM Tom Lane <[hidden email]> wrote:
> More generally, though, why would we want to change this policy only
> here?  I believe we're reasonably consistent about letting the parser
> do any required down-casing and then just checking keyword matches
> with strcmp.

I've had the feeling in the past that our use of pg_strcasecmp() was a
bit random. Looking through the output of 'git grep pg_strcasecmp', it
seems like we don't typically don't use it on option names, but
sometimes we use it on option values. For instance, DefineCollation()
uses pg_strcasecmp() on the collproviderstr, and DefineType() uses it
on storageEl; and also, not to be overlooked, defGetBoolean() uses it
when matching true/false/on/off, which probably affects a lot of
places. On the other hand, ExplainQuery() matches the format using
plain old strcmp(), despite indirectly using pg_strcasecmp() for the
Boolean parameters. So I guess I'm not really convinced that there is
all that much consistency here.

Mind you, I'm not sure whether or not anything really needs to be
changed, or exactly what ought to be done. I'm just making the
observation that it might not be as consistent as you may think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Wed, Jun 24, 2020 at 10:27 AM Tom Lane <[hidden email]> wrote:
>> More generally, though, why would we want to change this policy only
>> here?  I believe we're reasonably consistent about letting the parser
>> do any required down-casing and then just checking keyword matches
>> with strcmp.

> ... Mind you, I'm not sure whether or not anything really needs to be
> changed, or exactly what ought to be done. I'm just making the
> observation that it might not be as consistent as you may think.

Yeah, I'm sure there are a few inconsistencies.  We previously made a
pass to get rid of pg_strcasecmp for anything that had been through
the parser's downcasing (commit fb8697b31) but I wouldn't be surprised
if that missed a few cases, or if new ones have snuck in.  Anyway,
"don't use pg_strcasecmp unnecessarily" was definitely the agreed-to
policy as of Jan 2018.

My vague recollection is that there are a few exceptions (defGetBoolean
may well be one of them) where pg_strcasecmp still seemed necessary
because the input might not have come through the parser in some usages.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

Michael Paquier-2
On Wed, Jun 24, 2020 at 12:55:22PM -0400, Tom Lane wrote:
> Yeah, I'm sure there are a few inconsistencies.  We previously made a
> pass to get rid of pg_strcasecmp for anything that had been through
> the parser's downcasing (commit fb8697b31) but I wouldn't be surprised
> if that missed a few cases, or if new ones have snuck in.  Anyway,
> "don't use pg_strcasecmp unnecessarily" was definitely the agreed-to
> policy as of Jan 2018.

0d8c9c1 has introduced some in parse_basebackup_options() for the
new manifest option, and fe30e7e for AlterType(), no?

> My vague recollection is that there are a few exceptions (defGetBoolean
> may well be one of them) where pg_strcasecmp still seemed necessary
> because the input might not have come through the parser in some usages.

Yep, there were a couple of exceptions.  What was done at this time
was a case-by-case lookup to check what came only from the parser.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

Bharath Rupireddy
In reply to this post by Tom Lane-2
> As that shows, there's already a round of lowercasing done by the parser.
> The only way that strcasecmp in copy.c would be useful is if you wanted to
> accept things like
>         copy foo from stdin (format "CSV");
> I don't find that to be a terribly good idea.  The normal implication
> of quoting is that it *prevents* case folding, so why should that
> happen anyway?
>

I was able to know that the parser does the lowercasing for other
parts of the query,
what I missed in my understanding is that about the proper usage of quoting.

Thanks for letting me know this point.

I agree with the above understanding to not change that behavior.

Please ignore this patch.

Thank you all for your responses.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

Michael Paquier-2
In reply to this post by Michael Paquier-2
On Thu, Jun 25, 2020 at 11:07:33AM +0900, Michael Paquier wrote:
> 0d8c9c1 has introduced some in parse_basebackup_options() for the
> new manifest option, and fe30e7e for AlterType(), no?

Please forget this one.  I had a moment of brain fade.  Those have
been added for the option values, and on the option names we use
directly strcmp(), so I am not actually seeing a code path on HEAD
where we use pg_strcasecmp for something coming only from the parser.
--
Michael

signature.asc (849 bytes) Download Attachment