Camel case identifiers and folding

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Camel case identifiers and folding

Steve Haresnape
I'm porting a sql server database to postgresql 9.6. My camelCase identifiers are having their humps removed. This is disconcerting and sad.

Is there a cure for this?

I don't want to quote my identifiers unless I have to. I don't want to use underscores. Everywhere else I write code I use camel case. I makes me feel warm inside.

This is folding in the same sense Procrustes folded his guests to fit his bed.

The sql standard has some dumb stuff in it. Do not appeal to it. It can be safely ignored in this regard (by Microsoft, and anyone else wishing to prevent the propagation of stupidity).

The only mechanism available to put any semantics into the database is via the naming of identifiers. Why screw with that? Imagine what would happen if your IDE decided to do that to your java code.


Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

David G Johnston
On Thu, Mar 14, 2019 at 4:07 PM Steve Haresnape
<[hidden email]> wrote:
>
> I'm porting a sql server database to postgresql 9.6. My camelCase identifiers are having their humps removed. This is disconcerting and sad.
>
> Is there a cure for this?

No

>I don't want to quote my identifiers unless I have to.

PostgreSQL made the choice long ago to normalize unquoted identifiers
to lower case.  Quoting them will preserve whatever you type,
including case.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Thomas Kellerer
In reply to this post by Steve Haresnape
Steve Haresnape schrieb am 15.03.2019 um 00:06:
> I'm porting a sql server database to postgresql 9.6. My camelCase
> identifiers are having their humps removed. This is disconcerting and
> sad.
>
> Is there a cure for this?
>
> I don't want to quote my identifiers unless I have to.

You don't need to.
Nothing prevents you from writing CustomerOrders if the table was not created with double quotes.

The only thing is, that the name is stored differently.
But you can still use CamelCase all over the place if stick to unquoted identifiers.




Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Wim Bertels-3
In reply to this post by Steve Haresnape
Steve Haresnape schreef op vr 15-03-2019 om 12:06 [+1300]:
> I'm porting a sql server database to postgresql 9.6. My camelCase
> identifiers are having their humps removed. This is disconcerting and
> sad.

the SQL ISO defines exactly this behaviour (as you note),
so this is be expected

suggestion : instead of CamelCase use seperate_lower_case_by_underscore

You don't like this notation, maybe this can help.
It is suggested that is more relaxing for the brain, our brain tends to
focus on the Capitals first.
Reasoning: So scanning a literal like dogFeedFood first goes the F of
Feed and Food and only then goes back to dog.

Changing habits can be hard, unfortunately.

"" > literal
'' > string

--
mvg,
Wim
--
Lector
UC Leuven-Limburg
--
O, it is excellent
To have a giant's strength; but it is tyrannous
To use it like a giant.
                -- Shakespeare, "Measure for Measure", II, 2

Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Morris de Oryx
The original question has already been answered really well, but it reminds me to mention that Postgres text/varchar values are case-sensitive. Here's a list of the times when I would like a case-sensitive text field:

   Never

Now here's the list of times I would like a case-blind text field:

   Everywhere else.

If this is how you feel too, there are several alternatives. The one that I've chosen is to use the citext extension instead of text fields.This takes care of the problem without having to add extra function calls to your queries, do anything special with indexes, etc.

 If you have JSON, which has case-sensitive element names, use JSONB. 

Your requirements may differ than mine! Other people have good reason to want case-sensitive searches. I just never do. (30+ years in programming and I can't remember a time I wanted user data to be treated case-sensitively...but you never know...one day...maybe.) There's also an extension for stripping accents, which I've not needed.

I've idly wondered if using a different collation on a text field might be a better answer than using citext everywhere? If anyone wants to set me straight on this, I'd be grateful.



lup
Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

lup


On Mar 15, 2019, at 4:43 PM, Morris de Oryx <[hidden email]> wrote:

The original question has already been answered really well, but it reminds me to mention that Postgres text/varchar values are case-sensitive. Here's a list of the times when I would like a case-sensitive text field:

   Never

Now here's the list of times I would like a case-blind text field:

   Everywhere else.

If this is how you feel too, there are several alternatives. The one that I've chosen is to use the citext extension instead of text fields.This takes care of the problem without having to add extra function calls to your queries, do anything special with indexes, etc.

 If you have JSON, which has case-sensitive element names, use JSONB. 

Your requirements may differ than mine! Other people have good reason to want case-sensitive searches. I just never do. (30+ years in programming and I can't remember a time I wanted user data to be treated case-sensitively...but you never know...one day...maybe.) There's also an extension for stripping accents, which I've not needed.

I've idly wondered if using a different collation on a text field might be a better answer than using citext everywhere? If anyone wants to set me straight on this, I'd be grateful.



What sort of content is in your field of type text?  Certainly, in English prose, “rob” is different than “Rob” and if the content is for a web page (or in my experience, the content of medical reference books) these differences are critical.
Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Morris de Oryx
We definitely store data case-sensitively, we just never want to search on it case-sensitively. That's what citext gives us.  Many databases perform this way as a default. Postgres does not, but it offers alternatives. The OP is coming from MySQL which, if I remember correctly, treated non-binary-like text containers as case-insensitive in searches. That's why I mentioned the Postgres behavior, it's a gotcha if you're assuming something else will happen.

More to the point, users never want case-sensitive searches, it's just confusing for them.

There are places where we've got data where byte-level/code page differences are significant. But a lot of that is binary-like data. These are rare, and I'm willing to do a bit of extra work for them. I can't even think of such a case off the top of my head.

UUIDs as a type are an interesting case in Postgres. They're stored as a large numeric for efficiency (good!), but are presented by default in the 36-byte format with the dashes. However, you can also search using the dashes 32-character format....and it all works. Case-insensitively. Postgres converses 36/32 char strings of any case combination back into the relevant number and then searches. Anything else would be pointlessly hard to deal with.

There are also cases where case-sensitivity is not optional. For example, we save and generate JSON (like everyone else) for various tasks. JSON element names are case-sensitive. Not our call, just the way it is. Personally, I think that case-sensitive language element names are one of the stupidest design choices in history...but no one asked me. There are solid arguments in favor of the idea (Dijkstra himself argued in their favor), and it's an unchangeable fact of life. So in those cases, yeah, case-sensitivity matters. Namely, if the data itself is case-sensitive. The truth is, I rarely have a reason to use a 0NF packed field type like JSON....so the issue doesn't come up in our Postgres searches. But if I did plan to store JSON, say API response logs, I'd want those searches to be case-sensitive and would use JSONB and the necessary operators.


Reply | Threaded
Open this post in threaded view
|

Fwd: Camel case identifiers and folding

Steve Haresnape
In reply to this post by David G Johnston
My apologies I sent this reply to David only by mistake.

---------- Forwarded message ---------
From: Steve Haresnape <[hidden email]>
Date: Fri, 15 Mar 2019 at 13:09
Subject: Re: Camel case identifiers and folding
To: David G. Johnston <[hidden email]>


As I said, I don't want to quote my identifiers. I know what that does. I want to specify them in a certain way, see them in that same way, but refer to them in any old way.

You can call it normalize or fold or whatever. It's a bad design choice, and not even a completely compliant choice. 

Is a cure contemplated? I know it's not just me that dislikes this.

On Fri, 15 Mar 2019 at 12:21, David G. Johnston <[hidden email]> wrote:
On Thu, Mar 14, 2019 at 4:07 PM Steve Haresnape
<[hidden email]> wrote:
>
> I'm porting a sql server database to postgresql 9.6. My camelCase identifiers are having their humps removed. This is disconcerting and sad.
>
> Is there a cure for this?

No

>I don't want to quote my identifiers unless I have to.

PostgreSQL made the choice long ago to normalize unquoted identifiers
to lower case.  Quoting them will preserve whatever you type,
including case.

David J.


--
Steve Haresnape
60 Kauri Road, Awhitu, RD 4 Waiuku 2684
Phone: (09) 235 1698
Mobile: 021 514 666


--
Steve Haresnape
60 Kauri Road, Awhitu, RD 4 Waiuku 2684
Phone: (09) 235 1698
Mobile: 021 514 666
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Camel case identifiers and folding

Adrian Klaver-4
On 3/16/19 1:53 AM, Steve Haresnape wrote:

> My apologies I sent this reply to David only by mistake.
>
> ---------- Forwarded message ---------
> From: *Steve Haresnape* <[hidden email]
> <mailto:[hidden email]>>
> Date: Fri, 15 Mar 2019 at 13:09
> Subject: Re: Camel case identifiers and folding
> To: David G. Johnston <[hidden email]
> <mailto:[hidden email]>>
>
>
> As I said, I don't want to quote my identifiers. I know what that does.
> I want to specify them in a certain way, see them in that same way, but
> refer to them in any old way.
>
> You can call it normalize or fold or whatever. It's a bad design choice,
> and not even a completely compliant choice.

It is SQL standard(sort of) so I would not call it a bad choice. It
deviates from the standard in that it folds down not up, but for your
case that would not matter.

>
> Is a cure contemplated? I know it's not just me that dislikes this.

I would say no. Probably matched by folks who would dislike having it
changed.

>
> On Fri, 15 Mar 2019 at 12:21, David G. Johnston
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     On Thu, Mar 14, 2019 at 4:07 PM Steve Haresnape
>     <[hidden email]
>     <mailto:[hidden email]>> wrote:
>      >
>      > I'm porting a sql server database to postgresql 9.6. My camelCase
>     identifiers are having their humps removed. This is disconcerting
>     and sad.
>      >
>      > Is there a cure for this?
>
>     No
>
>      >I don't want to quote my identifiers unless I have to.
>
>     PostgreSQL made the choice long ago to normalize unquoted identifiers
>     to lower case.  Quoting them will preserve whatever you type,
>     including case.
>
>     David J.
>
>
>
> --
> Steve Haresnape
> 60 Kauri Road, Awhitu, RD 4 Waiuku 2684
> [hidden email]
> <mailto:[hidden email]>
> Phone: (09) 235 1698
> Mobile: 021 514 666
>
>
> --
> Steve Haresnape
> 60 Kauri Road, Awhitu, RD 4 Waiuku 2684
> [hidden email]
> <mailto:[hidden email]>
> Phone: (09) 235 1698
> Mobile: 021 514 666


--
Adrian Klaver
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Camel case identifiers and folding

Tom Lane-2
In reply to this post by Steve Haresnape
Steve Haresnape <[hidden email]> writes:
> As I said, I don't want to quote my identifiers. I know what that does. I
> want to specify them in a certain way, see them in that same way, but refer
> to them in any old way.
> You can call it normalize or fold or whatever. It's a bad design choice,
> and not even a completely compliant choice.

> Is a cure contemplated? I know it's not just me that dislikes this.

No.

There have been previous discussions of allowing variant case-folding
rules, and the conclusion has always been that it would break so much
stuff as to be entirely not worth the trouble.

The big problem with making significant semantics changes like this
be optional is that authors of general-purpose tools then have to be
prepared to cope with all the possibilities.  That's a pretty enormous
cost to load onto other people.  If it *only* affected the core code,
maybe you could find somebody to do the work and call it done, but
actually the implications would reverberate across the entire Postgres
ecosystem.  That's a tough call to make for a change that can't even
be painted as meeting a widely-favored goal like better SQL spec
compliance.

Now, in the spirit of full disclosure, I should say that the only form
of this idea that people have really spent significant effort looking
at is exactly the fully-SQL-spec-compliant case-folding rule, ie just
like Postgres normally does it except unquoted identifiers fold to
all-upper-case not all-lower.  Perhaps there's some reason why what
you want would be less painful than that turns out to be ... but I'm
not seeing such a reason offhand.  In fact I suspect your preference
is actually worse, it'd require behavior changes in more places.
As an example, I believe your request would require case-insensitive
uniqueness enforcement in the system catalogs' unique indexes on names.
You have no idea how large a can of worms that opens (but I'll just
mention that "which characters are letters" doesn't even have a well
defined universal answer).

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Peter J. Holzer
In reply to this post by lup
On 2019-03-15 17:09:49 -0600, Rob Sargent wrote:

>     On Mar 15, 2019, at 4:43 PM, Morris de Oryx <[hidden email]> wrote:
>
>     The original question has already been answered really well, but it reminds
>     me to mention that Postgres text/varchar values are case-sensitive. Here's
>     a list of the times when I would like a case-sensitive text field:
>
>        Never
>
>     Now here's the list of times I would like a case-blind text field:
>
>        Everywhere else.
>
[...]
> What sort of content is in your field of type text?  Certainly, in English
> prose, “rob” is different than “Rob”

I disagree. While the grammar for written English has rules when to
write "rob" and when to write "Rob", that distinction usually carries no
semantic difference. Consider:

"How to Rob the Hump of a Camel"

"the go programming language was invented by rob pike, ken thompson and
robert griesemer"

Here "Rob" is a verb and "rob" is a first name, the opposite of what you
probably intended. Yet the the first sentence is grammatically correct
if it is a title and while the second isn't correct, few people will
have difficulties understanding it (many probably won't even notice that
it is all lower case).

Spoken English of course doesn't even have a case distinction.

> and if the content is for a web page (or in my experience, the content
> of medical reference books) these differences are critical.

A web page? Rarely, at least for the human readable parts. Medicine? I
don't know. There may be names for different substances which differ
only in case. But those are parts of a formal language, and as
programmers we already know about case-sensitive formal languages.

        hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | [hidden email]         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

signature.asc (849 bytes) Download Attachment
lup
Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

lup


What sort of content is in your field of type text?  Certainly, in English
prose, “rob” is different than “Rob”

I disagree. While the grammar for written English has rules when to
write "rob" and when to write "Rob", that distinction usually carries no
semantic difference. Consider:

"How to Rob the Hump of a Camel"

"the go programming language was invented by rob pike, ken thompson and
robert griesemer"

Here "Rob" is a verb and "rob" is a first name, the opposite of what you
probably intended. Yet the the first sentence is grammatically correct
if it is a title and while the second isn't correct, few people will
have difficulties understanding it (many probably won't even notice that
it is all lower case).

Spoken English of course doesn't even have a case distinction.

and if the content is for a web page (or in my experience, the content
of medical reference books) these differences are critical.

A web page? Rarely, at least for the human readable parts. Medicine? I
don't know. There may be names for different substances which differ
only in case. But those are parts of a formal language, and as
programmers we already know about case-sensitive formal languages.

I don’t think it’s solely about the semantics.  One might be contractually obligated to always spell a name in some exact way including it capitalization. For instance if referring to "Rob Sargent” as a quote or accreditation, then it’s not okay to let a typo “rob Sargent” go through.

Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Andrew Gierth
In reply to this post by Morris de Oryx
>>>>> "Morris" == Morris de Oryx <[hidden email]> writes:

 Morris> UUIDs as a type are an interesting case in Postgres. They're
 Morris> stored as a large numeric for efficiency (good!), but are
 Morris> presented by default in the 36-byte format with the dashes.
 Morris> However, you can also search using the dashes 32-character
 Morris> format....and it all works. Case-insensitively.

That works because UUIDs have a convenient canonical form (the raw
bytes) which all input is converted to before comparison.

Text is ... not like this.

Even citext is really only a hack - it assumes that comparisons can be
done by conversion to lowercase, which may work well enough for English
but I'm pretty sure it does not correctly handle the edge cases in, for
example, German (consider 'SS', 'ss', 'ß') or Greek (final sigma). Doing
it better would require proper application of case-folding rules, and
even that would require handling of edge cases (the Unicode case folding
algorithm is designed to be language-independent, which means that it
breaks for Turkish without special-case exceptions).

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Peter J. Holzer
In reply to this post by lup
On 2019-03-16 14:00:34 -0600, Rob Sargent wrote:
>         What sort of content is in your field of type text?  Certainly, in
>         English
>         prose, “rob” is different than “Rob”
>
>
>     I disagree. While the grammar for written English has rules when to
>     write "rob" and when to write "Rob", that distinction usually carries no
>     semantic difference. Consider:
[...]
> I don’t think it’s solely about the semantics.  One might be contractually
> obligated to always spell a name in some exact way including it capitalization.
> For instance if referring to "Rob Sargent” as a quote or accreditation, then
> it’s not okay to let a typo “rob Sargent” go through.

1) Such contracts might exist, but they are only binding to the signing
parties, they don't affect what is commonly understood as "the English
language". Everybody else will see it as an obvious typo and won't
assume that this refers to some "rob Sargent" who is a different person
than "Rob Sargent".

2) I don't think the OP was talking about spell-checking. And in any
case spell-checking is more complicated than simply comparing strings
byte by byte.

        hp


--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | [hidden email]         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Morris de Oryx
Sounds like I may have touched a nerve with some. If so, no offense intended!

There are cases where case-sensitivity is required or desirable, it would be silly to argue otherwise. Where you have such cases, then case-sensitive queries are great. Some RDBMS systems default to case-sensitive searches, others default to case-blind searches. I'll take that to mean that either choice has merit, and it's not foolish to prefer either. I need case-blind searches vanishingly close to 100% of the time, but other people have different conditions and may find that they almost never need case-blind searches. I find this extreme hard to imagine, but find it easy to imagine people who need case-sensitive searches quite often.

What I've been thinking about most are user-driven searches on user-oriented data. Users, at least any user I've ever had, don't want case-sensitive searches. They also don't care about diacritical characters, at least in English. I worked for a French company for many years. Diacritical searches were not always preferred, but sometimes were. It depends on your user community and their requirements and norms. Sometimes it comes down to an individual user. Options are good! I was really just trying to warn someone coming from a base-blind default about Postgres behavior because, well, it hurts if you aren't expecting it. That doesn't make Postgres wrong (it's not that kind of a choice), but it is important to know about.

I'm new to Postgres (only about a year in), and it's great. But I'm used to a case-blind search as a default. And, honestly, I can *never* remember a case when a user asked for a case-sensitive search. Ever. In 30+ years. Maybe it's just me. Just kidding, it's not just me. If you're presenting users with a search interface, you can find out by asking them. Or you can AB test a search UI where there is the option of case-sensitive/blind searching, but you randomly flip which is the default. For users, case-sensitive searches are assumed. That's what Google does. Seriously, Google === Search. It's not a hard test to run. If you find that with a case-blind search, 30% of user tick the box to make it case-sensitive, then you've got users that often do care about case-sensitive search.

And since it seems to be unclear at a few places in the discussion above: It absolutely makes sense to store data in its original form and to allow for case-sensitive searches, when required. It would be very weird to store

     Call me back Ishmael, I've gotta go...

and get back anything else, be that

     call me back ishmael, i've gotta go...

or

     CALL ME BACK ISHMAEL, I'VE GOTTA GO...

As far as I understand it in Postgres, you can:

* Use something like UPPER or LOWER in Every. Single. Search.

* Fold text to one case consistently in an index to make searches case-blind.

* Use citext.

* Teach and convince every user to enter and search for data case-sensitively correctly Every. Single. Time.

On that last point, good luck. Here's an example, I'm keen on birds. Do you write it:

Black-shouldered Kite
Black-Shouldered Kite


On Sun, Mar 17, 2019 at 8:46 AM Peter J. Holzer <[hidden email]> wrote:
On 2019-03-16 14:00:34 -0600, Rob Sargent wrote:
>         What sort of content is in your field of type text?  Certainly, in
>         English
>         prose, “rob” is different than “Rob”
>
>
>     I disagree. While the grammar for written English has rules when to
>     write "rob" and when to write "Rob", that distinction usually carries no
>     semantic difference. Consider:
[...]
> I don’t think it’s solely about the semantics.  One might be contractually
> obligated to always spell a name in some exact way including it capitalization.
> For instance if referring to "Rob Sargent” as a quote or accreditation, then
> it’s not okay to let a typo “rob Sargent” go through.

1) Such contracts might exist, but they are only binding to the signing
parties, they don't affect what is commonly understood as "the English
language". Everybody else will see it as an obvious typo and won't
assume that this refers to some "rob Sargent" who is a different person
than "Rob Sargent".

2) I don't think the OP was talking about spell-checking. And in any
case spell-checking is more complicated than simply comparing strings
byte by byte.

        hp


--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | [hidden email]         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAlyNbrwACgkQ8g5IURL+
KF2mKhAAq8b9RLFBihsO+dQf3pxjLt3bWoD9mhgMEy8GvxADuxAF4dLo1SqfX2S0
aIZFEyQgMoNKWOaiylj7TYPiBRilfVSCxggisPsirUKLKpXr4Bw9oIGiPBiE+21m
ajlaONOZNiaM9D+BFthFkPM0TcjR2FHTaXOch0HbFnPnDWMgEPwY9yyDeN8ZPCOn
7G002EB3wxHcnhoFm8jGO2E8SL9l0NLU6+CVlCPAenGI0H6gbFatHi1LKTmNe+PH
d32Il7Pf4GE0o/k92X0Yil3oqMSU/vV08LSrMVGxUDVAz+cL0/W40V/3BthELi95
l7kGJRJfgic7LU7IK84yszxR1ZJt1aBLyFIUpEoEsPl6XuSQsTOm/sFt/1C0L8Io
iLYNTnLh2CveKt/bldITAAPlHZ1rF92BUsAJRA2Wci5yv5V7orykDPgE8HzhB5U9
m1cHfjInqMi1ogGgSLw0sPH6ZDlT76p1H8RjjQQhD9eAFNYt29xFOUARyyWHVEut
ZyWMIeHzWpLRr/h+5y6uJ+LhH4NVMONc3unE6fy1MgIZtIcmeFrbx8+7XGKmv/kD
RY12aNOrCF85HdiIRd/ImXOsCWWiq7u7itZQrsqhQVRXNMnV/7M4Hlpe1rBeQ9V+
k2FXuFM/eEtyEO+KKYwct+1iYerTTKY6hyCQwBINVewe/Fohd+A=
=8B/x
-----END PGP SIGNATURE-----
Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Morris de Oryx
...hit send by accident

..or BLACK SHOULDERED KITE

You'll see all three regularly, depending on context and writer. It's always the same bird. A natural search turns up all matches, regardless of capitalization differences. The hyphens are hard....I'm a huge, huge, huge fan of fuzzy string matching and n-grams, but that's an unrelated topic.


On Mon, Mar 18, 2019 at 10:18 PM Morris de Oryx <[hidden email]> wrote:
Sounds like I may have touched a nerve with some. If so, no offense intended!

There are cases where case-sensitivity is required or desirable, it would be silly to argue otherwise. Where you have such cases, then case-sensitive queries are great. Some RDBMS systems default to case-sensitive searches, others default to case-blind searches. I'll take that to mean that either choice has merit, and it's not foolish to prefer either. I need case-blind searches vanishingly close to 100% of the time, but other people have different conditions and may find that they almost never need case-blind searches. I find this extreme hard to imagine, but find it easy to imagine people who need case-sensitive searches quite often.

What I've been thinking about most are user-driven searches on user-oriented data. Users, at least any user I've ever had, don't want case-sensitive searches. They also don't care about diacritical characters, at least in English. I worked for a French company for many years. Diacritical searches were not always preferred, but sometimes were. It depends on your user community and their requirements and norms. Sometimes it comes down to an individual user. Options are good! I was really just trying to warn someone coming from a base-blind default about Postgres behavior because, well, it hurts if you aren't expecting it. That doesn't make Postgres wrong (it's not that kind of a choice), but it is important to know about.

I'm new to Postgres (only about a year in), and it's great. But I'm used to a case-blind search as a default. And, honestly, I can *never* remember a case when a user asked for a case-sensitive search. Ever. In 30+ years. Maybe it's just me. Just kidding, it's not just me. If you're presenting users with a search interface, you can find out by asking them. Or you can AB test a search UI where there is the option of case-sensitive/blind searching, but you randomly flip which is the default. For users, case-sensitive searches are assumed. That's what Google does. Seriously, Google === Search. It's not a hard test to run. If you find that with a case-blind search, 30% of user tick the box to make it case-sensitive, then you've got users that often do care about case-sensitive search.

And since it seems to be unclear at a few places in the discussion above: It absolutely makes sense to store data in its original form and to allow for case-sensitive searches, when required. It would be very weird to store

     Call me back Ishmael, I've gotta go...

and get back anything else, be that

     call me back ishmael, i've gotta go...

or

     CALL ME BACK ISHMAEL, I'VE GOTTA GO...

As far as I understand it in Postgres, you can:

* Use something like UPPER or LOWER in Every. Single. Search.

* Fold text to one case consistently in an index to make searches case-blind.

* Use citext.

* Teach and convince every user to enter and search for data case-sensitively correctly Every. Single. Time.

On that last point, good luck. Here's an example, I'm keen on birds. Do you write it:

Black-shouldered Kite
Black-Shouldered Kite


On Sun, Mar 17, 2019 at 8:46 AM Peter J. Holzer <[hidden email]> wrote:
On 2019-03-16 14:00:34 -0600, Rob Sargent wrote:
>         What sort of content is in your field of type text?  Certainly, in
>         English
>         prose, “rob” is different than “Rob”
>
>
>     I disagree. While the grammar for written English has rules when to
>     write "rob" and when to write "Rob", that distinction usually carries no
>     semantic difference. Consider:
[...]
> I don’t think it’s solely about the semantics.  One might be contractually
> obligated to always spell a name in some exact way including it capitalization.
> For instance if referring to "Rob Sargent” as a quote or accreditation, then
> it’s not okay to let a typo “rob Sargent” go through.

1) Such contracts might exist, but they are only binding to the signing
parties, they don't affect what is commonly understood as "the English
language". Everybody else will see it as an obvious typo and won't
assume that this refers to some "rob Sargent" who is a different person
than "Rob Sargent".

2) I don't think the OP was talking about spell-checking. And in any
case spell-checking is more complicated than simply comparing strings
byte by byte.

        hp


--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | [hidden email]         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAlyNbrwACgkQ8g5IURL+
KF2mKhAAq8b9RLFBihsO+dQf3pxjLt3bWoD9mhgMEy8GvxADuxAF4dLo1SqfX2S0
aIZFEyQgMoNKWOaiylj7TYPiBRilfVSCxggisPsirUKLKpXr4Bw9oIGiPBiE+21m
ajlaONOZNiaM9D+BFthFkPM0TcjR2FHTaXOch0HbFnPnDWMgEPwY9yyDeN8ZPCOn
7G002EB3wxHcnhoFm8jGO2E8SL9l0NLU6+CVlCPAenGI0H6gbFatHi1LKTmNe+PH
d32Il7Pf4GE0o/k92X0Yil3oqMSU/vV08LSrMVGxUDVAz+cL0/W40V/3BthELi95
l7kGJRJfgic7LU7IK84yszxR1ZJt1aBLyFIUpEoEsPl6XuSQsTOm/sFt/1C0L8Io
iLYNTnLh2CveKt/bldITAAPlHZ1rF92BUsAJRA2Wci5yv5V7orykDPgE8HzhB5U9
m1cHfjInqMi1ogGgSLw0sPH6ZDlT76p1H8RjjQQhD9eAFNYt29xFOUARyyWHVEut
ZyWMIeHzWpLRr/h+5y6uJ+LhH4NVMONc3unE6fy1MgIZtIcmeFrbx8+7XGKmv/kD
RY12aNOrCF85HdiIRd/ImXOsCWWiq7u7itZQrsqhQVRXNMnV/7M4Hlpe1rBeQ9V+
k2FXuFM/eEtyEO+KKYwct+1iYerTTKY6hyCQwBINVewe/Fohd+A=
=8B/x
-----END PGP SIGNATURE-----
Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Chris Travers-5
In reply to this post by Steve Haresnape


On Thu, Mar 14, 2019 at 11:07 PM Steve Haresnape <[hidden email]> wrote:
I'm porting a sql server database to postgresql 9.6. My camelCase identifiers are having their humps removed. This is disconcerting and sad.

Is there a cure for this?

Double quote them. 

I don't want to quote my identifiers unless I have to. I don't want to use underscores. Everywhere else I write code I use camel case. I makes me feel warm inside.

The SQL standard mandates case folding.  We don't quite follow the standard (which mandates folding to upper case) because nobody here likes that, and so we fold to lower case.  In most cases, however, our behavior is close to that of the standard. 

This is folding in the same sense Procrustes folded his guests to fit his bed.

The sql standard has some dumb stuff in it. Do not appeal to it. It can be safely ignored in this regard (by Microsoft, and anyone else wishing to prevent the propagation of stupidity).

MySQL also ignores the standard and does not fold cases.  You can weigh this point vs what you give up by going that direction..... 

The only mechanism available to put any semantics into the database is via the naming of identifiers. Why screw with that? Imagine what would happen if your IDE decided to do that to your java code.


Different languages address case and identifiers differently, no?

Why should PostgreSQL follow a Java standard? 


--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Camel case identifiers and folding

Chris Travers-5
In reply to this post by Tom Lane-2


On Sat, Mar 16, 2019 at 3:15 PM Tom Lane <[hidden email]> wrote:
Steve Haresnape <[hidden email]> writes:
> As I said, I don't want to quote my identifiers. I know what that does. I
> want to specify them in a certain way, see them in that same way, but refer
> to them in any old way.
> You can call it normalize or fold or whatever. It's a bad design choice,
> and not even a completely compliant choice.

> Is a cure contemplated? I know it's not just me that dislikes this.

No.

There have been previous discussions of allowing variant case-folding
rules, and the conclusion has always been that it would break so much
stuff as to be entirely not worth the trouble.

The big problem with making significant semantics changes like this
be optional is that authors of general-purpose tools then have to be
prepared to cope with all the possibilities.  That's a pretty enormous
cost to load onto other people.  If it *only* affected the core code,
maybe you could find somebody to do the work and call it done, but
actually the implications would reverberate across the entire Postgres
ecosystem.  That's a tough call to make for a change that can't even
be painted as meeting a widely-favored goal like better SQL spec
compliance.

Yeah.  I remember the annoyances caused by the removal of the implicit casts in 8.3 and I think this would be orders of magnitude worse.   The cast changes were clearly worth the (significant) pain.  I seriously doubt changing case folding would.

Now, in the spirit of full disclosure, I should say that the only form
of this idea that people have really spent significant effort looking
at is exactly the fully-SQL-spec-compliant case-folding rule, ie just
like Postgres normally does it except unquoted identifiers fold to
all-upper-case not all-lower.  Perhaps there's some reason why what
you want would be less painful than that turns out to be ... but I'm
not seeing such a reason offhand.  In fact I suspect your preference
is actually worse, it'd require behavior changes in more places.
As an example, I believe your request would require case-insensitive
uniqueness enforcement in the system catalogs' unique indexes on names.
You have no idea how large a can of worms that opens (but I'll just
mention that "which characters are letters" doesn't even have a well
defined universal answer).

+1 

                        regards, tom lane



--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.
Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

Thomas Kellerer
In reply to this post by Chris Travers-5
Chris Travers schrieb am 18.03.2019 um 13:16:
> MySQL also ignores the standard and does not fold cases.  
Unless you configure it to fold to lowercase.

Which in turn is a cure for the fact, that case-sensitivity of table names depends on the case-sensitivity of the underlying file system (not the operating system!) and the storage engine that is used (so in short: it's a mess and you can never be sure what the behaviour is)

Thomas


lup
Reply | Threaded
Open this post in threaded view
|

Re: Camel case identifiers and folding

lup
In reply to this post by Morris de Oryx


On 3/18/19 5:18 AM, Morris de Oryx wrote:
Sounds like I may have touched a nerve with some. If so, no offense intended!

There are cases where case-sensitivity is required or desirable, it would be silly to argue otherwise. Where you have such cases, then case-sensitive queries are great. Some RDBMS systems default to case-sensitive searches, others default to case-blind searches. I'll take that to mean that either choice has merit, and it's not foolish to prefer either. I need case-blind searches vanishingly close to 100% of the time, but other people have different conditions and may find that they almost never need case-blind searches. I find this extreme hard to imagine, but find it easy to imagine people who need case-sensitive searches quite often.

What I've been thinking about most are user-driven searches on user-oriented data. Users, at least any user I've ever had, don't want case-sensitive searches. They also don't care about diacritical characters, at least in English. I worked for a French company for many years. Diacritical searches were not always preferred, but sometimes were. It depends on your user community and their requirements and norms. Sometimes it comes down to an individual user. Options are good! I was really just trying to warn someone coming from a base-blind default about Postgres behavior because, well, it hurts if you aren't expecting it. That doesn't make Postgres wrong (it's not that kind of a choice), but it is important to know about.

I'm new to Postgres (only about a year in), and it's great. But I'm used to a case-blind search as a default. And, honestly, I can *never* remember a case when a user asked for a case-sensitive search. Ever. In 30+ years. Maybe it's just me. Just kidding, it's not just me. If you're presenting users with a search interface, you can find out by asking them. Or you can AB test a search UI where there is the option of case-sensitive/blind searching, but you randomly flip which is the default. For users, case-sensitive searches are assumed. That's what Google does. Seriously, Google === Search. It's not a hard test to run. If you find that with a case-blind search, 30% of user tick the box to make it case-sensitive, then you've got users that often do care about case-sensitive search.

And since it seems to be unclear at a few places in the discussion above: It absolutely makes sense to store data in its original form and to allow for case-sensitive searches, when required. It would be very weird to store

     Call me back Ishmael, I've gotta go...

and get back anything else, be that

     call me back ishmael, i've gotta go...

or

     CALL ME BACK ISHMAEL, I'VE GOTTA GO...

As far as I understand it in Postgres, you can:

* Use something like UPPER or LOWER in Every. Single. Search.

* Fold text to one case consistently in an index to make searches case-blind.

* Use citext.

* Teach and convince every user to enter and search for data case-sensitively correctly Every. Single. Time.

On that last point, good luck. Here's an example, I'm keen on birds. Do you write it:

Black-shouldered Kite
Black-Shouldered Kite


Since you say you're new(ish) to postgres I have to ask have you bumped into the tilde(~) operator for text? And the critical tildeAstersik ( ~* )?