PostgreSQL vs SQL Standard

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

PostgreSQL vs SQL Standard

Andrew Gierth
I created this wiki page:

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard

I'd been thinking of collecting this information for a while, but was
spurred into further action when someone referred me to Markus Winand's
PGCon talk slides.

I think I got all the issues I currently know of, but there may be
more, and others may disagree with my classification of issues or the
rationales for violating the spec. Any feedback?

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Tom Lane-2
Andrew Gierth <[hidden email]> writes:
> I created this wiki page:
> https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard

Good idea!

> I think I got all the issues I currently know of, but there may be
> more, and others may disagree with my classification of issues or the
> rationales for violating the spec. Any feedback?

WRT 1.1 ... I doubt that redefining DROP DOMAIN as you describe has "no
major issues".  It sounds to me like an incredibly ugly wart on the
cascaded dependency logic.  Quite aside from wartiness, adding new
objects/dependencies as part of a DROP is broken by design.  What if
the domain drop has cascaded from something the domain's constraints
themselves depend on?  I'd put this as a "has design-level problems" item.

WRT 3.2 on select-list aliases, the postfix-operator issue is only one of
several reasons why we can't support that.  There was some more-detailed
discussion about that awhile back,
https://www.postgresql.org/message-id/flat/99ad0450-b1ab-702f-48ef-6972b630bc87%40BlueTreble.com

Constraint name scope: I think it's an overstatement to say that this
makes some info-schema views "useless".  "Under-determined" might be an
appropriate word.  Or you could say "useless unless the application
limits itself to follow the SQL spec's restriction on names".

Object ownership scope: I have not really dug into the spec on this
point, but I recall from our own docs that "schema owner owns all
contained objects too" is true only in DBs that implement some minimal
subset of the standard.  So that might need more explanation.  In
any case, we can surely mount a very strong defense in terms of
usability/flexibility here, we don't need to say it's just historical.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Andrew Gierth
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 >> I think I got all the issues I currently know of, but there may be
 >> more, and others may disagree with my classification of issues or the
 >> rationales for violating the spec. Any feedback?

 Tom> WRT 1.1 ... I doubt that redefining DROP DOMAIN as you describe
 Tom> has "no major issues". It sounds to me like an incredibly ugly
 Tom> wart on the cascaded dependency logic. Quite aside from wartiness,
 Tom> adding new objects/dependencies as part of a DROP is broken by
 Tom> design. What if the domain drop has cascaded from something the
 Tom> domain's constraints themselves depend on? I'd put this as a "has
 Tom> design-level problems" item.

Oh wow, I hadn't noticed that dropping a function referenced from a
domain's default or constraint drops the whole domain rather than just
removing the default or constraint the way it would with a table.

That seems pretty bad to me, in the sense of being potentially a nasty
footgun for anyone using domains, but certainly you are correct on the
effect on how we categorize the problem.

(If it were not the case, then the only way we'd end up cascading to
dropping a domain would be if we dropped the base type, in which case
the columns are going to go away anyway)

 Tom> WRT 3.2 on select-list aliases, the postfix-operator issue is only
 Tom> one of several reasons why we can't support that. There was some
 Tom> more-detailed discussion about that awhile back,

 Tom> https://www.postgresql.org/message-id/flat/99ad0450-b1ab-702f-48ef-6972b630bc87%40BlueTreble.com

OK, so to summarize, we'd also have to remove ISNULL or make it a
reserved word, and also make VARYING reserved (as it is in the spec)?

The spec doesn't allow "SELECT col AS reservedword", even though we do,
so we don't really have to support "SELECT col reservedword". i.e. we
don't need to get all the way to allowing a_expr ColLabel production, it
would suffice to get to a_expr ColId.

 Tom> Constraint name scope: I think it's an overstatement to say that
 Tom> this makes some info-schema views "useless". "Under-determined"
 Tom> might be an appropriate word.

But in practice that makes it useless except in cases where you
generally don't care about i_s anyway.

 Tom> Or you could say "useless unless the application limits itself to
 Tom> follow the SQL spec's restriction on names".

I'm not sure any applications use i_s to introspect on their own foreign
key constraints; every time I've had to give the "that doesn't work
because PG's constraint name scope differs from the standard" speech to
someone it's because they've been trying to write something more generic
than a single application.

(Though that could be selection bias I guess.)

Someone also pointed out the last time this came up that handling of
constraint names on inherited tables means that an application may be
unable to avoid using duplicate names.

 Tom> Object ownership scope: I have not really dug into the spec on
 Tom> this point, but I recall from our own docs that "schema owner owns
 Tom> all contained objects too" is true only in DBs that implement some
 Tom> minimal subset of the standard.

The spec literally does not use the term "owns" or "owned by" (in the
sense of an authorization identifier owning an object) anywhere except
for schemas.

If you look at <table definition>, you'll see that the authorization
identifier A which is the recipient of the table's initial GRANTs is
defined as being that of the schema.

 Tom> So that might need more explanation. In any case, we can surely
 Tom> mount a very strong defense in terms of usability/flexibility
 Tom> here, we don't need to say it's just historical.

Sure.

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Andrew Gierth
In reply to this post by Tom Lane-2
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 Tom> WRT 3.2 on select-list aliases, the postfix-operator issue is only
 Tom> one of several reasons why we can't support that. There was some
 Tom> more-detailed discussion about that awhile back,

I beat at the grammar a bit to see what it would take to fix it at least
to the extent of allowing a_expr ColId in a select list after removing
postfix ops. It looked like it was doable by making these keywords more
reserved (all of which are already reserved words per spec):

  DOUBLE, DAY, FILTER, HOUR, MINUTE, MONTH, OVER, PRECISION, SECOND,
  VARYING, WITHIN, WITHOUT, YEAR

(notice that ISNULL is not actually a problem since it's already a
type_func_name_keyword and thus not a legal ColId)

Example ambiguous inputs that would require these to be reserved:

  SELECT double precision FROM ...
  SELECT interval '1' second FROM ...   -- also year,month,day,hour,minute
  SELECT count(x) filter (...
  SELECT percentile_cont(0.5) within group (...
  SELECT x::character varying FROM ...
  SELECT x::timestamp without time zone FROM ...

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <[hidden email]> writes:

 Andrew> I beat at the grammar a bit to see what it would take to fix it
 Andrew> at least to the extent of allowing a_expr ColId in a select
 Andrew> list after removing postfix ops. It looked like it was doable
 Andrew> by making these keywords more reserved (all of which are
 Andrew> already reserved words per spec):

 Andrew>   DOUBLE, DAY, FILTER, HOUR, MINUTE, MONTH, OVER, PRECISION, SECOND,
 Andrew>   VARYING, WITHIN, WITHOUT, YEAR

oops, also NATIONAL

--
Andrew (irc:RhodiumToad)

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Tom Lane-2
In reply to this post by Andrew Gierth
Andrew Gierth <[hidden email]> writes:
> Oh wow, I hadn't noticed that dropping a function referenced from a
> domain's default or constraint drops the whole domain rather than just
> removing the default or constraint the way it would with a table.

Ouch.  Seems like possibly a bug ... shouldn't we make only that
constraint depend on the function?  But that's orthogonal to the
DROP DOMAIN behavior you were describing.

> (If it were not the case, then the only way we'd end up cascading to
> dropping a domain would be if we dropped the base type, in which case
> the columns are going to go away anyway)

Nope, drop schema and drop owned by (at the least) could also cascade to
a domain.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Tom Lane-2
In reply to this post by Andrew Gierth
Andrew Gierth <[hidden email]> writes:
> I beat at the grammar a bit to see what it would take to fix it at least
> to the extent of allowing a_expr ColId in a select list after removing
> postfix ops. It looked like it was doable by making these keywords more
> reserved (all of which are already reserved words per spec):
>   DOUBLE, DAY, FILTER, HOUR, MINUTE, MONTH, OVER, PRECISION, SECOND,
>   VARYING, WITHIN, WITHOUT, YEAR

Yeah, a side effect of allowing "a_expr ColId" is that we can expect,
going forward, that a lot of new keywords are going to have to become
fully reserved that otherwise wouldn't have to be.  This is particularly
a problem because of the SQL committee's COBOL-hangover tendency to
invent new syntax that involves sequences of keywords; we usually
don't have a good way to deal with that short of making the first
keyword(s) reserved.

It's arguable that going down that path will, in the long run, lead to
breaking more applications (via their table/column names suddenly becoming
reserved in some new version) than we rescue from having to quote their
SELECT aliases.  At the very least we need to recognize that this is far
from cost-free.

(wanders away wondering exactly what parsing technology the SQL committee
thinks implementations use...)

                        regards, tom lane

PS: My older message about this mentioned only the VARYING and ISNULL
cases, which I think means that I had ideas about how not to have to
reserve the other ones you mention.  But the larger point holds.

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

David Fetter
On Sun, Jun 10, 2018 at 11:32:56AM -0400, Tom Lane wrote:

> Andrew Gierth <[hidden email]> writes:
> > I beat at the grammar a bit to see what it would take to fix it at least
> > to the extent of allowing a_expr ColId in a select list after removing
> > postfix ops. It looked like it was doable by making these keywords more
> > reserved (all of which are already reserved words per spec):
> >   DOUBLE, DAY, FILTER, HOUR, MINUTE, MONTH, OVER, PRECISION, SECOND,
> >   VARYING, WITHIN, WITHOUT, YEAR
>
> Yeah, a side effect of allowing "a_expr ColId" is that we can expect,
> going forward, that a lot of new keywords are going to have to become
> fully reserved that otherwise wouldn't have to be.  This is particularly
> a problem because of the SQL committee's COBOL-hangover tendency to
> invent new syntax that involves sequences of keywords; we usually
> don't have a good way to deal with that short of making the first
> keyword(s) reserved.
>
> It's arguable that going down that path will, in the long run, lead to
> breaking more applications (via their table/column names suddenly becoming
> reserved in some new version) than we rescue from having to quote their
> SELECT aliases.  At the very least we need to recognize that this is far
> from cost-free.
>
> (wanders away wondering exactly what parsing technology the SQL committee
> thinks implementations use...)

If you're wondering why people think specifications that don't include
a reference implementation are a bad idea...

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Alvaro Herrera-9
In reply to this post by Tom Lane-2
On 2018-Jun-10, Tom Lane wrote:

> (wanders away wondering exactly what parsing technology the SQL committee
> thinks implementations use...)

Umm, doesn't this come from our decision to make the AS optional there?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> On 2018-Jun-10, Tom Lane wrote:
>> (wanders away wondering exactly what parsing technology the SQL committee
>> thinks implementations use...)

> Umm, doesn't this come from our decision to make the AS optional there?

No, it was THEIR decision to make AS optional.  I'd like to think that
we'd never have done anything so stupid if we were designing the syntax.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Robert Haas
In reply to this post by Tom Lane-2
On Sun, Jun 10, 2018 at 11:32 AM, Tom Lane <[hidden email]> wrote:

> Andrew Gierth <[hidden email]> writes:
>> I beat at the grammar a bit to see what it would take to fix it at least
>> to the extent of allowing a_expr ColId in a select list after removing
>> postfix ops. It looked like it was doable by making these keywords more
>> reserved (all of which are already reserved words per spec):
>>   DOUBLE, DAY, FILTER, HOUR, MINUTE, MONTH, OVER, PRECISION, SECOND,
>>   VARYING, WITHIN, WITHOUT, YEAR
>
> Yeah, a side effect of allowing "a_expr ColId" is that we can expect,
> going forward, that a lot of new keywords are going to have to become
> fully reserved that otherwise wouldn't have to be.  This is particularly
> a problem because of the SQL committee's COBOL-hangover tendency to
> invent new syntax that involves sequences of keywords; we usually
> don't have a good way to deal with that short of making the first
> keyword(s) reserved.
>
> It's arguable that going down that path will, in the long run, lead to
> breaking more applications (via their table/column names suddenly becoming
> reserved in some new version) than we rescue from having to quote their
> SELECT aliases.  At the very least we need to recognize that this is far
> from cost-free.

It depends on the source of those applications.  Applications
originally written for PostgreSQL are going to break if we reserve
those keywords.  Applications originally written for other databases
that already reserve those words won't, and migrating to PostgreSQL
will become easier for those people.  I think this is to some extent a
question about whether it's more important to have backward
compatibility with our own previous releases or compatibility with
other SQL database systems.

> (wanders away wondering exactly what parsing technology the SQL committee
> thinks implementations use...)

I wonder about this, too, because I've noticed that one other system
in particular seems to do parsing in a quite different manner than we
do (but I have no idea specifically how they do it).  bison is a good
tool in many ways and converting to something else would be a huge
amount of work, but there are also a number of things about it that
are not so great:

- The fact that conflicts can only be resolved with precedence
declarations sucks.  You can only set the priority of rule A relative
to rule B unless they've both got a precedence assigned, and there's
frequently no non-invasive way to accomplish that.  It would be nice
to be able to mark the postfix operator production as "weak" so that
it's only used when no other interpretation is possible.  Of course
this kind of thing could be overdone leading to a confusing grammar,
but it's not a bad idea to be able to do things like this to handle
corner cases, at least IMHO.

- It really only handles LR(1), but for certain cases more tokens of
lookahead would be really useful.  NULLS_LA and NOT_LA are nasty hacks
to work around the lack of adequate lookahead, and there are other
cases where we could do nicer things if we had it.

- There's no way to modify the parser at runtime, which means our
extension facility can't invent new SQL syntax, not even e.g. CREATE
THINGY and DROP THINGY.  (Our lack of support for thingys is really
holding us back.)

- The release cadence is extremely slow.  They just released 3.0.5,
but 3.0.4 came out in 2015 and 3.0.3 in 2013.  It's hard to imagine
bison going away completely, but it's also pretty hard to imagine it
ever having any new features that would help us.  And if we find a
bug, waiting for the next release to fix it is going to be painful.
To all appearances, it's only a little ahead of abandonware.

I'm not sure that there's a better tool than bison out there and I'm
not volunteering to write one in my spare time, but if something turns
up, we might not want to dismiss it out of hand.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Alvaro Herrera-9
In reply to this post by Andrew Gierth
On 2018-Jun-10, Andrew Gierth wrote:

> I created this wiki page:
>
> https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard

Great stuff.

May I suggest that it would be useful to have links to discussions
related to each point, where some exist?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL vs SQL Standard

Brad DeJong-2
In reply to this post by Andrew Gierth
On Sat, Jun 9, 2018 at 8:37 PM, Andrew Gierth <[hidden email]> wrote:
> I think I got all the issues I currently know of, but there may be
> more, and others may disagree with my classification of issues or the
> rationales for violating the spec. Any feedback?

Related to, but I think distinct from

    "Trailing spaces in character(n)"
        Trailing spaces in char(n) values are removed in contexts where the spec requires they be kept.

E021-03 Character literals subclause 5.3, "<literal>": <quote> [ <character representation>... ] <quote>

In subclause 5.3 (SQL standard 2011 part 2: foundation), <character string literal> is defined (ignoring
the character set specifier and the weird multi-line continuations) as 

    <character string literal> ::=  <quote> [ <character representation>... ] <quote>

syntax rule 17 states

    "The declared type of a <character string literal> is fixed-length character string."

In PostgreSQL, the same syntax (<quote> [ <character representation>... ] <quote>) is a string constant and in
the PostgreSQL documentation, section 4.1.2 constants, it says that string constants are one
of the three kinds of implicitly typed constants.

Where you see the difference is that trailing spaces in a string constant are retained when the type
is resolved to text, but if the constant was typed as bpchar and then cast to text, the trailing spaces
would be trimmed.


You also see that the type of a string constant is implicit in cases where PostgreSQL is
unable to determine the type of a string constant from its context.


I could be reading some of this incorrectly, but putting it all together, it looks to me like
PostgreSQL does not meet a strict interpretation of E021-03.

I'd put it in the WONTFIX section.