some grammar refactoring

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

some grammar refactoring

Peter Eisentraut-6
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP.  In the grammar, these commands (with some exceptions)
basically just take a reference to an object and later look it up in C
code.  Some of that was already generalized individually for each
command (drop_type_any_name, drop_type_name, etc.).  This patch combines
it into common lists for all these commands.

Advantages:

- Avoids having to list each object type at least four times.

- Object types not supported by security labels or extensions are now
explicitly listed and give a proper error message.  Previously, this was
just encoded in the grammar itself and specifying a non-supported object
type would just give a parse error.

- Reduces lines of code in gram.y.

- Removes some old cruft.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v1-0001-Remove-redundant-grammar-symbols.patch (19K) Download Attachment
v1-0002-Fold-AlterForeignTableStmt-into-AlterTableStmt.patch (4K) Download Attachment
v1-0003-Remove-deprecated-syntax-from-CREATE-DROP-LANGUAG.patch (4K) Download Attachment
v1-0004-Refactor-DROP-LANGUAGE-grammar.patch (3K) Download Attachment
v1-0005-Grammar-object-type-refactoring.patch (18K) Download Attachment
v1-0006-Refactor-AlterExtensionContentsStmt-grammar.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: some grammar refactoring

Robert Haas
On Tue, May 19, 2020 at 2:43 AM Peter Eisentraut
<[hidden email]> wrote:
> Here is a series of patches to do some refactoring in the grammar around
> the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
> ADD/DROP.  In the grammar, these commands (with some exceptions)
> basically just take a reference to an object and later look it up in C
> code.  Some of that was already generalized individually for each
> command (drop_type_any_name, drop_type_name, etc.).  This patch combines
> it into common lists for all these commands.

I haven't reviewed the code, but +1 for the idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: some grammar refactoring

Rushabh Lathia
In reply to this post by Peter Eisentraut-6


On Tue, May 19, 2020 at 12:13 PM Peter Eisentraut <[hidden email]> wrote:
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP.  In the grammar, these commands (with some exceptions)
basically just take a reference to an object and later look it up in C
code.  Some of that was already generalized individually for each
command (drop_type_any_name, drop_type_name, etc.).  This patch combines
it into common lists for all these commands.

Advantages:

- Avoids having to list each object type at least four times.

- Object types not supported by security labels or extensions are now
explicitly listed and give a proper error message.  Previously, this was
just encoded in the grammar itself and specifying a non-supported object
type would just give a parse error.

- Reduces lines of code in gram.y.

- Removes some old cruft.


I liked the idea.

I had quick glance through the patches and also did quick review and testing.
I haven't found any issue with the patch.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Rushabh Lathia
Reply | Threaded
Open this post in threaded view
|

Re: some grammar refactoring

Peter Eisentraut-6
In reply to this post by Peter Eisentraut-6
On 2020-05-19 08:43, Peter Eisentraut wrote:
> Here is a series of patches to do some refactoring in the grammar around
> the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
> ADD/DROP.  In the grammar, these commands (with some exceptions)
> basically just take a reference to an object and later look it up in C
> code.  Some of that was already generalized individually for each
> command (drop_type_any_name, drop_type_name, etc.).  This patch combines
> it into common lists for all these commands.

While most of this patch set makes no behavior changes by design, I
should point out this little change hidden in the middle:

     Remove deprecated syntax from CREATE/DROP LANGUAGE

     Remove the option to specify the language name as a single-quoted
     string.  This has been obsolete since ee8ed85da3b.  Removing it allows
     better grammar refactoring.

     The syntax of the CREATE FUNCTION LANGUAGE clause is not changed.

(ee8ed85da3b is in PG 7.2.)

I expect this to be uncontroversial, but it should be pointed out.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: some grammar refactoring

Mark Dilger-5
In reply to this post by Peter Eisentraut-6


> On May 18, 2020, at 11:43 PM, Peter Eisentraut <[hidden email]> wrote:
>
> Here is a series of patches to do some refactoring in the grammar around the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ... ADD/DROP.  In the grammar, these commands (with some exceptions) basically just take a reference to an object and later look it up in C code.  Some of that was already generalized individually for each command (drop_type_any_name, drop_type_name, etc.).  This patch combines it into common lists for all these commands.
>
> Advantages:
>
> - Avoids having to list each object type at least four times.
>
> - Object types not supported by security labels or extensions are now explicitly listed and give a proper error message.  Previously, this was just encoded in the grammar itself and specifying a non-supported object type would just give a parse error.
>
> - Reduces lines of code in gram.y.
>
> - Removes some old cruft.

I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid object types out of gram.y and into extension.c raises an organizational question.   At some places in gram.y, there is C code that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y.  In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object with the error checking delayed until command processing.  For tokens which need to be checked against the catalogs, that decision makes perfect sense.  But for ones where all the information necessary to validate the token exists in the parser, it is not clear to me why it gets delayed until command processing.  Is there a design principle behind when these checks are done in gram.y vs. when they are delayed to the command processing?  I'm guessing in v1-0006 that you are doing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying the check until ExecAlterExtensionContentsStmt, you can put the check all in one place.  Is that all it is?

I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent format and location, rather than scattered through gram.y and backend/commands/.  Does anybody else have an interest in this?

My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse to generate random SQL.  The more the parsing logic is visible to bison, the more useful the generated data files are.  But a single, consistent design for extra-grammatical error checks could help augment those files fairly well, too.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply | Threaded
Open this post in threaded view
|

Re: some grammar refactoring

Peter Eisentraut-6
On 2020-05-22 18:53, Mark Dilger wrote:
> I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid object types out of gram.y and into extension.c raises an organizational question.   At some places in gram.y, there is C code that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y.  In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object with the error checking delayed until command processing.  For tokens which need to be checked against the catalogs, that decision makes perfect sense.  But for ones where all the information necessary to validate the token exists in the parser, it is not clear to me why it gets delayed until command processing.  Is there a design principle behind when these checks are done in gram.y vs. when they are delayed to the command processing?  I'm guessing in v1-0006 that you are doing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying the check until ExecAlterExtensionContentsStmt, you can put the check all in one place.  Is that all it is?

We have been for some time moving to a style where we rely on switch
statements around OBJECT_* constants to (a) decide what is allowed with
certain object types, and (b) make sure we have an explicit decision on
each object type and don't forget any.  This has worked well, I think.

This is more of that.  Before this patch, it would have been pretty hard
to find out which object types are supported with extensions or security
labels, except by very carefully reading the grammar.

Moreover, you now get a proper error message for unsupported object
types rather than just a generic parse error.

> I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent format and location, rather than scattered through gram.y and backend/commands/.  Does anybody else have an interest in this?
>
> My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse to generate random SQL.  The more the parsing logic is visible to bison, the more useful the generated data files are.  But a single, consistent design for extra-grammatical error checks could help augment those files fairly well, too.

It's certainly already the case that the grammar accepts statements that
end up being invalid, even if you ignore catalog lookup.  I don't think
my patch moves the needle on this in a significant way.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: some grammar refactoring

Mark Dilger-5


> On May 25, 2020, at 2:55 AM, Peter Eisentraut <[hidden email]> wrote:
>
> On 2020-05-22 18:53, Mark Dilger wrote:
>> I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid object types out of gram.y and into extension.c raises an organizational question.   At some places in gram.y, there is C code that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y.  In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object with the error checking delayed until command processing.  For tokens which need to be checked against the catalogs, that decision makes perfect sense.  But for ones where all the information necessary to validate the token exists in the parser, it is not clear to me why it gets delayed until command processing.  Is there a design principle behind when these checks are done in gram.y vs. when they are delayed to the command processing?  I'm guessing in v1-0006 that you are doing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying the check until ExecAlterExtensionContentsStmt, you can put the check all in one place.  Is that all it is?
>
> We have been for some time moving to a style where we rely on switch statements around OBJECT_* constants to (a) decide what is allowed with certain object types, and (b) make sure we have an explicit decision on each object type and don't forget any.  This has worked well, I think.

Yes, I think so, too.  I like that overall design.

> This is more of that.

Yes, it is.

> Before this patch, it would have been pretty hard to find out which object types are supported with extensions or security labels, except by very carefully reading the grammar.

Fair enough.

> Moreover, you now get a proper error message for unsupported object types rather than just a generic parse error.

Sounds great.

>> I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent format and location, rather than scattered through gram.y and backend/commands/.  Does anybody else have an interest in this?
>> My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse to generate random SQL.  The more the parsing logic is visible to bison, the more useful the generated data files are.  But a single, consistent design for extra-grammatical error checks could help augment those files fairly well, too.
>
> It's certainly already the case that the grammar accepts statements that end up being invalid, even if you ignore catalog lookup.  I don't think my patch moves the needle on this in a significant way.

I don't think it moves the needle too much, either.  But since your patch is entirely a refactoring patch and not a feature patch, I thought it would be fair to ask larger questions about how the code should be structured.  I like using enums and switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why that should be in the command execution step.  It feels like a layering violation to me.

I don't object to this patch getting committed.  A subsequent patch to consolidate all the grammar checks into src/backend/parser and out of src/backend/commands won't be blocked by this.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply | Threaded
Open this post in threaded view
|

Re: some grammar refactoring

Peter Eisentraut-6
On 2020-05-25 21:09, Mark Dilger wrote:
> I don't think it moves the needle too much, either.  But since your patch is entirely a refactoring patch and not a feature patch, I thought it would be fair to ask larger questions about how the code should be structured.  I like using enums and switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why that should be in the command execution step.  It feels like a layering violation to me.

Most utility commands don't have an intermediate parse analysis pass.
They just go straight from the grammar to the execution.  Maybe that
could be rethought, but that's the way it is now.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: some grammar refactoring

Robert Haas
On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut
<[hidden email]> wrote:
> On 2020-05-25 21:09, Mark Dilger wrote:
> > I don't think it moves the needle too much, either.  But since your patch is entirely a refactoring patch and not a feature patch, I thought it would be fair to ask larger questions about how the code should be structured.  I like using enums and switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why that should be in the command execution step.  It feels like a layering violation to me.
>
> Most utility commands don't have an intermediate parse analysis pass.
> They just go straight from the grammar to the execution.  Maybe that
> could be rethought, but that's the way it is now.

I think it can and should be rethought at some point. The present
split leads to a lot of weird coding. We've had security
vulnerabilities that were due to things like passing the same RangeVar
to two different places, leading to two different lookups for the name
that could be induced to return different OIDs. It also leads to a lot
of fuzzy thinking about where locks are taken, in which order, and how
many times, and with what strength. The code for queries seems to have
been thought through a lot more carefully, because the existence of
prepared queries makes mistakes a lot more noticeable. I hope some day
someone will be motivated to improve the situation for DDL as well,
though it will probably be a thankless task.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: some grammar refactoring

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut
> <[hidden email]> wrote:
>> Most utility commands don't have an intermediate parse analysis pass.
>> They just go straight from the grammar to the execution.  Maybe that
>> could be rethought, but that's the way it is now.

> I think it can and should be rethought at some point.

The other problem is that the ones that do have explicit parse analysis
tend to be doing it at the wrong time.  I've fixed some ALTER TABLE
problems by rearranging that, but we still have open bugs that are due
to this type of mistake, eg [1].  I agree that we need a rethink, and
we need it badly.

If this patch is changing when any parse-analysis-like actions happen,
then I would say that it needs very careful review --- much more than
the "refactoring" label would suggest.  Maybe it's making things better,
or maybe it doesn't matter; but this area is a minefield.

                        regards, tom lane

[1] https://www.postgresql.org/message-id/flat/16272-6e32da020e9a9381%40postgresql.org