pg_upgrade fails with non-standard ACL

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

pg_upgrade fails with non-standard ACL

Anastasia Lubennikova
pg_upgrade from 9.6 fails if old cluster had non-standard ACL
on pg_catalog functions that have changed between versions,
for example pg_stop_backup(boolean).

Error:

pg_restore: creating ACL "pg_catalog.FUNCTION "pg_stop_backup"()"
pg_restore: creating ACL "pg_catalog.FUNCTION
"pg_stop_backup"("exclusive" boolean, OUT "lsn" "pg_lsn", OUT
"labelfile" "text", OUT "spcmapfile" "text")"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2169; 0 0 ACL FUNCTION
"pg_stop_backup"("exclusive" boolean, OUT "lsn" "pg_lsn", OUT
"labelfile" "text", OUT "spcmapfile" "text") anastasia
pg_restore: [archiver (db)] could not execute query: ERROR: function
pg_catalog.pg_stop_backup(boolean) does not exist
     Command was: GRANT ALL ON FUNCTION
"pg_catalog"."pg_stop_backup"("exclusive" boolean, OUT "lsn" "pg_lsn",
OUT "labelfile" "text", OUT "spcmapfile" "text") TO "backup";

Steps to reproduce:
1) create a database with pg9.6
2) create a user and change grants on pg_stop_backup(boolean):
CREATE ROLE backup WITH LOGIN;
GRANT USAGE ON SCHEMA pg_catalog TO backup;
GRANT EXECUTE ON FUNCTION pg_stop_backup() TO backup;
GRANT EXECUTE ON FUNCTION pg_stop_backup(boolean) TO backup;
3) perform pg_upgrade to v10 (or any version above)

The problem exists since we added to pg_dump support of ACL changes of
pg_catalog functions in commit 23f34fa4b.

I think this is a bug since it unpredictably affects user experience, so
I propose to backpatch the fix.
Script to reproduce the problem and the patch to fix it (credit to
Arthur Zakirov) are attached.

Current patch contains a flag for  pg_dump --change-old-names to enforce
correct behavior.
I wonder, if we can make it default behavior for pg_upgrade?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pg_dump_ACL_test.sh (2K) Download Attachment
pg_dump_ACL_fix_v0.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Bruce Momjian
On Thu, Jul 18, 2019 at 06:53:12PM +0300, Anastasia Lubennikova wrote:

> pg_upgrade from 9.6 fails if old cluster had non-standard ACL
> on pg_catalog functions that have changed between versions,
> for example pg_stop_backup(boolean).
>
> Error:
>
> pg_restore: creating ACL "pg_catalog.FUNCTION "pg_stop_backup"()"
> pg_restore: creating ACL "pg_catalog.FUNCTION "pg_stop_backup"("exclusive"
> boolean, OUT "lsn" "pg_lsn", OUT "labelfile" "text", OUT "spcmapfile"
> "text")"
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 2169; 0 0 ACL FUNCTION
> "pg_stop_backup"("exclusive" boolean, OUT "lsn" "pg_lsn", OUT "labelfile"
> "text", OUT "spcmapfile" "text") anastasia
> pg_restore: [archiver (db)] could not execute query: ERROR: function
> pg_catalog.pg_stop_backup(boolean) does not exist
>     Command was: GRANT ALL ON FUNCTION
> "pg_catalog"."pg_stop_backup"("exclusive" boolean, OUT "lsn" "pg_lsn", OUT
> "labelfile" "text", OUT "spcmapfile" "text") TO "backup";
>
> Steps to reproduce:
> 1) create a database with pg9.6
> 2) create a user and change grants on pg_stop_backup(boolean):
> CREATE ROLE backup WITH LOGIN;
> GRANT USAGE ON SCHEMA pg_catalog TO backup;
> GRANT EXECUTE ON FUNCTION pg_stop_backup() TO backup;
> GRANT EXECUTE ON FUNCTION pg_stop_backup(boolean) TO backup;
> 3) perform pg_upgrade to v10 (or any version above)
>
> The problem exists since we added to pg_dump support of ACL changes of
> pg_catalog functions in commit 23f34fa4b.
>
> I think this is a bug since it unpredictably affects user experience, so I
> propose to backpatch the fix.
> Script to reproduce the problem and the patch to fix it (credit to Arthur
> Zakirov) are attached.

Uh, wouldn't this affect any default-installed function where the
permission are modified?  Is fixing only a few functions really helpful?

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Tom Lane-2
Bruce Momjian <[hidden email]> writes:
> On Thu, Jul 18, 2019 at 06:53:12PM +0300, Anastasia Lubennikova wrote:
>> pg_upgrade from 9.6 fails if old cluster had non-standard ACL
>> on pg_catalog functions that have changed between versions,
>> for example pg_stop_backup(boolean).

> Uh, wouldn't this affect any default-installed function where the
> permission are modified?  Is fixing only a few functions really helpful?

No, it's just functions whose signatures have changed enough that
a GRANT won't find them.  I think the idea is that the set of
potentially-affected functions is determinate.  I have to say that
the proposed patch seems like a complete kluge, though.  For one
thing we'd have to maintain the list of affected functions in each
future release, and I have no faith in our remembering to do that.

It's also fair to question whether pg_upgrade should even try to
cope with such cases.  If the function has changed signature,
it might well be that it's also changed behavior enough so that
any previously-made grants need reconsideration.  (Maybe we should
just suppress the old grant rather than transferring it.)

Still, this does seem like a gap in the pg_init_privs mechanism.
I wonder if Stephen has any thoughts about what ought to happen.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Stephen Frost
Greetings,

* Tom Lane ([hidden email]) wrote:

> Bruce Momjian <[hidden email]> writes:
> > On Thu, Jul 18, 2019 at 06:53:12PM +0300, Anastasia Lubennikova wrote:
> >> pg_upgrade from 9.6 fails if old cluster had non-standard ACL
> >> on pg_catalog functions that have changed between versions,
> >> for example pg_stop_backup(boolean).
>
> > Uh, wouldn't this affect any default-installed function where the
> > permission are modified?  Is fixing only a few functions really helpful?
>
> No, it's just functions whose signatures have changed enough that
> a GRANT won't find them.  I think the idea is that the set of
> potentially-affected functions is determinate.  I have to say that
> the proposed patch seems like a complete kluge, though.  For one
> thing we'd have to maintain the list of affected functions in each
> future release, and I have no faith in our remembering to do that.
Well, we aren't likely to do that ourselves, no, but perhaps we could
manage it with some prodding by having the buildfarm check for such
cases, not unlike how library maintainers check the ABI between versions
of the library they manage.  Extension authors also deal with these
kinds of changes routinely when writing the upgrade scripts to go
between versions of their extension.  I'm not convinced that this is a
great approach to go down either, to be clear.  When going across major
versions, making people update their systems/code and re-test is
typically entirely reasonable to me.

> It's also fair to question whether pg_upgrade should even try to
> cope with such cases.  If the function has changed signature,
> it might well be that it's also changed behavior enough so that
> any previously-made grants need reconsideration.  (Maybe we should
> just suppress the old grant rather than transferring it.)

Suppressing the GRANT strikes me as pretty reasonable as an approach but
wouldn't that require us to similairly track what's changed between
major versions..?  Unless we arrange to ignore the GRANT failing, but
that seems like it would involve a fair bit of hacking around in pg_dump
to have some option to ignore certain GRANTs failing.  Did you have some
other idea about how to suppress the old GRANT?

A way to make things work for users while suppressing the GRANTS would
be to add a default role for things like file-level-backup, which would
be allowed to execute file-level-backup related functions, presumably
even if we make changes to exactly what those function signatures are,
and then encourage users to GRANT that role to the role that's allowed
to log in and run the file-level backup.  Obviously we wouldn't be doing
that in the back-branches, but we could moving forward.

> Still, this does seem like a gap in the pg_init_privs mechanism.
> I wonder if Stephen has any thoughts about what ought to happen.

So, in an interesting sort of way, we have a way to deal with this
problem when it comes to *extensions* and I suppose that's why we
haven't seen it there- namely the upgrade script, which can decide if it
wants to drop an object and recreate it, or if it wants to do a
create-or-replace, which would preserve the privileges (though the API
has to stay the same, so that isn't exactly the same) and avoid dropping
dependant objects.

Unfortunately, we don't have any good way today to add an optional
argument to a function while preserving the privileges on it, which
would make a case like this one (and others where you'd prefer to not
drop/recreate the function due to dependencies) work, for extensions.

Suppressing the GRANT also seems reasonable for the case of objects
which have been renamed- clearly whatever is using those functions is
going to have to be modified to deal with the new name of the function,
requiring that the GRANT be re-issued doesn't seem like it's that much
more to ask of users.  On the other hand, properly written tools that
check the version of PG and use the right function names could possibly
"just work" following a major version upgrade, if the privilege was
brought across to the new major version correctly.

We also don't want to mistakenly GRANT users more access then they
should have though- if pg_stop_backup() one day grows an
optional argument to run some server-side script, I don't think we'd
want to necessairly just give access to that ability to roles who,
today, can execute the current pg_stop_backup() function.  Of course, if
we added such a capability, hopefully we would do so in a way that less
privileged roles could continue to use the existing capability without
having access to run such a server-side script.

I also don't think that the current patch is actually sufficient to deal
with all the changes we've made between the versions- what about column
names on catalog tables/views that were removed, or changed/renamed..?

In an ideal world, it seems like we'd make a judgement call and arrange
to pull the privileges across when we can do so without granting the
user privileges beyond those that were intended, and otherwise we'd
suppress the GRANT to avoid getting an error.  I'm not sure what a good
way is to go about either figuring out a way to pull the privileges
across, or to suppress the GRANTs when we need to (or always), would be
though.  Neither seems easy to solve in a clean way.

Certainly open to suggestions.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Anastasia Lubennikova
29.07.2019 18:37, Stephen Frost wrote:

> Greetings,
>
> * Tom Lane ([hidden email]) wrote:
>> Bruce Momjian <[hidden email]> writes:
>>> On Thu, Jul 18, 2019 at 06:53:12PM +0300, Anastasia Lubennikova wrote:
>>>> pg_upgrade from 9.6 fails if old cluster had non-standard ACL
>>>> on pg_catalog functions that have changed between versions,
>>>> for example pg_stop_backup(boolean).
>>> Uh, wouldn't this affect any default-installed function where the
>>> permission are modified?  Is fixing only a few functions really helpful?
>> No, it's just functions whose signatures have changed enough that
>> a GRANT won't find them.  I think the idea is that the set of
>> potentially-affected functions is determinate.  I have to say that
>> the proposed patch seems like a complete kluge, though.  For one
>> thing we'd have to maintain the list of affected functions in each
>> future release, and I have no faith in our remembering to do that.
> Well, we aren't likely to do that ourselves, no, but perhaps we could
> manage it with some prodding by having the buildfarm check for such
> cases, not unlike how library maintainers check the ABI between versions
> of the library they manage.  Extension authors also deal with these
> kinds of changes routinely when writing the upgrade scripts to go
> between versions of their extension.  I'm not convinced that this is a
> great approach to go down either, to be clear.  When going across major
> versions, making people update their systems/code and re-test is
> typically entirely reasonable to me.
Whatever we choose to do, we need to keep a list of changed functions. I
don't
think that it will add too much extra work to maintaining other catalog
changes
such as adding or renaming columns.
What's more, we must mention changed functions in migration release
notes. I've
checked documentation [1] and found out, that function API changes are not
described properly.

I think it is an important omission, so I attached the patch for
documentation.
Not quite sure, how many users have already migrated to version 10, still, I
believe it will help many others.

> Suppressing the GRANT also seems reasonable for the case of objects
> which have been renamed- clearly whatever is using those functions is
> going to have to be modified to deal with the new name of the function,
> requiring that the GRANT be re-issued doesn't seem like it's that much
> more to ask of users.  On the other hand, properly written tools that
> check the version of PG and use the right function names could possibly
> "just work" following a major version upgrade, if the privilege was
> brought across to the new major version correctly.
That's exactly the case.

> We also don't want to mistakenly GRANT users more access then they
> should have though- if pg_stop_backup() one day grows an
> optional argument to run some server-side script, I don't think we'd
> want to necessairly just give access to that ability to roles who,
> today, can execute the current pg_stop_backup() function.  Of course, if
> we added such a capability, hopefully we would do so in a way that less
> privileged roles could continue to use the existing capability without
> having access to run such a server-side script.
>
> I also don't think that the current patch is actually sufficient to deal
> with all the changes we've made between the versions- what about column
> names on catalog tables/views that were removed, or changed/renamed..?
I can't get the problem you describe here. As far as I understand, various
changes of catalog tables and views are already handled correctly in
pg_upgrade.

> In an ideal world, it seems like we'd make a judgement call and arrange
> to pull the privileges across when we can do so without granting the
> user privileges beyond those that were intended, and otherwise we'd
> suppress the GRANT to avoid getting an error.  I'm not sure what a good
> way is to go about either figuring out a way to pull the privileges
> across, or to suppress the GRANTs when we need to (or always), would be
> though.  Neither seems easy to solve in a clean way.
>
> Certainly open to suggestions.
Based on our initial bug report, I would vote against suppressing the
GRANTS,
because it leads to an unexpected failure in case a user has a special
role for
use in a third-party utility such as a backup tool, which already took
care of
internal API changes.

Still I agree with your arguments about possibility of providing more grants
than expected. Ideally, we do not change behaviour of existing functions
that
much, but in real-world it may happen.

Maybe, as a compromise, we can reset grants to default for all changed
functions
and also generate a script that will allow a user to preserve privileges
of the
old cluster by analogy with analyze_new_cluster script.
What do you think?

[1] https://www.postgresql.org/docs/10/release-10.html

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


doc-function-API-change_v10.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Bruce Momjian
On Tue, Aug 13, 2019 at 07:04:42PM +0300, Anastasia Lubennikova wrote:

> > In an ideal world, it seems like we'd make a judgement call and arrange
> > to pull the privileges across when we can do so without granting the
> > user privileges beyond those that were intended, and otherwise we'd
> > suppress the GRANT to avoid getting an error.  I'm not sure what a good
> > way is to go about either figuring out a way to pull the privileges
> > across, or to suppress the GRANTs when we need to (or always), would be
> > though.  Neither seems easy to solve in a clean way.
> >
> > Certainly open to suggestions.
> Based on our initial bug report, I would vote against suppressing the
> GRANTS,
> because it leads to an unexpected failure in case a user has a special role
> for
> use in a third-party utility such as a backup tool, which already took care
> of
> internal API changes.
>
> Still I agree with your arguments about possibility of providing more grants
> than expected. Ideally, we do not change behaviour of existing functions
> that
> much, but in real-world it may happen.
>
> Maybe, as a compromise, we can reset grants to default for all changed
> functions
> and also generate a script that will allow a user to preserve privileges of
> the
> old cluster by analogy with analyze_new_cluster script.
> What do you think?

I agree pg_upgrade should work without user correction as much as
possible.  However, as you can see, it can fail when user objects
reference system table objects that have changed between major releases.

As much as it would be nice if the release notes covered all that, and
we updated pg_upgrade to somehow handle them, it just isn't realistic.
As we can see here, the problems often take years to show up, and even
then there were probably many other people who had the problem who never
reported it to us.

I think a realistic approach is to come up with a list of all the user
behaviors that can cause pg_upgrade to break (by reviewing previous
pg_upgrade bug reports), and then add code to pg_upgrade to detect them
and either fix them or report them in --check mode.

In summary, I am saying that the odds that patch authors, committers,
release note writers, and pg_upgrade maintainers are going to form a
consistent work flow that catches all these changes is unrealistic ---
our best bet is to create something in the pg_upgrade code to detects
this.  pg_upgrade already connects to the old and new cluster, so
technically it can perform system table modification checks itself.

The only positive is that when pg_upgrade does fail, at least we have a
system that clearly points to the cause, but unfortunately usually at
run-time, not at --check time.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Stephen Frost
Greetings,

* Bruce Momjian ([hidden email]) wrote:

> On Tue, Aug 13, 2019 at 07:04:42PM +0300, Anastasia Lubennikova wrote:
> > Maybe, as a compromise, we can reset grants to default for all changed
> > functions
> > and also generate a script that will allow a user to preserve privileges of
> > the
> > old cluster by analogy with analyze_new_cluster script.
> > What do you think?
>
> I agree pg_upgrade should work without user correction as much as
> possible.  However, as you can see, it can fail when user objects
> reference system table objects that have changed between major releases.
Right.

> As much as it would be nice if the release notes covered all that, and
> we updated pg_upgrade to somehow handle them, it just isn't realistic.
> As we can see here, the problems often take years to show up, and even
> then there were probably many other people who had the problem who never
> reported it to us.

Yeah, the possible changes when you think about column-level privileges
as well really gets to be quite large..

This is why my thinking is that we should come up with additional
default roles, which aren't tied to specific catalog structures but
instead are for a more general set of capabilities which we manage and
users can either decide to use, or not.  If they decide to work with the
individual functions then they have to manage the upgrade process if and
when we make changes to those functions.

> I think a realistic approach is to come up with a list of all the user
> behaviors that can cause pg_upgrade to break (by reviewing previous
> pg_upgrade bug reports), and then add code to pg_upgrade to detect them
> and either fix them or report them in --check mode.

In this case, we could, at least conceptually, perform a comparison
between the different major versions and then check for any non-default
privileges for any of the objects changed and then report on those in
--check mode with a recommendation to revert to the default privileges
in the old cluster before running pg_upgrade, and then apply whatever
privileges are desired in the new cluster after the upgrade completes.

> In summary, I am saying that the odds that patch authors, committers,
> release note writers, and pg_upgrade maintainers are going to form a
> consistent work flow that catches all these changes is unrealistic ---
> our best bet is to create something in the pg_upgrade code to detects
> this.  pg_upgrade already connects to the old and new cluster, so
> technically it can perform system table modification checks itself.

It'd be pretty neat if pg_upgrade could connect to the old and new
clusters concurrently and then perform a generic catalog comparison
between them and identify all objects which have been changed and
determine if there's any non-default ACLs or dependencies on the catalog
objects which are different between the clusters.  That seems like an
awful lot of work though, and I'm not sure there's really any need,
given that we don't change the catalog for a given major version- we
could just generate the list using some queries whenever we release a
new major version and update pg_upgrade with it.

> The only positive is that when pg_upgrade does fail, at least we have a
> system that clearly points to the cause, but unfortunately usually at
> run-time, not at --check time.

Getting it to be at check time would certainly be a great improvement.

Solving this in pg_upgrade does seem like it's probably the better
approach rather than trying to do it in pg_dump.  Unfortunately, that
likely means that all we can do is have pg_upgrade point out to the user
when something will fail, with recommendations on how to address it, but
that's also something users are likely used to and willing to accept,
and puts the onus on them to consider their ACL decisions when we're
making catalog changes, and it keeps these issues out of pg_dump.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Bruce Momjian
On Tue, Aug 13, 2019 at 08:28:12PM -0400, Stephen Frost wrote:
> Getting it to be at check time would certainly be a great improvement.
>
> Solving this in pg_upgrade does seem like it's probably the better
> approach rather than trying to do it in pg_dump.  Unfortunately, that
> likely means that all we can do is have pg_upgrade point out to the user
> when something will fail, with recommendations on how to address it, but
> that's also something users are likely used to and willing to accept,
> and puts the onus on them to consider their ACL decisions when we're
> making catalog changes, and it keeps these issues out of pg_dump.

Yeah, I think we just need to bite the bullet and create some
infrastructure to help folks solve the problem.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Anastasia Lubennikova
In reply to this post by Stephen Frost
14.08.2019 3:28, Stephen Frost wrote:

> * Bruce Momjian ([hidden email]) wrote:
>
>> As much as it would be nice if the release notes covered all that, and
>> we updated pg_upgrade to somehow handle them, it just isn't realistic.
>> As we can see here, the problems often take years to show up, and even
>> then there were probably many other people who had the problem who never
>> reported it to us.
> Yeah, the possible changes when you think about column-level privileges
> as well really gets to be quite large..
>
> This is why my thinking is that we should come up with additional
> default roles, which aren't tied to specific catalog structures but
> instead are for a more general set of capabilities which we manage and
> users can either decide to use, or not.  If they decide to work with the
> individual functions then they have to manage the upgrade process if and
> when we make changes to those functions.
Idea of having special roles looks good to me, though, I don't see
how to define what grants are needed for each role. Let's say, we
define role "backupuser" that obviously must have grants on
pg_start_backup()
and pg_stop_backup(). Should it also have access to pg_is_in_recovery()
or for example version()?

> It'd be pretty neat if pg_upgrade could connect to the old and new
> clusters concurrently and then perform a generic catalog comparison
> between them and identify all objects which have been changed and
> determine if there's any non-default ACLs or dependencies on the catalog
> objects which are different between the clusters.  That seems like an
> awful lot of work though, and I'm not sure there's really any need,
> given that we don't change the catalog for a given major version- we
> could just generate the list using some queries whenever we release a
> new major version and update pg_upgrade with it.
>
>> The only positive is that when pg_upgrade does fail, at least we have a
>> system that clearly points to the cause, but unfortunately usually at
>> run-time, not at --check time.
> Getting it to be at check time would certainly be a great improvement.
>
> Solving this in pg_upgrade does seem like it's probably the better
> approach rather than trying to do it in pg_dump.  Unfortunately, that
> likely means that all we can do is have pg_upgrade point out to the user
> when something will fail, with recommendations on how to address it, but
> that's also something users are likely used to and willing to accept,
> and puts the onus on them to consider their ACL decisions when we're
> making catalog changes, and it keeps these issues out of pg_dump.

I wrote a prototype to check API and ACL compatibility (see attachment).
In the current implementation it fetches the list of system procedures
for both old and new clusters
and reports deleted functions or ACL changes during pg_upgrade check:


Performing Consistency Checks
-----------------------------
...
Checking for system functions API compatibility
dbname postgres : check procsig is equal pg_stop_backup(), procacl not
equal {anastasia=X/anastasia,backup=X/anastasia} vs {anastasia=X/anastasia}
dbname postgres : procedure pg_stop_backup(exclusive boolean, OUT lsn
pg_lsn, OUT labelfile text, OUT spcmapfile text) doesn't exist in
new_cluster
dbname postgres : procedure pg_switch_xlog() doesn't exist in new_cluster
dbname postgres : procedure pg_xlog_replay_pause() doesn't exist in
new_cluster
dbname postgres : procedure pg_xlog_replay_resume() doesn't exist in
new_cluster
...

I think it's a good first step in the right direction.
Now I have questions about implementation details:

1) How exactly should we report this incompatibility to a user?
I think it's fine to leave the warnings and also write some hint for the
user by analogy with other checks.
"Reset ACL on the problem functions to default in the old cluster to
continue"

Is it enough?

2) This approach can be extended to other catalog modifications, you
mentioned above.
I don't see what else can break pg_upgrade in the same way. However, I
don't mind
implementing more checks, while I work on this issue, if you can point
on them.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pg_upgrade_ACL_check_v1.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Stephen Frost
Greetings,

* Anastasia Lubennikova ([hidden email]) wrote:

> 14.08.2019 3:28, Stephen Frost wrote:
> >* Bruce Momjian ([hidden email]) wrote:
> >>As much as it would be nice if the release notes covered all that, and
> >>we updated pg_upgrade to somehow handle them, it just isn't realistic.
> >>As we can see here, the problems often take years to show up, and even
> >>then there were probably many other people who had the problem who never
> >>reported it to us.
> >Yeah, the possible changes when you think about column-level privileges
> >as well really gets to be quite large..
> >
> >This is why my thinking is that we should come up with additional
> >default roles, which aren't tied to specific catalog structures but
> >instead are for a more general set of capabilities which we manage and
> >users can either decide to use, or not.  If they decide to work with the
> >individual functions then they have to manage the upgrade process if and
> >when we make changes to those functions.
>
> Idea of having special roles looks good to me, though, I don't see
> how to define what grants are needed for each role. Let's say, we
> define role "backupuser" that obviously must have grants on
> pg_start_backup()
> and pg_stop_backup(). Should it also have access to pg_is_in_recovery()
> or for example version()?
pg_is_in_recovery() and version() are already allowed to be executed by
public, and I don't see any particular reason to change that, so I don't
believe those would need to be explicitly GRANT'd to this new role.

I would think the specific set would be those listed under:

https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-BACKUP

which currently require superuser access.

This isn't a new idea, btw, there was a great deal of discussion three
years ago around all of this.  Particularly relevant is this:

https://www.postgresql.org/message-id/20160104175516.GC3685%40tamriel.snowman.net

> >It'd be pretty neat if pg_upgrade could connect to the old and new
> >clusters concurrently and then perform a generic catalog comparison
> >between them and identify all objects which have been changed and
> >determine if there's any non-default ACLs or dependencies on the catalog
> >objects which are different between the clusters.  That seems like an
> >awful lot of work though, and I'm not sure there's really any need,
> >given that we don't change the catalog for a given major version- we
> >could just generate the list using some queries whenever we release a
> >new major version and update pg_upgrade with it.
> >
> >>The only positive is that when pg_upgrade does fail, at least we have a
> >>system that clearly points to the cause, but unfortunately usually at
> >>run-time, not at --check time.
> >Getting it to be at check time would certainly be a great improvement.
> >
> >Solving this in pg_upgrade does seem like it's probably the better
> >approach rather than trying to do it in pg_dump.  Unfortunately, that
> >likely means that all we can do is have pg_upgrade point out to the user
> >when something will fail, with recommendations on how to address it, but
> >that's also something users are likely used to and willing to accept,
> >and puts the onus on them to consider their ACL decisions when we're
> >making catalog changes, and it keeps these issues out of pg_dump.
>
> I wrote a prototype to check API and ACL compatibility (see attachment).
> In the current implementation it fetches the list of system procedures for
> both old and new clusters
> and reports deleted functions or ACL changes during pg_upgrade check:
>
>
> Performing Consistency Checks
> -----------------------------
> ...
> Checking for system functions API compatibility
> dbname postgres : check procsig is equal pg_stop_backup(), procacl not equal
> {anastasia=X/anastasia,backup=X/anastasia} vs {anastasia=X/anastasia}
> dbname postgres : procedure pg_stop_backup(exclusive boolean, OUT lsn
> pg_lsn, OUT labelfile text, OUT spcmapfile text) doesn't exist in
> new_cluster
> dbname postgres : procedure pg_switch_xlog() doesn't exist in new_cluster
> dbname postgres : procedure pg_xlog_replay_pause() doesn't exist in
> new_cluster
> dbname postgres : procedure pg_xlog_replay_resume() doesn't exist in
> new_cluster
> ...
>
> I think it's a good first step in the right direction.
> Now I have questions about implementation details:
>
> 1) How exactly should we report this incompatibility to a user?
> I think it's fine to leave the warnings and also write some hint for the
> user by analogy with other checks.
> "Reset ACL on the problem functions to default in the old cluster to
> continue"
>
> Is it enough?
Not really sure what else we could do there..?  Did you have something
else in mind?  We could possibly provide the specific commands to run,
that seems like about the only other thing we could possibly do.

> 2) This approach can be extended to other catalog modifications, you
> mentioned above.
> I don't see what else can break pg_upgrade in the same way. However, I don't
> mind
> implementing more checks, while I work on this issue, if you can point on
> them.

I was thinking of, for example, column-level privileges on system
relations (tables or views) where that column was later removed, for
example.  I do appreciate that such changes are relatively rare but they
do happen...

Will try to take a look at the actual patch later today.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_upgrade fails with non-standard ACL

Bruce Momjian
In reply to this post by Anastasia Lubennikova
On Tue, Aug 20, 2019 at 04:38:18PM +0300, Anastasia Lubennikova wrote:

> > Solving this in pg_upgrade does seem like it's probably the better
> > approach rather than trying to do it in pg_dump.  Unfortunately, that
> > likely means that all we can do is have pg_upgrade point out to the user
> > when something will fail, with recommendations on how to address it, but
> > that's also something users are likely used to and willing to accept,
> > and puts the onus on them to consider their ACL decisions when we're
> > making catalog changes, and it keeps these issues out of pg_dump.
>
>
> I wrote a prototype to check API and ACL compatibility (see attachment).
> In the current implementation it fetches the list of system procedures for
> both old and new clusters
> and reports deleted functions or ACL changes during pg_upgrade check:
>
>
> Performing Consistency Checks
> -----------------------------
> ...
> Checking for system functions API compatibility
> dbname postgres : check procsig is equal pg_stop_backup(), procacl not equal
> {anastasia=X/anastasia,backup=X/anastasia} vs {anastasia=X/anastasia}
> dbname postgres : procedure pg_stop_backup(exclusive boolean, OUT lsn
> pg_lsn, OUT labelfile text, OUT spcmapfile text) doesn't exist in
> new_cluster
> dbname postgres : procedure pg_switch_xlog() doesn't exist in new_cluster
> dbname postgres : procedure pg_xlog_replay_pause() doesn't exist in
> new_cluster
> dbname postgres : procedure pg_xlog_replay_resume() doesn't exist in
> new_cluster
> ...
>
> I think it's a good first step in the right direction.
> Now I have questions about implementation details:
>
> 1) How exactly should we report this incompatibility to a user?
> I think it's fine to leave the warnings and also write some hint for the
> user by analogy with other checks.
> "Reset ACL on the problem functions to default in the old cluster to
> continue"

Yes, I think it is good to at least throw an error during --check so
they don't have to find out during a live upgrade.  Odds are it will
require manual repair.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +