[PATCH] Stop ALTER SYSTEM from making bad assumptions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
115 messages Options
1234 ... 6
Reply | Threaded
Open this post in threaded view
|

[PATCH] Stop ALTER SYSTEM from making bad assumptions

Ian Barwick-3
Hi

Consider the following cascading standby setup with PostgreSQL 12:

- there exists a running primary "A"
- standby "B" is cloned from primary "A" using "pg_basebackup --write-recovery-conf"
- standby "C" is cloned from standby "B" using "pg_basebackup --write-recovery-conf"

So far, so good, everything works as expected.

Now, for whatever reason, the user wishes standby "C" to follow another upstream
node (which one is not important here), so the user, in the comfort of their own psql
command line (no more pesky recovery.conf editing!) issues the following:

     ALTER SYSTEM SET primary_conninfo = 'host=someothernode';

and restarts the instance, and... it stays connected to the original upstream node.

Which is unexpected.

Examining the the restarted instance, "SHOW primary_conninfo" still displays
the original value for "primary_conninfo". Mysteriouser and mysteriouser.

What has happened here is that with the option "--write-recovery-conf", Pg12's
pg_basebackup (correctly IMHO) appends the recovery settings to "postgresql.auto.conf".

However, on standby "C", pg_basebackup has dutifully copied over standby "B"'s
existing "postgresql.auto.conf", which already contains standby "B"'s
replication configuration, and appended standby "C"'s replication configuration
to that, which (before "ALTER SYSTEM" was invoked) looked something like this:

        # Do not edit this file manually!
        # It will be overwritten by the ALTER SYSTEM command.
        primary_conninfo = 'host=node_A'
        primary_conninfo = 'host=node_B'

which is expected, and works because the last entry in the file is evaluated, so
on startup, standby "C" follows standby "B".

However, executing "ALTER SYSTEM SET primary_conninfo = 'host=someothernode'" left
standby "C"'s "postgresql.auto.conf" file looking like this:

        # Do not edit this file manually!
        # It will be overwritten by the ALTER SYSTEM command.
        primary_conninfo = 'host=someothernode'
        primary_conninfo = 'host=node_B'

which seems somewhat broken, to say the least.

As-is, the user will either need to repeatedly issue "ALTER SYSTEM RESET primary_conninfo"
until the duplicates are cleared (which means "ALTER SYSTEM RESET ..." doesn't work as
advertised, and is not an obvious solution anyway), or ignore the "Do not edit this file manually!"
warning and remove the offending entry/entries (which, if done safely, should involve
shutting down the instance first).

Note this issue is not specific to pg_basebackup, primary_conninfo (or any other settings
formerly in recovery.conf), it has just manifested itself as the built-in toolset as of now
provides a handy way of getting into this situation without too much effort (and any
utilities which clone standbys and append the replication configuration to
"postgresql.auto.conf" in lieu of creating "recovery.conf" will be prone to running into
the same situation).

I had previously always assumed that ALTER SYSTEM  would change the *last* occurrence for
the parameter in "postgresql.auto.conf", in the same way you'd need to be sure to change
the last occurrence in the normal configuration files, however this actually not the case -
as per replace_auto_config_value() ( src/backend/utils/misc/guc.c ):

     /* Search the list for an existing match (we assume there's only one) */

the *first* match is replaced.

Attached patch attempts to rectify this situation by having replace_auto_config_value()
deleting any duplicate entries first, before making any changes to the last entry.

Arguably it might be sufficient (and simpler) to just scan the list for the last
entry, but removing preceding duplicates seems cleaner, as it's pointless
(and a potential source of confusion) keeping entries around which will never be used.

Also attached is a set of TAP tests to check ALTER SYSTEM works as expected (or
at least as seems correct to me).


Regards

Ian Barwick

--
  Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

alter-system-replace-last-entry.v1.patch (3K) Download Attachment
test-postgresql-auto-conf.v1.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Stephen Frost
Greetings,

* Ian Barwick ([hidden email]) wrote:

> Consider the following cascading standby setup with PostgreSQL 12:
>
> - there exists a running primary "A"
> - standby "B" is cloned from primary "A" using "pg_basebackup --write-recovery-conf"
> - standby "C" is cloned from standby "B" using "pg_basebackup --write-recovery-conf"
>
> So far, so good, everything works as expected.
>
> Now, for whatever reason, the user wishes standby "C" to follow another upstream
> node (which one is not important here), so the user, in the comfort of their own psql
> command line (no more pesky recovery.conf editing!) issues the following:
>
>     ALTER SYSTEM SET primary_conninfo = 'host=someothernode';
>
> and restarts the instance, and... it stays connected to the original upstream node.
>
> Which is unexpected.
>
> Examining the the restarted instance, "SHOW primary_conninfo" still displays
> the original value for "primary_conninfo". Mysteriouser and mysteriouser.
>
> What has happened here is that with the option "--write-recovery-conf", Pg12's
> pg_basebackup (correctly IMHO) appends the recovery settings to "postgresql.auto.conf".
>
> However, on standby "C", pg_basebackup has dutifully copied over standby "B"'s
> existing "postgresql.auto.conf", which already contains standby "B"'s
> replication configuration, and appended standby "C"'s replication configuration
> to that, which (before "ALTER SYSTEM" was invoked) looked something like this:
>
> # Do not edit this file manually!
> # It will be overwritten by the ALTER SYSTEM command.
> primary_conninfo = 'host=node_A'
> primary_conninfo = 'host=node_B'
>
> which is expected, and works because the last entry in the file is evaluated, so
> on startup, standby "C" follows standby "B".
>
> However, executing "ALTER SYSTEM SET primary_conninfo = 'host=someothernode'" left
> standby "C"'s "postgresql.auto.conf" file looking like this:
>
> # Do not edit this file manually!
> # It will be overwritten by the ALTER SYSTEM command.
> primary_conninfo = 'host=someothernode'
> primary_conninfo = 'host=node_B'
>
> which seems somewhat broken, to say the least.
Yes, it's completely broken, which I've complained about at least twice
on this list to no avail.

Thanks for putting together an example case pointing out why it's a
serious issue.  The right thing to do here it so create an open item for
PG12 around this.

> As-is, the user will either need to repeatedly issue "ALTER SYSTEM RESET primary_conninfo"
> until the duplicates are cleared (which means "ALTER SYSTEM RESET ..." doesn't work as
> advertised, and is not an obvious solution anyway), or ignore the "Do not edit this file manually!"
> warning and remove the offending entry/entries (which, if done safely, should involve
> shutting down the instance first).
>
> Note this issue is not specific to pg_basebackup, primary_conninfo (or any other settings
> formerly in recovery.conf), it has just manifested itself as the built-in toolset as of now
> provides a handy way of getting into this situation without too much effort (and any
> utilities which clone standbys and append the replication configuration to
> "postgresql.auto.conf" in lieu of creating "recovery.conf" will be prone to running into
> the same situation).
This is absolutely the fault of the system for putting in multiple
entries into the auto.conf, which it wasn't ever written to handle.

> I had previously always assumed that ALTER SYSTEM  would change the *last* occurrence for
> the parameter in "postgresql.auto.conf", in the same way you'd need to be sure to change
> the last occurrence in the normal configuration files, however this actually not the case -
> as per replace_auto_config_value() ( src/backend/utils/misc/guc.c ):
>
>     /* Search the list for an existing match (we assume there's only one) */
>
> the *first* match is replaced.
>
> Attached patch attempts to rectify this situation by having replace_auto_config_value()
> deleting any duplicate entries first, before making any changes to the last entry.
While this might be a good belt-and-suspenders kind of change to
include, I don't think pg_basebackup should be causing us to have
multiple entries in the file in the first place..

> Arguably it might be sufficient (and simpler) to just scan the list for the last
> entry, but removing preceding duplicates seems cleaner, as it's pointless
> (and a potential source of confusion) keeping entries around which will never be used.

I don't think we should only change the last entry, that seems like a
really bad idea.  I agree that we should clean up the file if we come
across it being invalid.

> Also attached is a set of TAP tests to check ALTER SYSTEM works as expected (or
> at least as seems correct to me).

In my view, at least, we should have a similar test for pg_basebackup to
make sure that it doesn't create an invalid .auto.conf file.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

akapila
On Fri, Jun 14, 2019 at 9:38 PM Stephen Frost <[hidden email]> wrote:

> * Ian Barwick ([hidden email]) wrote:
> >
> > Note this issue is not specific to pg_basebackup, primary_conninfo (or any other settings
> > formerly in recovery.conf), it has just manifested itself as the built-in toolset as of now
> > provides a handy way of getting into this situation without too much effort (and any
> > utilities which clone standbys and append the replication configuration to
> > "postgresql.auto.conf" in lieu of creating "recovery.conf" will be prone to running into
> > the same situation).
>
> This is absolutely the fault of the system for putting in multiple
> entries into the auto.conf, which it wasn't ever written to handle.
>

Right.  I think if possible, it should use existing infrastructure to
write to postgresql.auto.conf rather than inventing a new way to
change it.  Apart from this issue, if we support multiple ways to edit
postgresql.auto.conf, we might end up with more problems like this in
the future where one system is not aware of the way file being edited
by another system.

> > I had previously always assumed that ALTER SYSTEM  would change the *last* occurrence for
> > the parameter in "postgresql.auto.conf", in the same way you'd need to be sure to change
> > the last occurrence in the normal configuration files, however this actually not the case -
> > as per replace_auto_config_value() ( src/backend/utils/misc/guc.c ):
> >
> >     /* Search the list for an existing match (we assume there's only one) */
> >
> > the *first* match is replaced.
> >
> > Attached patch attempts to rectify this situation by having replace_auto_config_value()
> > deleting any duplicate entries first, before making any changes to the last entry.
>
> While this might be a good belt-and-suspenders kind of change to
> include,
>

Another possibility to do something on these lines is to extend Alter
System Reset <config_param> to remove all the duplicate entries.  Then
the user has a way to remove all duplicate entries if any and set the
new value.  I think handling duplicate entries in *.auto.conf files is
an enhancement of the existing system and there could be multiple
things we can do there, so we shouldn't try to do that as a bug-fix.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Stephen Frost
Greetings,

* Amit Kapila ([hidden email]) wrote:

> On Fri, Jun 14, 2019 at 9:38 PM Stephen Frost <[hidden email]> wrote:
> > * Ian Barwick ([hidden email]) wrote:
> > >
> > > Note this issue is not specific to pg_basebackup, primary_conninfo (or any other settings
> > > formerly in recovery.conf), it has just manifested itself as the built-in toolset as of now
> > > provides a handy way of getting into this situation without too much effort (and any
> > > utilities which clone standbys and append the replication configuration to
> > > "postgresql.auto.conf" in lieu of creating "recovery.conf" will be prone to running into
> > > the same situation).
> >
> > This is absolutely the fault of the system for putting in multiple
> > entries into the auto.conf, which it wasn't ever written to handle.
>
> Right.  I think if possible, it should use existing infrastructure to
> write to postgresql.auto.conf rather than inventing a new way to
> change it.  Apart from this issue, if we support multiple ways to edit
> postgresql.auto.conf, we might end up with more problems like this in
> the future where one system is not aware of the way file being edited
> by another system.
I agere that there should have been some effort put into making the way
ALTER SYSTEM is modified be consistent between the backend and utilities
like pg_basebackup (which would also help third party tools understand
how a non-backend application should be modifying the file).

> > > I had previously always assumed that ALTER SYSTEM  would change the *last* occurrence for
> > > the parameter in "postgresql.auto.conf", in the same way you'd need to be sure to change
> > > the last occurrence in the normal configuration files, however this actually not the case -
> > > as per replace_auto_config_value() ( src/backend/utils/misc/guc.c ):
> > >
> > >     /* Search the list for an existing match (we assume there's only one) */
> > >
> > > the *first* match is replaced.
> > >
> > > Attached patch attempts to rectify this situation by having replace_auto_config_value()
> > > deleting any duplicate entries first, before making any changes to the last entry.
> >
> > While this might be a good belt-and-suspenders kind of change to
> > include,
>
> Another possibility to do something on these lines is to extend Alter
> System Reset <config_param> to remove all the duplicate entries.  Then
> the user has a way to remove all duplicate entries if any and set the
> new value.  I think handling duplicate entries in *.auto.conf files is
> an enhancement of the existing system and there could be multiple
> things we can do there, so we shouldn't try to do that as a bug-fix.
Unless there's actually a use-case for duplicate entries in
postgresql.auto.conf, what we should do is clean them up (and possibly
throw a WARNING or similar at the user saying "something modified your
postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
every ALTER SYSTEM call.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Tom Lane-2
Stephen Frost <[hidden email]> writes:
> Unless there's actually a use-case for duplicate entries in
> postgresql.auto.conf,

There isn't --- guc.c will just discard the earlier duplicates.

> what we should do is clean them up (and possibly
> throw a WARNING or similar at the user saying "something modified your
> postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
> every ALTER SYSTEM call.

+1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
a WARNING would seem too in-your-face.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Stephen Frost
Greetings,

* Tom Lane ([hidden email]) wrote:
> Stephen Frost <[hidden email]> writes:
> > Unless there's actually a use-case for duplicate entries in
> > postgresql.auto.conf,
>
> There isn't --- guc.c will just discard the earlier duplicates.

One might be able to argue for trying to create a stack or some such, to
allow you to more easily move between values or to see what the value
was set to at some point in the past, etc etc.  Until we see an actual
thought out use-case along those lines that requires supporting
duplicates in some fashion though, with code to make it all work, I
don't think we should allow it.

> > what we should do is clean them up (and possibly
> > throw a WARNING or similar at the user saying "something modified your
> > postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
> > every ALTER SYSTEM call.
>
> +1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
> a WARNING would seem too in-your-face.

I'd hope for a warning from basically every part of the system when it
detects, clearly, that a file was changed in a way that it shouldn't
have been.  If we don't throw a warning, then we're implying that it's
acceptable, but then cleaning up the duplicates, which seems pretty
confusing.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Magnus Hagander-2
On Sun, Jun 16, 2019 at 7:43 PM Stephen Frost <[hidden email]> wrote:

* Tom Lane ([hidden email]) wrote:
> Stephen Frost <[hidden email]> writes:

> > what we should do is clean them up (and possibly
> > throw a WARNING or similar at the user saying "something modified your
> > postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
> > every ALTER SYSTEM call.
>
> +1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
> a WARNING would seem too in-your-face.

I'd hope for a warning from basically every part of the system when it
detects, clearly, that a file was changed in a way that it shouldn't
have been.  If we don't throw a warning, then we're implying that it's
acceptable, but then cleaning up the duplicates, which seems pretty
confusing.

+1. Silently "fixing" the file by cleaning up duplicates is going to be even more confusing to uses who had seen them be there before. 

--
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Gavin Flower-2
On 17/06/2019 05:58, Magnus Hagander wrote:

> On Sun, Jun 16, 2019 at 7:43 PM Stephen Frost <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>     * Tom Lane ([hidden email] <mailto:[hidden email]>) wrote:
>     > Stephen Frost <[hidden email] <mailto:[hidden email]>>
>     writes:
>
>     > > what we should do is clean them up (and possibly
>     > > throw a WARNING or similar at the user saying "something
>     modified your
>     > > postgresql.auto.conf in an unexpected way").  I'd suggest we
>     do that on
>     > > every ALTER SYSTEM call.
>     >
>     > +1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
>     > a WARNING would seem too in-your-face.
>
>     I'd hope for a warning from basically every part of the system when it
>     detects, clearly, that a file was changed in a way that it shouldn't
>     have been.  If we don't throw a warning, then we're implying that it's
>     acceptable, but then cleaning up the duplicates, which seems pretty
>     confusing.
>
>
> +1. Silently "fixing" the file by cleaning up duplicates is going to
> be even more confusing to uses who had seen them be there before.
>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/ <http://www.hagander.net/>
>  Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

I thinking fixing this silently should be at least a hanging offence.

At one time I came a cross a language PL/1, that would silently
'correct' some mistakes, without indicating what it did.  I thought this
was extremely dangerous, that could lead to some very nasty and
unexpected bugs!

It is most important that people be aware of possibly conflicting
changes, or that values they saw in postgresql.conf may have been changed.

Hmm... this suggests that all the implied defaults should be explicitly
set!  Would that be too greater change to make?


Cheers,
Gavin



Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Juan José Santamaría Flecha
On Mon, Jun 17, 2019 at 12:57 AM Gavin Flower
<[hidden email]> wrote:
>
>
> I thinking fixing this silently should be at least a hanging offence.
>

Maybe adding a MD5 header to the file to check if it has been altered
outside guc.c might be enough.

Regards,

Juan José Santamaría Flecha


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Ian Barwick-3
In reply to this post by Stephen Frost
Hi

On 6/15/19 1:08 AM, Stephen Frost wrote:
 > Greetings,
 >
 > * Ian Barwick ([hidden email]) wrote:
 >> Consider the following cascading standby setup with PostgreSQL 12:
 >>
 >> - there exists a running primary "A"
 >> - standby "B" is cloned from primary "A" using "pg_basebackup --write-recovery-conf"
 >> - standby "C" is cloned from standby "B" using "pg_basebackup --write-recovery-conf"
(...)
 >> However, executing "ALTER SYSTEM SET primary_conninfo = 'host=someothernode'" left
 >> standby "C"'s "postgresql.auto.conf" file looking like this:
 >>
 >> # Do not edit this file manually!
 >> # It will be overwritten by the ALTER SYSTEM command.
 >> primary_conninfo = 'host=someothernode'
 >> primary_conninfo = 'host=node_B'
 >>
 >> which seems somewhat broken, to say the least.
 >
 > Yes, it's completely broken, which I've complained about at least twice
 > on this list to no avail.
 >
 > Thanks for putting together an example case pointing out why it's a
 > serious issue.  The right thing to do here it so create an open item for
 > PG12 around this.

Done.

 >> Attached patch attempts to rectify this situation by having replace_auto_config_value()
 >> deleting any duplicate entries first, before making any changes to the last entry.
 >
 > While this might be a good belt-and-suspenders kind of change to
 > include, I don't think pg_basebackup should be causing us to have
 > multiple entries in the file in the first place..
(...)
 >> Also attached is a set of TAP tests to check ALTER SYSTEM works as expected (or
 >> at least as seems correct to me).
 >
 > In my view, at least, we should have a similar test for pg_basebackup to
 > make sure that it doesn't create an invalid .auto.conf file.

Indeed... I'd be happy to create tests... but first we need a definition of what
constitutes a valid .auto.conf file.

If that definition includes requiring that a parameter may occur only once, then
we need to provide a way for utilities such as pg_basebackup to write the replication
configuration to a configuration file in such a way that it doesn't somehow
render that file invalid.

In Pg11 and earlier, it was just a case of writing (or overwriting) recovery.conf.

In Pg12, the code in pg_basebackup implies the correct thing to do is append to .auto.conf,
but as demonstrated that can cause problems with duplicate entries.

Having pg_basebackup, or any other utility which clones a standby, parse the file
itself to remove duplicates seems like a recipe for pain and badly duplicated effort
(unless there's some way of calling the configuration parsing code while the
server is not running).

We could declare that the .auto.conf file will be reset to the default state when
a standby is cloned, but the implicit behaviour so far has been to copy the file
as-is (as would happen with any other configuration files in the data directory).

We could avoid the need for modifying the .auto.conf file by declaring that a
configuration with a specific name in the data directory (let's call it
"recovery.conf" or "replication.conf") can be used by any utilities writing
replication configuration (though of course in contrast to the old recovery.conf
it would be included, if exists, as a normal configuration file, though then the
precedence would need to be defined, etc..). I'm not sure off the top of my head
whether something like that has already been discussed, though it's presumably a
bit late in the release cycle to make such changes anyway?

 >>> This is absolutely the fault of the system for putting in multiple
 >>> entries into the auto.conf, which it wasn't ever written to handle.
 >>
 > * Amit Kapila ([hidden email]) wrote:
 >> Right.  I think if possible, it should use existing infrastructure to
 >> write to postgresql.auto.conf rather than inventing a new way to
 >> change it.  Apart from this issue, if we support multiple ways to edit
 >> postgresql.auto.conf, we might end up with more problems like this in
 >> the future where one system is not aware of the way file being edited
 >> by another system.
 >
 > I agere that there should have been some effort put into making the way
 > ALTER SYSTEM is modified be consistent between the backend and utilities
 > like pg_basebackup (which would also help third party tools understand
 > how a non-backend application should be modifying the file).

Did you mean to say "the way postgresql.auto.conf is modified"?

I suggest explicitly documenting postgresql.auto.conf behaviour (and the circumstances
where it's acceptable to modify it outside of ALTER SYSTEM calls) in the documentation
(and possibly in the code), so anyone writing utilities which need to
append to postgresql.auto.conf knows what the situation is.

Something along the following lines?

- postgresql.auto.conf is maintained by PostgreSQL and can be rewritten at will by the system
   at any time
- there is no guarantee that items in postgresql.auto.conf will be in a particular order
- it  must never be manually edited (though it may be viewed)
- postgresql.auto.conf may be appended to by utilities which need to write configuration
   items and which and need a guarantee that the items will be read by the server at startup
   (but only when the server is down of course)
- any duplicate items will be removed when ALTER SYSTEM is executed to change or reset
   an item (a WARNING will be emitted about duplicate items removed)
- comment lines (apart from the warning at the top of the file) will be silently removed
   (this is currently the case anyway)


I will happily work on those changes in the next few days if agreed.



Regards

Ian Barwick

--
  Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Ian Barwick-3
In reply to this post by Magnus Hagander-2
On 6/17/19 2:58 AM, Magnus Hagander wrote:

> On Sun, Jun 16, 2019 at 7:43 PM Stephen Frost <[hidden email] <mailto:[hidden email]>> wrote:
>
>
>     * Tom Lane ([hidden email] <mailto:[hidden email]>) wrote:
>      > Stephen Frost <[hidden email] <mailto:[hidden email]>> writes:
>
>      > > what we should do is clean them up (and possibly
>      > > throw a WARNING or similar at the user saying "something modified your
>      > > postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
>      > > every ALTER SYSTEM call.
>      >
>      > +1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
>      > a WARNING would seem too in-your-face.
>
>     I'd hope for a warning from basically every part of the system when it
>     detects, clearly, that a file was changed in a way that it shouldn't
>     have been.  If we don't throw a warning, then we're implying that it's
>     acceptable, but then cleaning up the duplicates, which seems pretty
>     confusing.
>
> > +1. Silently "fixing" the file by cleaning up duplicates is going to be even
 >  more confusing o uses who had seen them be there before.

Some sort of notification is definitely appropriate here.

However, going back to the original scenario (cascaded standby set up using
"pg_basebackup --write-recovery-conf") there would now be a warning emitted
the first time anyone executes ALTER SYSTEM (about duplicate "primary_conninfo"
entries) which would not have occured in Pg11 and earlier (and which will
no doubt cause consternation along the lines "how did my postgresql.auto.conf
get modified in an unexpected way? OMG? Bug? Was I hacked?").


Regards

Ian Barwick
--
  Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Stephen Frost
Greetings,

* Ian Barwick ([hidden email]) wrote:
> However, going back to the original scenario (cascaded standby set up using
> "pg_basebackup --write-recovery-conf") there would now be a warning emitted
> the first time anyone executes ALTER SYSTEM (about duplicate "primary_conninfo"
> entries) which would not have occured in Pg11 and earlier (and which will
> no doubt cause consternation along the lines "how did my postgresql.auto.conf
> get modified in an unexpected way? OMG? Bug? Was I hacked?").

No, I don't think we should end up in a situation where this happens.

I agree that this implies making pg_basebackup more intelligent when
it's dealing with that file but I simply don't have a lot of sympathy
about that, it's not news to anyone who has been paying attention.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Stephen Frost
In reply to this post by Ian Barwick-3
Greetings,

* Ian Barwick ([hidden email]) wrote:

> On 6/15/19 1:08 AM, Stephen Frost wrote:
> > * Ian Barwick ([hidden email]) wrote:
> >> Consider the following cascading standby setup with PostgreSQL 12:
> >>
> >> - there exists a running primary "A"
> >> - standby "B" is cloned from primary "A" using "pg_basebackup --write-recovery-conf"
> >> - standby "C" is cloned from standby "B" using "pg_basebackup --write-recovery-conf"
> (...)
> >> However, executing "ALTER SYSTEM SET primary_conninfo = 'host=someothernode'" left
> >> standby "C"'s "postgresql.auto.conf" file looking like this:
> >>
> >> # Do not edit this file manually!
> >> # It will be overwritten by the ALTER SYSTEM command.
> >> primary_conninfo = 'host=someothernode'
> >> primary_conninfo = 'host=node_B'
> >>
> >> which seems somewhat broken, to say the least.
> >
> > Yes, it's completely broken, which I've complained about at least twice
> > on this list to no avail.
> >
> > Thanks for putting together an example case pointing out why it's a
> > serious issue.  The right thing to do here it so create an open item for
> > PG12 around this.
>
> Done.
Thanks.

> >> Attached patch attempts to rectify this situation by having replace_auto_config_value()
> >> deleting any duplicate entries first, before making any changes to the last entry.
> >
> > While this might be a good belt-and-suspenders kind of change to
> > include, I don't think pg_basebackup should be causing us to have
> > multiple entries in the file in the first place..
> (...)
> >> Also attached is a set of TAP tests to check ALTER SYSTEM works as expected (or
> >> at least as seems correct to me).
> >
> > In my view, at least, we should have a similar test for pg_basebackup to
> > make sure that it doesn't create an invalid .auto.conf file.
>
> Indeed... I'd be happy to create tests... but first we need a definition of what
> constitutes a valid .auto.conf file.
>
> If that definition includes requiring that a parameter may occur only once, then
> we need to provide a way for utilities such as pg_basebackup to write the replication
> configuration to a configuration file in such a way that it doesn't somehow
> render that file invalid.
Yes, I think that we do need to require that a parameter only occur once
and pg_basebackup and friends need to be able to manage that.

> In Pg11 and earlier, it was just a case of writing (or overwriting) recovery.conf.

Right.

> In Pg12, the code in pg_basebackup implies the correct thing to do is append to .auto.conf,
> but as demonstrated that can cause problems with duplicate entries.

Code can have bugs. :)  I'd argue that this is such a bug that needs to
be fixed..

> Having pg_basebackup, or any other utility which clones a standby, parse the file
> itself to remove duplicates seems like a recipe for pain and badly duplicated effort
> (unless there's some way of calling the configuration parsing code while the
> server is not running).

I don't really see that there's much hope for it.

> We could declare that the .auto.conf file will be reset to the default state when
> a standby is cloned, but the implicit behaviour so far has been to copy the file
> as-is (as would happen with any other configuration files in the data directory).
>
> We could avoid the need for modifying the .auto.conf file by declaring that a
> configuration with a specific name in the data directory (let's call it
> "recovery.conf" or "replication.conf") can be used by any utilities writing
> replication configuration (though of course in contrast to the old recovery.conf
> it would be included, if exists, as a normal configuration file, though then the
> precedence would need to be defined, etc..). I'm not sure off the top of my head
> whether something like that has already been discussed, though it's presumably a
> bit late in the release cycle to make such changes anyway?
This was discussed a fair bit, including suggestions along exactly those
lines.  There were various arguments for and against, so you might want
to review the threads where that discussion happened to see what the
reasoning was for not having such an independent file.

> >>> This is absolutely the fault of the system for putting in multiple
> >>> entries into the auto.conf, which it wasn't ever written to handle.
> >>
> > * Amit Kapila ([hidden email]) wrote:
> >> Right.  I think if possible, it should use existing infrastructure to
> >> write to postgresql.auto.conf rather than inventing a new way to
> >> change it.  Apart from this issue, if we support multiple ways to edit
> >> postgresql.auto.conf, we might end up with more problems like this in
> >> the future where one system is not aware of the way file being edited
> >> by another system.
> >
> > I agere that there should have been some effort put into making the way
> > ALTER SYSTEM is modified be consistent between the backend and utilities
> > like pg_basebackup (which would also help third party tools understand
> > how a non-backend application should be modifying the file).
>
> Did you mean to say "the way postgresql.auto.conf is modified"?
Ah, yes, more-or-less.  I think I was going for 'the way ALTER SYSTEM
modifies postgresql.auto.conf'.

> I suggest explicitly documenting postgresql.auto.conf behaviour (and the circumstances
> where it's acceptable to modify it outside of ALTER SYSTEM calls) in the documentation
> (and possibly in the code), so anyone writing utilities which need to
> append to postgresql.auto.conf knows what the situation is.

Yeah, I would think that, ideally, we'd have some code in the common
library that other utilities could leverage and which the backend would
also use.

> - postgresql.auto.conf is maintained by PostgreSQL and can be rewritten at will by the system
>   at any time

I'd further say something along the lines of 'utilities should not
modify a postgresql.auto.conf that's in place under a running PostgreSQL
cluster'.

> - there is no guarantee that items in postgresql.auto.conf will be in a particular order
> - it  must never be manually edited (though it may be viewed)

'must' is perhaps a bit strong...  I would say something like
"shouldn't", as users may *have* to modify it, if PostgreSQL won't
start due to some configuration in it.

> - postgresql.auto.conf may be appended to by utilities which need to write configuration
>   items and which and need a guarantee that the items will be read by the server at startup
>   (but only when the server is down of course)

Well, I wouldn't support saying "append" since that's what got us into
this mess. :)

> - any duplicate items will be removed when ALTER SYSTEM is executed to change or reset
>   an item (a WARNING will be emitted about duplicate items removed)
> - comment lines (apart from the warning at the top of the file) will be silently removed
>   (this is currently the case anyway)

I'd rather say that 'any duplicate items should be removed, and a
WARNING emitted when detected', or something along those lines.  Same
for comment lines...

> I will happily work on those changes in the next few days if agreed.

Great!

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Magnus Hagander-2


On Mon, Jun 17, 2019 at 5:41 PM Stephen Frost <[hidden email]> wrote:

* Ian Barwick ([hidden email]) wrote:
> On 6/15/19 1:08 AM, Stephen Frost wrote:
> > * Ian Barwick ([hidden email]) wrote:

> >> Attached patch attempts to rectify this situation by having replace_auto_config_value()
> >> deleting any duplicate entries first, before making any changes to the last entry.
> >
> > While this might be a good belt-and-suspenders kind of change to
> > include, I don't think pg_basebackup should be causing us to have
> > multiple entries in the file in the first place..
> (...)
> >> Also attached is a set of TAP tests to check ALTER SYSTEM works as expected (or
> >> at least as seems correct to me).
> >
> > In my view, at least, we should have a similar test for pg_basebackup to
> > make sure that it doesn't create an invalid .auto.conf file.
>
> Indeed... I'd be happy to create tests... but first we need a definition of what
> constitutes a valid .auto.conf file.
>
> If that definition includes requiring that a parameter may occur only once, then
> we need to provide a way for utilities such as pg_basebackup to write the replication
> configuration to a configuration file in such a way that it doesn't somehow
> render that file invalid.

Yes, I think that we do need to require that a parameter only occur once
and pg_basebackup and friends need to be able to manage that.

+1.


> > I agere that there should have been some effort put into making the way
> > ALTER SYSTEM is modified be consistent between the backend and utilities
> > like pg_basebackup (which would also help third party tools understand
> > how a non-backend application should be modifying the file).
>
> Did you mean to say "the way postgresql.auto.conf is modified"?

Ah, yes, more-or-less.  I think I was going for 'the way ALTER SYSTEM
modifies postgresql.auto.conf'.

> I suggest explicitly documenting postgresql.auto.conf behaviour (and the circumstances
> where it's acceptable to modify it outside of ALTER SYSTEM calls) in the documentation
> (and possibly in the code), so anyone writing utilities which need to
> append to postgresql.auto.conf knows what the situation is.

Yeah, I would think that, ideally, we'd have some code in the common
library that other utilities could leverage and which the backend would
also use.

> - postgresql.auto.conf is maintained by PostgreSQL and can be rewritten at will by the system
>   at any time

I'd further say something along the lines of 'utilities should not
modify a postgresql.auto.conf that's in place under a running PostgreSQL
cluster'.

Do we need to differ between "external" and "internal" utilities here?

 
> - there is no guarantee that items in postgresql.auto.conf will be in a particular order
> - it  must never be manually edited (though it may be viewed)

'must' is perhaps a bit strong...  I would say something like
"shouldn't", as users may *have* to modify it, if PostgreSQL won't
start due to some configuration in it.


+1.


> - postgresql.auto.conf may be appended to by utilities which need to write configuration
>   items and which and need a guarantee that the items will be read by the server at startup
>   (but only when the server is down of course)

Well, I wouldn't support saying "append" since that's what got us into
this mess. :)

> - any duplicate items will be removed when ALTER SYSTEM is executed to change or reset
>   an item (a WARNING will be emitted about duplicate items removed)
> - comment lines (apart from the warning at the top of the file) will be silently removed
>   (this is currently the case anyway)

I'd rather say that 'any duplicate items should be removed, and a
WARNING emitted when detected', or something along those lines.  Same
for comment lines...

I think it's perfectly fine to silently drop comments (other than the one at the very top which says don't touch this file).

//Magnus
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Stephen Frost
Greetings,

* Magnus Hagander ([hidden email]) wrote:
> On Mon, Jun 17, 2019 at 5:41 PM Stephen Frost <[hidden email]> wrote:
> > I'd further say something along the lines of 'utilities should not
> > modify a postgresql.auto.conf that's in place under a running PostgreSQL
> > cluster'.
>
> Do we need to differ between "external" and "internal" utilities here?

I don't think so..?  Is there something there that you're thinking would
be different between them?

> > I'd rather say that 'any duplicate items should be removed, and a
> > WARNING emitted when detected', or something along those lines.  Same
> > for comment lines...
>
> I think it's perfectly fine to silently drop comments (other than the one
> at the very top which says don't touch this file).

I'm not sure why that's different?  I don't really think that I agree
with you on this one- anything showing up in that file that we're ending
up removing must have gotten there because someone or something didn't
realize the rules around managing the file, and that's a problem...

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Magnus Hagander-2

On Tue, Jun 18, 2019 at 3:37 PM Stephen Frost <[hidden email]> wrote:
Greetings,

* Magnus Hagander ([hidden email]) wrote:
> On Mon, Jun 17, 2019 at 5:41 PM Stephen Frost <[hidden email]> wrote:
> > I'd further say something along the lines of 'utilities should not
> > modify a postgresql.auto.conf that's in place under a running PostgreSQL
> > cluster'.
>
> Do we need to differ between "external" and "internal" utilities here?

I don't think so..?  Is there something there that you're thinking would
be different between them?

Probably not. In general thinking that we could "allow" internal tools to do things externals shouldn't do, for example using internal APIs. But it's probably a bad idea to go down that road.


> > I'd rather say that 'any duplicate items should be removed, and a
> > WARNING emitted when detected', or something along those lines.  Same
> > for comment lines...
>
> I think it's perfectly fine to silently drop comments (other than the one
> at the very top which says don't touch this file).

I'm not sure why that's different?  I don't really think that I agree
with you on this one- anything showing up in that file that we're ending
up removing must have gotten there because someone or something didn't
realize the rules around managing the file, and that's a problem...

I'm not strongly against it, I just consider it unnecessary :) 

--
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

akapila
In reply to this post by Ian Barwick-3
On Mon, Jun 17, 2019 at 8:20 PM Ian Barwick <[hidden email]> wrote:

> On 6/15/19 1:08 AM, Stephen Frost wrote:
>  > * Amit Kapila ([hidden email]) wrote:
>  >> Right.  I think if possible, it should use existing infrastructure to
>  >> write to postgresql.auto.conf rather than inventing a new way to
>  >> change it.  Apart from this issue, if we support multiple ways to edit
>  >> postgresql.auto.conf, we might end up with more problems like this in
>  >> the future where one system is not aware of the way file being edited
>  >> by another system.
>  >
>  > I agere that there should have been some effort put into making the way
>  > ALTER SYSTEM is modified be consistent between the backend and utilities
>  > like pg_basebackup (which would also help third party tools understand
>  > how a non-backend application should be modifying the file).
>
> Did you mean to say "the way postgresql.auto.conf is modified"?
>

Yes, that is what we are discussing here.  I think what we can do here
is to extract the functionality to set the parameter in .auto.conf
from AlterSystemSetConfigFile and expose it via a function that takes
(option_name, value) as a parameter.  Then we can expose it via some
SQL function like set_auto_config (similar to what we have now for
set_config/set_config_by_name).  I think if we have something like
that then pg_basebackup or any other utility can use it in a
consistent way.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Ian Barwick-3
On 6/19/19 12:46 PM, Amit Kapila wrote:

> On Mon, Jun 17, 2019 at 8:20 PM Ian Barwick <[hidden email]> wrote:
>> On 6/15/19 1:08 AM, Stephen Frost wrote:
>>   > * Amit Kapila ([hidden email]) wrote:
>>   >> Right.  I think if possible, it should use existing infrastructure to
>>   >> write to postgresql.auto.conf rather than inventing a new way to
>>   >> change it.  Apart from this issue, if we support multiple ways to edit
>>   >> postgresql.auto.conf, we might end up with more problems like this in
>>   >> the future where one system is not aware of the way file being edited
>>   >> by another system.
>>   >
>>   > I agere that there should have been some effort put into making the way
>>   > ALTER SYSTEM is modified be consistent between the backend and utilities
>>   > like pg_basebackup (which would also help third party tools understand
>>   > how a non-backend application should be modifying the file).
>>
>> Did you mean to say "the way postgresql.auto.conf is modified"?
>>
>
> Yes, that is what we are discussing here.  I think what we can do here
> is to extract the functionality to set the parameter in .auto.conf
> from AlterSystemSetConfigFile and expose it via a function that takes
> (option_name, value) as a parameter.

Yup, I was just considering what's involved there, will reply to another
mail in the thread on that.

> Then we can expose it via some
> SQL function like set_auto_config (similar to what we have now for
> set_config/set_config_by_name).  I think if we have something like
> that then pg_basebackup or any other utility can use it in a
> consistent way.

Umm, but the point is here, the server will *not* be running at this point,
so calling an SQL function is out of the question. And if the server
is running, then you just execute "ALTER SYSTEM".


Regards

Ian Barwick

--
  Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

Ian Barwick-3
In reply to this post by Stephen Frost
n 6/18/19 12:41 AM, Stephen Frost wrote:
 > Greetings,
 >
 > * Ian Barwick ([hidden email]) wrote
(...)

 >> I suggest explicitly documenting postgresql.auto.conf behaviour (and the circumstances
 >> where it's acceptable to modify it outside of ALTER SYSTEM calls) in the documentation
 >> (and possibly in the code), so anyone writing utilities which need to
 >> append to postgresql.auto.conf knows what the situation is.
 >
 > Yeah, I would think that, ideally, we'd have some code in the common
 > library that other utilities could leverage and which the backend would
 > also use.

So maybe something along the lines of creating a stripped-down variant of
AlterSystemSetConfigFile() (from "src/backend/utils/misc/guc.c") which can be
called from front-end code to safely modify .auto.conf while the server is *not*
running.

I'm not terribly familiar with the GUC code, but that would presumably mean making
parts of the GUC parsing/handling code linkable externally (ParseConfigFp() etc.)
as you'd need to parse the file before rewriting it. Something like (minimal
pseudo-code):

     void
     alter_system_set(char *name, char *value)
     {
         /*
          * check here that the server is *not* running
          */
         ...
         ParseConfigFp(infile, AutoConfFileName, 0, LOG, &head, &tail)
         ...

         /*
          * some robust portable way of ensuring another process can't
          * modify the file(s) until we're done
          */
         lock_file(AutoConfFileName);

         replace_auto_config_value(&head, &tail, name, value);

         write_auto_conf_file(AutoConfTmpFileName, head)

         durable_rename(AutoConfTmpFileName, AutoConfFileName, ERROR);

         FreeConfigVariables(head);
         unlock_file(AutoConfFileName);
     }

I'm not sure how feasible it is to validate the provided parameter
without the server running, but if not possible, that's not any worse than the
status quo, i.e. the utility has to be trusted to write the correct parameters
to the file anyway.

The question is though - is this a change which is practical to make at this point
in the release cycle for Pg12?


Regards

Ian Barwick



--
  Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

akapila
In reply to this post by Ian Barwick-3
On Wed, Jun 19, 2019 at 10:09 AM Ian Barwick
<[hidden email]> wrote:

>
> On 6/19/19 12:46 PM, Amit Kapila wrote:
> > On Mon, Jun 17, 2019 at 8:20 PM Ian Barwick <[hidden email]> wrote:
> >> On 6/15/19 1:08 AM, Stephen Frost wrote:
> >>   > * Amit Kapila ([hidden email]) wrote:
> >>   >> Right.  I think if possible, it should use existing infrastructure to
> >>   >> write to postgresql.auto.conf rather than inventing a new way to
> >>   >> change it.  Apart from this issue, if we support multiple ways to edit
> >>   >> postgresql.auto.conf, we might end up with more problems like this in
> >>   >> the future where one system is not aware of the way file being edited
> >>   >> by another system.
> >>   >
> >>   > I agere that there should have been some effort put into making the way
> >>   > ALTER SYSTEM is modified be consistent between the backend and utilities
> >>   > like pg_basebackup (which would also help third party tools understand
> >>   > how a non-backend application should be modifying the file).
> >>
> >> Did you mean to say "the way postgresql.auto.conf is modified"?
> >>
> >
> > Yes, that is what we are discussing here.  I think what we can do here
> > is to extract the functionality to set the parameter in .auto.conf
> > from AlterSystemSetConfigFile and expose it via a function that takes
> > (option_name, value) as a parameter.
>
> Yup, I was just considering what's involved there, will reply to another
> mail in the thread on that.
>
> > Then we can expose it via some
> > SQL function like set_auto_config (similar to what we have now for
> > set_config/set_config_by_name).  I think if we have something like
> > that then pg_basebackup or any other utility can use it in a
> > consistent way.
>
> Umm, but the point is here, the server will *not* be running at this point,
> so calling an SQL function is out of the question. And if the server
> is running, then you just execute "ALTER SYSTEM".
>

Sure, SQL function will be a by-product of this.  Can't we expose some
function that can be used by base backup?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


1234 ... 6