our checks for read-only queries are not great

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
35 messages Options
12
Reply | Threaded
Open this post in threaded view
|

our checks for read-only queries are not great

Robert Haas
I spent some time studying the question of how we classify queries as
either read-only or not, and our various definitions of read-only, and
found some bugs. Specifically:

1. check_xact_readonly() prohibits most kinds of DDL in read-only
transactions, but a bunch of recently-added commands were not added to
the list. The missing node types are T_CreatePolicyStmt,
T_AlterPolicyStmt, T_CreateAmStmt, T_CreateStatsStmt,
T_AlterStatsStmt, and T_AlterCollationStmt, which means you can run
these commands in a read-only transaction with no problem and even
attempt to run them on a standby. The ones I tested on a standby all
fail with random-ish error messages due to lower-level checks, but
that's not a great situation.

2. There are comments in utility.c which assert that certain commands
are "forbidden in parallel mode due to CommandIsReadOnly." That's
technically correct, but silly and misleading.These commands wouldn't
be running in parallel mode unless they were running inside of a
function or procedure or something, and, if they are,
CommandIsReadOnly() checks in spi.c or functions.c would prevent not
only these commands but, in fact, all utility commands, so calling out
those particular ones is just adding confusion. Also, the underlying
restriction is unnecessary, because there's no good reason to prevent
the use of things like SHOW and DO in parallel mode, yet we currently
do.

The problems mentioned under (1) are technically the fault of the
people who wrote, reviewed, and committed the patches which added
those command types, and the problems mentioned under (2) are
basically my fault, dating back to the original ParallelContext patch.
However, I think that all of them can be tracked back to a more
fundamental underlying cause, which is that the way that the various
restrictions on read-write queries are implemented is pretty
confusing. Attached is a patch I wrote to try to improve things. It
centralizes three decisions that are currently made in different
places in a single place: (a) can this be run in a read only
transaction? (b) can it run in parallel mode? (c) can it run on a
standby? -- and along the way, it fixes the problems mentioned above
and tries to supply slightly improved comments. Perhaps we should
back-patch fixes at least for (1) even if this gets committed, but I
guess my first question is what people think of this approach to the
problem.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0001-Fix-problems-with-read-only-query-checks-and-refacto.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Tom Lane-2
Robert Haas <[hidden email]> writes:
> I spent some time studying the question of how we classify queries as
> either read-only or not, and our various definitions of read-only, and
> found some bugs. ...
> However, I think that all of them can be tracked back to a more
> fundamental underlying cause, which is that the way that the various
> restrictions on read-write queries are implemented is pretty
> confusing. Attached is a patch I wrote to try to improve things.

Hmm.  I like the idea of deciding this in one place and insisting that
that one place have a switch case for every statement type.  That'll
address the root issue that people fail to think about this when adding
new statements.

I'm less enamored of some of the specific decisions here.  Notably

* I find COMMAND_IS_WEAKLY_READ_ONLY to be a more confusing concept
than what it replaces.  The test for LockStmt is an example --- the
comment talks about restricting locks during recovery, which is fine and
understandable, but then it's completely unobvious that the actual code
implements that behavior rather than some other one.

* ALTER SYSTEM SET is readonly?  Say what?  Even if that's how the current
code behaves, it's a damfool idea and we should change it.  I think that
the semantics we are really trying to implement for read-only is "has no
effects visible outside the current session" --- this explains, for
example, why copying into a temp table is OK.  ALTER SYSTEM SET certainly
isn't that.

I haven't read all of the code; those were just a couple points that
jumped out at me.

I think if we can sort out the notation for how the restrictions
are expressed, this'll be a good improvement.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Robert Haas
On Wed, Jan 8, 2020 at 2:57 PM Tom Lane <[hidden email]> wrote:
> Hmm.  I like the idea of deciding this in one place and insisting that
> that one place have a switch case for every statement type.  That'll
> address the root issue that people fail to think about this when adding
> new statements.

Right. Assuming they test their new command at least one time, they
should notice.  :-)

> I'm less enamored of some of the specific decisions here.  Notably
>
> * I find COMMAND_IS_WEAKLY_READ_ONLY to be a more confusing concept
> than what it replaces.  The test for LockStmt is an example --- the
> comment talks about restricting locks during recovery, which is fine and
> understandable, but then it's completely unobvious that the actual code
> implements that behavior rather than some other one.

Uh, suggestions?

> * ALTER SYSTEM SET is readonly?  Say what?  Even if that's how the current
> code behaves, it's a damfool idea and we should change it.  I think that
> the semantics we are really trying to implement for read-only is "has no
> effects visible outside the current session" --- this explains, for
> example, why copying into a temp table is OK.  ALTER SYSTEM SET certainly
> isn't that.

It would be extremely lame and a huge usability regression to
arbitrary restrict ALTER SYSTEM SET on standby nodes for no reason.
Editing the postgresql.auto.conf file works just fine there, and is a
totally sensible thing to want to do. You could argue for restricting
it in parallel mode just out of general paranoia, but I don't favor
that approach.

> I think if we can sort out the notation for how the restrictions
> are expressed, this'll be a good improvement.

Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Wed, Jan 8, 2020 at 2:57 PM Tom Lane <[hidden email]> wrote:
>> * I find COMMAND_IS_WEAKLY_READ_ONLY to be a more confusing concept
>> than what it replaces.  The test for LockStmt is an example --- the
>> comment talks about restricting locks during recovery, which is fine and
>> understandable, but then it's completely unobvious that the actual code
>> implements that behavior rather than some other one.

> Uh, suggestions?

COMMAND_NOT_IN_RECOVERY, maybe?

>> * ALTER SYSTEM SET is readonly?  Say what?

> It would be extremely lame and a huge usability regression to
> arbitrary restrict ALTER SYSTEM SET on standby nodes for no reason.

I didn't say that it shouldn't be allowed on standby nodes.  I said
it shouldn't be allowed in transactions that have explicitly declared
themselves to be read-only.  Maybe we need to disaggregate those
concepts a bit more --- a refactoring such as this would be a fine
time to do that.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Robert Haas
On Wed, Jan 8, 2020 at 3:26 PM Tom Lane <[hidden email]> wrote:

> Robert Haas <[hidden email]> writes:
> > On Wed, Jan 8, 2020 at 2:57 PM Tom Lane <[hidden email]> wrote:
> >> * I find COMMAND_IS_WEAKLY_READ_ONLY to be a more confusing concept
> >> than what it replaces.  The test for LockStmt is an example --- the
> >> comment talks about restricting locks during recovery, which is fine and
> >> understandable, but then it's completely unobvious that the actual code
> >> implements that behavior rather than some other one.
>
> > Uh, suggestions?
>
> COMMAND_NOT_IN_RECOVERY, maybe?

Well, maybe I should just get rid of COMMAND_IS_WEAKLY_READ_ONLY and
return individual COMMAND_OK_IN_* flags for those cases.

> >> * ALTER SYSTEM SET is readonly?  Say what?
>
> > It would be extremely lame and a huge usability regression to
> > arbitrary restrict ALTER SYSTEM SET on standby nodes for no reason.
>
> I didn't say that it shouldn't be allowed on standby nodes.

Oh, OK. I guess I misunderstood.

> I said
> it shouldn't be allowed in transactions that have explicitly declared
> themselves to be read-only.  Maybe we need to disaggregate those
> concepts a bit more --- a refactoring such as this would be a fine
> time to do that.

Yeah, the current rules are pretty weird. Aside from ALTER SYSTEM ..
SET, read-only transaction seem to allow writes to temporary relations
and sequences, plus CLUSTER, REINDEX, VACUUM, PREPARE, ROLLBACK
PREPARED, and COMMIT PREPARED, all of which sound a lot like writes to
me. They also allow LISTEN and SET which are have transactional
behavior in general but for some reason don't feel they need to
respect the R/O property. I worry that if we start whacking these
behaviors around we'll get complaints, so I'm cautious about doing
that. At the least, we would need to have a real clear definition, and
if there is such a definition that covers the current cases, I can't
guess what it is. Forget ALTER SYSTEM for a minute -- why is it OK to
rewrite a table via CLUSTER in a R/O transaction, but not OK to do an
ALTER TABLE that changes the clustering index? Why is it not OK to
LISTEN on a standby (and presumably not get any notifications until a
promotion occurs) but OK to UNLISTEN? Whatever reasons may have
justified the current choice of behaviors are probably lost in the
sands of time; they are for sure unknown to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Stephen Frost
Greetings,

(I'm also overall in favor of the direction this is going, so general +1
from me, and I took a quick look through the patch and didn't
particularly see anything I didn't like besides what's commented on
below.)

* Robert Haas ([hidden email]) wrote:

> On Wed, Jan 8, 2020 at 3:26 PM Tom Lane <[hidden email]> wrote:
> > Robert Haas <[hidden email]> writes:
> > > On Wed, Jan 8, 2020 at 2:57 PM Tom Lane <[hidden email]> wrote:
> > >> * I find COMMAND_IS_WEAKLY_READ_ONLY to be a more confusing concept
> > >> than what it replaces.  The test for LockStmt is an example --- the
> > >> comment talks about restricting locks during recovery, which is fine and
> > >> understandable, but then it's completely unobvious that the actual code
> > >> implements that behavior rather than some other one.
> >
> > > Uh, suggestions?
> >
> > COMMAND_NOT_IN_RECOVERY, maybe?
>
> Well, maybe I should just get rid of COMMAND_IS_WEAKLY_READ_ONLY and
> return individual COMMAND_OK_IN_* flags for those cases.
Yeah, I don't like the WEAKLY_READ_ONLY idea either- explicitly having
COMMAND_OK_IN_X seems cleaner.

> > >> * ALTER SYSTEM SET is readonly?  Say what?
> >
> > > It would be extremely lame and a huge usability regression to
> > > arbitrary restrict ALTER SYSTEM SET on standby nodes for no reason.
> >
> > I didn't say that it shouldn't be allowed on standby nodes.
>
> Oh, OK. I guess I misunderstood.

I agree that we want ALTER SYSTEM SET to work on standbys, but it seems
there isn't really disagreement there.

> > I said
> > it shouldn't be allowed in transactions that have explicitly declared
> > themselves to be read-only.  Maybe we need to disaggregate those
> > concepts a bit more --- a refactoring such as this would be a fine
> > time to do that.
>
> Yeah, the current rules are pretty weird. Aside from ALTER SYSTEM ..
> SET, read-only transaction seem to allow writes to temporary relations
> and sequences, plus CLUSTER, REINDEX, VACUUM, PREPARE, ROLLBACK
> PREPARED, and COMMIT PREPARED, all of which sound a lot like writes to
> me. They also allow LISTEN and SET which are have transactional
> behavior in general but for some reason don't feel they need to
> respect the R/O property. I worry that if we start whacking these
> behaviors around we'll get complaints, so I'm cautious about doing
> that. At the least, we would need to have a real clear definition, and
> if there is such a definition that covers the current cases, I can't
> guess what it is. Forget ALTER SYSTEM for a minute -- why is it OK to
> rewrite a table via CLUSTER in a R/O transaction, but not OK to do an
> ALTER TABLE that changes the clustering index? Why is it not OK to
> LISTEN on a standby (and presumably not get any notifications until a
> promotion occurs) but OK to UNLISTEN? Whatever reasons may have
> justified the current choice of behaviors are probably lost in the
> sands of time; they are for sure unknown to me.
That a 'read-only' transaction can call CLUSTER is definitely bizarre to
me.  As relates to 'UN-SOMETHING', having those be allowed makes sense,
to me anyway, since connection poolers like to do those things and it
should be a no-op more-or-less by definition.  SET isn't changing data
blocks, so that also seems ok for a read-only transaction..  but, yeah,
there's no real great hard-and-fast-rule we've been following.

Would we be able to make a rule of "can't change on-disk stuff, except
for things in temporary schemas" and have it stick without a lot of
complaints?  Seems like that would address Tom's ALTER SYSTEM SET
concern, and would mean CLUSTER/REINDEX/VACUUM are disallowed in a
backwards-incompatible way (though I think I'm fine with that..), and
SET would still be allowed (which strikes me as correct too).  I'm not
quite sure how I feel about LISTEN, but that it could possibly actually
be used post-promotion and doesn't change on-disk stuff makes me feel
like it actually probably should be allowed.

Just looking at what was mentioned here- if there's other cases where
this idea falls flat then let's discuss them.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Robert Haas
On Wed, Jan 8, 2020 at 5:57 PM Stephen Frost <[hidden email]> wrote:
> Yeah, I don't like the WEAKLY_READ_ONLY idea either- explicitly having
> COMMAND_OK_IN_X seems cleaner.

Done that way. v2 attached.

> Would we be able to make a rule of "can't change on-disk stuff, except
> for things in temporary schemas" and have it stick without a lot of
> complaints?  Seems like that would address Tom's ALTER SYSTEM SET
> concern, and would mean CLUSTER/REINDEX/VACUUM are disallowed in a
> backwards-incompatible way (though I think I'm fine with that..), and
> SET would still be allowed (which strikes me as correct too).  I'm not
> quite sure how I feel about LISTEN, but that it could possibly actually
> be used post-promotion and doesn't change on-disk stuff makes me feel
> like it actually probably should be allowed.

I think we can make any rule we like, but I think we should have some
measure of broad agreement on it. I'd like to go ahead with this for
now and then further changes can continue to be discussed and debated.
Hopefully we'll get a few more people to weigh in, too, because
deciding something like this based on what a handful of people doesn't
seem like a good idea to me.

I'd be really interested to hear if anyone knows the history behind
allowing CLUSTER, REINDEX, VACUUM, and some operations on temp tables.
It seems to have been that way for a long time. I wonder if it was a
deliberate choice or something that just happened semi-accidentally.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v2-0001-Fix-problems-with-read-only-query-checks-and-refa.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Tom Lane-2
Robert Haas <[hidden email]> writes:
> I'd be really interested to hear if anyone knows the history behind
> allowing CLUSTER, REINDEX, VACUUM, and some operations on temp tables.
> It seems to have been that way for a long time. I wonder if it was a
> deliberate choice or something that just happened semi-accidentally.

Within a "read-only" xact you mean?  I believe that allowing DML writes
was intentional.  As for the utility commands, I suspect that it was in
part accidental (error of omission?), and then if anyone thought hard
about it they decided that allowing DML writes to temp tables justifies
those operations too.

Have you tried excavating in our git history to see when the relevant
permission tests originated?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Robert Haas
On Thu, Jan 9, 2020 at 2:24 PM Tom Lane <[hidden email]> wrote:

> Robert Haas <[hidden email]> writes:
> > I'd be really interested to hear if anyone knows the history behind
> > allowing CLUSTER, REINDEX, VACUUM, and some operations on temp tables.
> > It seems to have been that way for a long time. I wonder if it was a
> > deliberate choice or something that just happened semi-accidentally.
>
> Within a "read-only" xact you mean?  I believe that allowing DML writes
> was intentional.  As for the utility commands, I suspect that it was in
> part accidental (error of omission?), and then if anyone thought hard
> about it they decided that allowing DML writes to temp tables justifies
> those operations too.
>
> Have you tried excavating in our git history to see when the relevant
> permission tests originated?

check_xact_readonly() with a long list of command tags originated in
the same commit that added read-only transactions. CLUSTER, REINDEX,
and VACUUM weren't included in the list of prohibited operations then,
either, but it's unclear whether that was a deliberate omission or an
oversight. That commit also thought that COPY FROM - and queries -
should allow temp tables. But there's nothing in the commit that seems
to explain why, unless the commit message itself is a hint:

commit b65cd562402ed9d3206d501cc74dc38bc421b2ce
Author: Peter Eisentraut <[hidden email]>
Date:   Fri Jan 10 22:03:30 2003 +0000

    Read-only transactions, as defined in SQL.

Maybe the SQL standard has something to say about this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Tom Lane-2
Robert Haas <[hidden email]> writes:
> Maybe the SQL standard has something to say about this?

[ pokes around ... ]  Yeah, it does, and I'd say it's pretty clearly
in agreement with what Peter did, so far as DML ops go.  For instance,
this bit from SQL99's description of DELETE:

         1) If the access mode of the current SQL-transaction or the access
            mode of the branch of the current SQL-transaction at the current
            SQL-connection is read-only, and T is not a temporary table,
            then an exception condition is raised: invalid transaction state
            - read-only SQL-transaction.

UPDATE and INSERT say the same.  (I didn't look at later spec versions,
since Peter's 2003 commit was probably based on SQL99.)

You could argue about exactly how to extend that to non-spec
utility commands, but for the most part allowing them seems
to make sense if DML is allowed.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Robert Haas
On Thu, Jan 9, 2020 at 3:07 PM Tom Lane <[hidden email]> wrote:

> Robert Haas <[hidden email]> writes:
> > Maybe the SQL standard has something to say about this?
>
> [ pokes around ... ]  Yeah, it does, and I'd say it's pretty clearly
> in agreement with what Peter did, so far as DML ops go.  For instance,
> this bit from SQL99's description of DELETE:
>
>          1) If the access mode of the current SQL-transaction or the access
>             mode of the branch of the current SQL-transaction at the current
>             SQL-connection is read-only, and T is not a temporary table,
>             then an exception condition is raised: invalid transaction state
>             - read-only SQL-transaction.
>
> UPDATE and INSERT say the same.  (I didn't look at later spec versions,
> since Peter's 2003 commit was probably based on SQL99.)

OK. That's good to know.

> You could argue about exactly how to extend that to non-spec
> utility commands, but for the most part allowing them seems
> to make sense if DML is allowed.

But I think we allow them on all tables, not just temp tables, so I
don't think I understand this argument.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Robert Haas
On Thu, Jan 9, 2020 at 3:37 PM Robert Haas <[hidden email]> wrote:
> > You could argue about exactly how to extend that to non-spec
> > utility commands, but for the most part allowing them seems
> > to make sense if DML is allowed.
>
> But I think we allow them on all tables, not just temp tables, so I
> don't think I understand this argument.

Oh, wait: I'm conflating two things. The current behavior extends the
spec behavior to COPY in a logical way.

But it also allows CLUSTER, REINDEX, and VACUUM on any table. The spec
presumably has no view on that, nor does the passage you quoted seem
to apply here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Tom Lane-2
In reply to this post by Robert Haas
Robert Haas <[hidden email]> writes:
> On Thu, Jan 9, 2020 at 3:07 PM Tom Lane <[hidden email]> wrote:
>> You could argue about exactly how to extend that to non-spec
>> utility commands, but for the most part allowing them seems
>> to make sense if DML is allowed.

> But I think we allow them on all tables, not just temp tables, so I
> don't think I understand this argument.

Oh, I misunderstood your concern.

Peter might remember more clearly, but I have a feeling that we
concluded that the intent of the spec was for read-only-ness to
disallow globally-visible changes in the visible database contents.
VACUUM, for example, does not cause any visible change, so it
should be admissible.  REINDEX ditto.  (We ignore here the possibility
of such things causing, say, a change in the order in which rows are
returned, since that's beneath the spec's notice to begin with.)
ANALYZE ditto, except to the extent that if you look at pg_stats
you might see something different --- but again, system catalog
contents are outside the spec's purview.

You could extend this line of argument, perhaps, far enough to justify
ALTER SYSTEM SET as well.  But I don't like that because some GUCs have
visible effects on the results that an ordinary query minding its own
business can get.  Timezone is perhaps the poster child there, or
search_path.  If we were to subdivide the GUCs into "affects
implementation details only" vs "can affect query semantics",
I'd hold still for allowing ALTER SYSTEM SET on the former group.
Doubt it's worth the trouble to distinguish, though.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Peter Eisentraut-6
On 2020-01-09 21:52, Tom Lane wrote:
> Peter might remember more clearly, but I have a feeling that we
> concluded that the intent of the spec was for read-only-ness to
> disallow globally-visible changes in the visible database contents.

I don't really remember, but that was basically the opinion I had
arrived at as I was reading through this current thread.  Roughly
speaking, anything that changes the database state (data or schema) in a
way that would be reflected in a pg_dump output is not read-only.

> VACUUM, for example, does not cause any visible change, so it
> should be admissible.  REINDEX ditto.  (We ignore here the possibility
> of such things causing, say, a change in the order in which rows are
> returned, since that's beneath the spec's notice to begin with.)

agreed

> ANALYZE ditto, except to the extent that if you look at pg_stats
> you might see something different --- but again, system catalog
> contents are outside the spec's purview.

agreed

> You could extend this line of argument, perhaps, far enough to justify
> ALTER SYSTEM SET as well.  But I don't like that because some GUCs have
> visible effects on the results that an ordinary query minding its own
> business can get.  Timezone is perhaps the poster child there, or
> search_path.  If we were to subdivide the GUCs into "affects
> implementation details only" vs "can affect query semantics",
> I'd hold still for allowing ALTER SYSTEM SET on the former group.
> Doubt it's worth the trouble to distinguish, though.

ALTER SYSTEM is read only in my mind.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Robert Haas
On Fri, Jan 10, 2020 at 7:23 AM Peter Eisentraut
<[hidden email]> wrote:
> I don't really remember, but that was basically the opinion I had
> arrived at as I was reading through this current thread.  Roughly
> speaking, anything that changes the database state (data or schema) in a
> way that would be reflected in a pg_dump output is not read-only.

This rule very nearly matches the current behavior: it explains why
temp table operations are allowed, and why ALTER SYSTEM is allowed,
and why REINDEX etc. are allowed. However, there's a notable
exception: PREPARE, COMMIT PREPARED, and ROLLBACK PREPARED are allowed
in a read-only transaction. Under the "doesn't change pg_dump output"
criteria, the first and third ones should be permitted but COMMIT
PREPARED should be denied, except maybe if the prepared transaction
didn't do any writes (and in that case, why did we bother preparing
it?). Despite that, this rule does a way better job explaining the
current behavior than anything else suggested so far.

Accordingly, here's v3, with comments adjusted to match this new
explanation for the current behavior. This seems way better than what
I had before, because it actually explains why stuff is the way it is
rather than just appealing to history.

BTW, there's a pending patch that allows CLUSTER to change the
tablespace of an object while rewriting it. If we want to be strict
about it, that variant would need to be disallowed in read only mode,
under this definition. (I also think that it's lousy syntax and ought
to be spelled ALTER TABLE x SET TABLESPACE foo, CLUSTER or something
like that rather than anything beginning with CLUSTER, but I seem to
be losing that argument.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v3-0001-Fix-problems-with-read-only-query-checks-and-refa.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Tom Lane-2
In reply to this post by Peter Eisentraut-6
Peter Eisentraut <[hidden email]> writes:
> I don't really remember, but that was basically the opinion I had
> arrived at as I was reading through this current thread.  Roughly
> speaking, anything that changes the database state (data or schema) in a
> way that would be reflected in a pg_dump output is not read-only.

OK, although I'd put some emphasis on "roughly speaking".

> ALTER SYSTEM is read only in my mind.

I'm still having trouble with this conclusion.  I think it can only
be justified by a very narrow reading of "reflected in pg_dump"
that relies on the specific factorization we've chosen for upgrade
operations, ie that postgresql.conf mods have to be carried across
by hand.  But that's mostly historical baggage, rather than a sane
basis for defining "read only".  If somebody comes up with a patch
that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for
whatever is in postgresql.auto.conf (not an unreasonable idea BTW),
will you then decide that ALTER SYSTEM SET is no longer read-only?
Or, perhaps, reject such a patch on the grounds that it breaks this
arbitrary definition of read-only-ness?

As another example, do we need to consider that replacing pg_hba.conf
via pg_write_file should be allowed in a "read only" transaction?

These conclusions seem obviously silly to me, but perhaps YMMV.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Robert Haas
On Fri, Jan 10, 2020 at 9:30 AM Tom Lane <[hidden email]> wrote:
> If somebody comes up with a patch
> that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for
> whatever is in postgresql.auto.conf (not an unreasonable idea BTW),
> will you then decide that ALTER SYSTEM SET is no longer read-only?
> Or, perhaps, reject such a patch on the grounds that it breaks this
> arbitrary definition of read-only-ness?

I would vote to reject such a patch as a confused muddle. I mean,
generally, the expectation right now is that if you move your data
from the current cluster to a new one by pg_dump, pg_upgrade, or even
by promoting a standby, you're responsible for making sure that
postgresql.conf and postgresql.auto.conf get copied over separately.
In the last case, the backup that created the standby will have copied
the postgresql.conf from the master as it existed at that time, but
propagating any subsequent changes is up to you. Now, if we now decide
to shove ALTER SYSTEM SET commands into pg_dumpall output, then
suddenly you're changing that rule, and it's not very clear what the
new rule is.

Now, our current approach is fairly arguable. Given that GUCs on
databases, users, functions, etc. are stored in the catalogs and
subject to backup, restore, replication, etc., one might well expect
that global settings would be handled the same way. I tend to think
that would be nicer, though it would require solving the problem of
how to back out bad changes that make the database not start up.
Regardless of what you or anybody thinks about that, though, it's not
how it works today and would require some serious engineering if we
wanted to make it happen.

> As another example, do we need to consider that replacing pg_hba.conf
> via pg_write_file should be allowed in a "read only" transaction?

I don't really see what the problem with that is. It bothers me a lot
more that CLUSTER can be run in a read-only transaction -- which
actually changes stuff inside the database, even if not necessarily in
a user-visible way -- than it does that somebody might be able to use
the database to change something that isn't really part of the
database anyway. And pg_hba.conf, like postgresql.conf, is largely
treated as an input to the database rather than part of it.

Somebody could create a user-defined function that launches a
satellite into orbit and that is, I would argue, a write operation in
the truest sense. You have changed the state of the world in a lasting
way, and you cannot take it back. But, it's not writing to the
database, so as far as read-only transactions are concerned, who
cares?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Stephen Frost
Greetings,

* Robert Haas ([hidden email]) wrote:

> On Fri, Jan 10, 2020 at 9:30 AM Tom Lane <[hidden email]> wrote:
> > If somebody comes up with a patch
> > that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for
> > whatever is in postgresql.auto.conf (not an unreasonable idea BTW),
> > will you then decide that ALTER SYSTEM SET is no longer read-only?
> > Or, perhaps, reject such a patch on the grounds that it breaks this
> > arbitrary definition of read-only-ness?
>
> I would vote to reject such a patch as a confused muddle. I mean,
> generally, the expectation right now is that if you move your data
> from the current cluster to a new one by pg_dump, pg_upgrade, or even
> by promoting a standby, you're responsible for making sure that
> postgresql.conf and postgresql.auto.conf get copied over separately.
I really don't like that the ALTER SYSTEM SET/postgresql.auto.conf stuff
has to be handled in some way external to logical export/import, or
external to pg_upgrade (particularly in link mode), so I generally see
where Tom's coming from with that suggestion.

In general, I don't think people should be expected to hand-muck around
with anything in the data directory.

> In the last case, the backup that created the standby will have copied
> the postgresql.conf from the master as it existed at that time, but
> propagating any subsequent changes is up to you. Now, if we now decide
> to shove ALTER SYSTEM SET commands into pg_dumpall output, then
> suddenly you're changing that rule, and it's not very clear what the
> new rule is.

I'd like a rule of "users don't muck with the data directory", and we
are nearly there when you include sensible packaging such as what Debian
provides- by moving postrgesql.conf, log files, etc, outside of the data
directory.  For things that can't be moved out of the data directory
though, like postgresql.auto.conf, we should be handling those
transparently to the user.

I agree that there are some interesting cases to consider here though-
like doing a pg_dumpall against a standby resulting in something
different than if you did it against the primary because the
postgresql.auto.conf is different between them (something that I'm
generally supportive of having, and it seems everyone else is too).  I
think having an option to control if the postgresql.auto.conf settings
are included or not in the pg_dumpall output would be a reasonable way
to deal with that though.

> Now, our current approach is fairly arguable. Given that GUCs on
> databases, users, functions, etc. are stored in the catalogs and
> subject to backup, restore, replication, etc., one might well expect
> that global settings would be handled the same way. I tend to think
> that would be nicer, though it would require solving the problem of
> how to back out bad changes that make the database not start up.
> Regardless of what you or anybody thinks about that, though, it's not
> how it works today and would require some serious engineering if we
> wanted to make it happen.

This sounds an awful lot like the arguments that I tried to make when
ALTER SYSTEM SET was first going in, but what's done is done and there's
not much to do but make the best of it as I can't imagine there'd be
much support for ripping it out.

I don't really agree about the need for 'some serious engineering'
either, but having an option for it, sure.

I do also tend to agree with Tom about making ALTER SYSTEM SET be
prohibited in explicitly read-only transactions, but still allowing it
to be run against replicas as that's a handy thing to be able to do.

> > As another example, do we need to consider that replacing pg_hba.conf
> > via pg_write_file should be allowed in a "read only" transaction?
>
> I don't really see what the problem with that is. It bothers me a lot
> more that CLUSTER can be run in a read-only transaction -- which
> actually changes stuff inside the database, even if not necessarily in
> a user-visible way -- than it does that somebody might be able to use
> the database to change something that isn't really part of the
> database anyway. And pg_hba.conf, like postgresql.conf, is largely
> treated as an input to the database rather than part of it.
I don't like that CLUSTER can be run in a read-only transaction either
(though it seems like downthread maybe some people are fine with
that..).  I'm also coming around to the idea that pg_write_file()
probably shouldn't be allowed either, and probably not COPY TO either
(except to stdout, since that's a result, not a change operation).

> Somebody could create a user-defined function that launches a
> satellite into orbit and that is, I would argue, a write operation in
> the truest sense. You have changed the state of the world in a lasting
> way, and you cannot take it back. But, it's not writing to the
> database, so as far as read-only transactions are concerned, who
> cares?

I suppose there's another thing to think about in this discussion, which
are FDWs, if the idea is that read-only means "I don't want to make
changes in THIS database".  I don't really feel like that's what marking
a transaction as 'read only' is intended to mean though.

When I think of starting a read-only transaction, I feel like it's
usually with the idea of "I want to play it safe and I don't want this
transaction to make ANY changes".  I'm feeling more inclined that we
should be going out of our way to make darn sure that we respect that
request of the user, no matter what it is they're running.  We can't
prevent user-created C-level functions from launching satellites, but
that's an untrusted language and therefore it's up to the function
author to manage the transaction and privilege system properly anyway,
not ours.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Laurenz Albe
In reply to this post by Tom Lane-2
On Fri, 2020-01-10 at 09:29 -0500, Tom Lane wrote:

> > ALTER SYSTEM is read only in my mind.
>
> I'm still having trouble with this conclusion.  I think it can only
> be justified by a very narrow reading of "reflected in pg_dump"
> that relies on the specific factorization we've chosen for upgrade
> operations, ie that postgresql.conf mods have to be carried across
> by hand.  But that's mostly historical baggage, rather than a sane
> basis for defining "read only".  If somebody comes up with a patch
> that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for
> whatever is in postgresql.auto.conf (not an unreasonable idea BTW),
> will you then decide that ALTER SYSTEM SET is no longer read-only?

I think that having ALTER SYSTEM commands in pg_dumpall output
would be a problem.  It would cause all kinds of problems whenever
parameters change.  Thinking of the transition "checkpoint_segments"
-> "max_wal_size", you'd have to build some translation magic into pg_dump.
Besides, such a feature would make it harder to restore a dump taken
with version x into version x + n for n > 0.

> Or, perhaps, reject such a patch on the grounds that it breaks this
> arbitrary definition of read-only-ness?

I agree with Robert that such a patch should be rejected on other
grounds.

Concerning the topic of the thread, I personally have come to think
that changing GUCs is *not* writing to the database.  But that is based
on the fact that you can change GUCs on streaming replication standbys,
and it may be surprising to a newcomer.

Perhaps it would be good to consider this question:
Do we call something "read-only" if it changes nothing, or do we call it
"read-only" if it is allowed on a streaming replication standby?
The first would be more correct, but the second may be more convenient.

Yours,
Laurenz Albe



Reply | Threaded
Open this post in threaded view
|

Re: our checks for read-only queries are not great

Tom Lane-2
Laurenz Albe <[hidden email]> writes:
> Perhaps it would be good to consider this question:
> Do we call something "read-only" if it changes nothing, or do we call it
> "read-only" if it is allowed on a streaming replication standby?
> The first would be more correct, but the second may be more convenient.

Yeah, this is really the larger point at stake.  I'm not sure that
"read-only" and "allowed on standby" should be identical, nor even
that one should be an exact subset of the other.  They're certainly
by-and-large the same sets of operations, but there might be
exceptions that belong to only one set.  "read-only" is driven by
(some reading of) the SQL standard, while "allowed on standby" is
driven by implementation limitations, so I think it'd be dangerous
to commit ourselves to those being identical.

                        regards, tom lane


12