Transaction control in procedures

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
33 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Transaction control in procedures

Peter Eisentraut-6
Here is a patch that implements transaction control in PL/Python
procedures.  (This patch goes on top of "SQL procedures" patch v1.)

So you can do this:

CREATE PROCEDURE transaction_test1()
LANGUAGE plpythonu
AS $$
for i in range(0, 10):
    plpy.execute("INSERT INTO test1 (a) VALUES (%d)" % i)
    if i % 2 == 0:
         plpy.commit()
    else:
         plpy.rollback()
$$;

CALL transaction_test1();

I started with PL/Python because the internal structures there are more
manageable.  Obviously, people will want this for PL/pgSQL as well, and
I have that in the works.  It's not in a usable state, but I have found
that the work needed is essentially the same as in PL/Python for example.

I have discovered three groups of obstacles that needed addressing to
make this work.  At this point, the patch is more of a demo of what
these issues are, and if we come to satisfactory solutions for each of
them, things should fall into place more easily afterwards.

1) While calling CommitTransactionCommand() in the middle of a utility
command works just fine (several utility commands do this, of course),
calling AbortCurrentTransaction() in a similar way does not.  There are
a few pieces of code that think that a transaction abort will always
result in a return to the main control loop, and so they will just clean
away everything.  This is what the changes in portalmem.c are about.
Some comments there already hint about the issue.  No doubt this will
need further refinement.  I think it would be desirable in general to
separate the control flow concerns from the transaction management
concerns more cleanly.

2) SPI needs some work.  It thinks that it can clean everything away at
transaction end.  I have found that instead of TopTransactionContext one
can use PortalContext and get a more suitable life cycle for the memory.
 I have played with some variants to make this configurable (e.g.,
argument to SPI_connect()), but that didn't seem very useful.  There are
some comments indicating that there might not always be a PortalContext,
but the existing tests don't seem to mind.  (There was a thread recently
about making a fake PortalContext for autovacuum, so maybe the current
idea is that we make sure there always is a PortalContext.)  Maybe we
need another context like StatementContext or ProcedureContext.

There also needs to be a way via SPI to end transactions and allowing
*some* cleanup to happen but leaving the memory around.  I have done
that via additional SPI API functions like SPI_commit(), which are then
available to PL implementations.  I also tried making it possible
calling transaction statements directly via SPI_exec() or similar, but
that ended up a total disaster.  So from the API perspective, I like the
current implementation, but the details will no doubt need refinement.

3) The PL implementations themselves allocate memory in
transaction-bound contexts for convenience as well.  This is usually
easy to fix by switching to PortalContext as well.  As you see, the
PL/Python code part of the patch is actually very small.  Changes in
other PLs would be similar.

Two issues have not been addressed yet:

A) It would be desirable to be able to run commands such as VACUUM and
CREATE INDEX CONCURRENTLY in a procedure.  This needs a bit of thinking
and standards-lawyering about the semantics, like where exactly do
transactions begin and end in various combinations.  It will probably
also need a different entry point into SPI, because SPI_exec cannot
handle statements ending transactions.  But so far my assessment is that
this can be added in a mostly independent way later on.

B) There needs to be some kind of call stack for procedure and function
invocations, so that we can track when we are allowed to make
transaction controlling calls.  The key term in the SQL standard is
"non-atomic execution context", which seems specifically devised to
cover this scenario.  So for example, if you have CALL -> CALL -> CALL,
the third call can issue a transaction statement.  But if you have CALL
-> SELECT -> CALL, then the last call cannot, because the SELECT
introduces an atomic execution context.  I don't know if we have such a
thing yet or something that we could easily latch on to.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

v1-0001-Transaction-control-in-PL-Python-procedures.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Simon Riggs
On 31 October 2017 at 15:38, Peter Eisentraut
<[hidden email]> wrote:
> Here is a patch that implements transaction control in PL/Python
> procedures.  (This patch goes on top of "SQL procedures" patch v1.)

The patch is incredibly short for such a feature, which is probably a
good indication that it is feasible.

Amazing!

> So you can do this:
>
> CREATE PROCEDURE transaction_test1()
> LANGUAGE plpythonu
> AS $$
> for i in range(0, 10):
>     plpy.execute("INSERT INTO test1 (a) VALUES (%d)" % i)
>     if i % 2 == 0:
>          plpy.commit()
>     else:
>          plpy.rollback()
> $$;
>
> CALL transaction_test1();
>
> I started with PL/Python because the internal structures there are more
> manageable.

AFAICS we execute 10 INSERTs and each one runs in a new top-level xid.
The INSERTs succeed in all cases but we then abort odd-numbered ones
and commit even numbered ones.

What would happen if some of the INSERTs failed? Where would control
go to? (Maybe this is just "no change" in this particular proc)

What happens if the procedure is updated during execution? Presumably
it keeps executing the original version as seen in the initial
snapshot?

Does the xmin of this session advance after each transaction, or do we
hold the snapshot used for the procedure body open, causing us to hold
back xmin and prevent vacuuming from being effective?

What would happen if a procedure recursively called itself? And yet it
was updated half-way through? Would that throw an error (I think it
should).

> Obviously, people will want this for PL/pgSQL as well, and
> I have that in the works.  It's not in a usable state, but I have found
> that the work needed is essentially the same as in PL/Python for example.
>
> I have discovered three groups of obstacles that needed addressing to
> make this work.  At this point, the patch is more of a demo of what
> these issues are, and if we come to satisfactory solutions for each of
> them, things should fall into place more easily afterwards.
>
> 1) While calling CommitTransactionCommand() in the middle of a utility
> command works just fine (several utility commands do this, of course),
> calling AbortCurrentTransaction() in a similar way does not.  There are
> a few pieces of code that think that a transaction abort will always
> result in a return to the main control loop, and so they will just clean
> away everything.  This is what the changes in portalmem.c are about.
> Some comments there already hint about the issue.  No doubt this will
> need further refinement.  I think it would be desirable in general to
> separate the control flow concerns from the transaction management
> concerns more cleanly.

+1

> 2) SPI needs some work.  It thinks that it can clean everything away at
> transaction end.  I have found that instead of TopTransactionContext one
> can use PortalContext and get a more suitable life cycle for the memory.
>  I have played with some variants to make this configurable (e.g.,
> argument to SPI_connect()), but that didn't seem very useful.  There are
> some comments indicating that there might not always be a PortalContext,
> but the existing tests don't seem to mind.  (There was a thread recently
> about making a fake PortalContext for autovacuum, so maybe the current
> idea is that we make sure there always is a PortalContext.)  Maybe we
> need another context like StatementContext or ProcedureContext.
>
> There also needs to be a way via SPI to end transactions and allowing
> *some* cleanup to happen but leaving the memory around.  I have done
> that via additional SPI API functions like SPI_commit(), which are then
> available to PL implementations.  I also tried making it possible
> calling transaction statements directly via SPI_exec() or similar, but
> that ended up a total disaster.  So from the API perspective, I like the
> current implementation, but the details will no doubt need refinement.
>
> 3) The PL implementations themselves allocate memory in
> transaction-bound contexts for convenience as well.  This is usually
> easy to fix by switching to PortalContext as well.  As you see, the
> PL/Python code part of the patch is actually very small.  Changes in
> other PLs would be similar.

Is there some kind of interlock to prevent dropping the portal half way thru?

Will the SPI transaction control functions fail if called from a
normal function?

> Two issues have not been addressed yet:
>
> A) It would be desirable to be able to run commands such as VACUUM and
> CREATE INDEX CONCURRENTLY in a procedure.  This needs a bit of thinking
> and standards-lawyering about the semantics, like where exactly do
> transactions begin and end in various combinations.  It will probably
> also need a different entry point into SPI, because SPI_exec cannot
> handle statements ending transactions.  But so far my assessment is that
> this can be added in a mostly independent way later on.

Sounds like a separate commit, but perhaps it influences the design?

> B) There needs to be some kind of call stack for procedure and function
> invocations, so that we can track when we are allowed to make
> transaction controlling calls.  The key term in the SQL standard is
> "non-atomic execution context", which seems specifically devised to
> cover this scenario.  So for example, if you have CALL -> CALL -> CALL,
> the third call can issue a transaction statement.  But if you have CALL
> -> SELECT -> CALL, then the last call cannot, because the SELECT
> introduces an atomic execution context.  I don't know if we have such a
> thing yet or something that we could easily latch on to.

Yeh. The internal_xact flag is only set at EoXact, so its not really
set as described in the .h
It would certainly be useful to have some state that allows sanity
checking on weird state transitions.

What we would benefit from is a README that gives the theory of
operation, so everyone can read and agree.

Presumably we would need to contact authors of main PL languages to
get them to comment on the API requirements for their languages.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Merlin Moncure-2
On Wed, Nov 8, 2017 at 5:48 PM, Simon Riggs <[hidden email]> wrote:
> On 31 October 2017 at 15:38, Peter Eisentraut
> <[hidden email]> wrote:
>> Here is a patch that implements transaction control in PL/Python
>> procedures.  (This patch goes on top of "SQL procedures" patch v1.)
>
> The patch is incredibly short for such a feature, which is probably a
> good indication that it is feasible.
>
> Amazing!

I have to agree with that.  I'm really excited about this...

Some questions:
*) Will it be possible to do operations like this in pl/pgsql?

BEGIN
  SELECT INTO r * FROM foo;

  START TRANSACTION;  -- perhaps we ought to have a special function
for this instead (BEGIN is reserved, etc).
  SET transaction_isololation TO serializable;
...

 *) Will there be any negative consequences to a procedure running
with an unbounded run time?  For example, something like:

LOOP
  SELECT check_for_stuff_to_do();

  IF stuff_to_do
  THEN
    do_stuff();
  ELSE
    PERFORM pg_sleep(1);
  END IF;
END LOOP;

*) Will pg_cancel_backend() cancel the currently executing statement
or the procedure? (I guess probably the procedure but I'm curious)

*) Will long running procedures be subject to statement timeout (and
does it apply to the entire procedure)?  Will they be able to control
statement_timeout from within the procedure itself?

*) Will pg_stat_activity show the invoking CALL or the currently
executing statement?  I see a strong argument for showing both of
these things. although I understand that's out of scope here.

If these questions (especially the first two) come down the correct
way, then it will mean that I can stop coding in other languages
(primarily bash) for a fairly large number of cases that I really
think belong in the database itself.  This would really simplify
coding, some things in bash are really awkward to get right such as a
mutex to guarantee single script invocation.  My only real dependency
on the operation system environment at that point would be cron to
step in to the backround daemon process (which would immediately set
an advisory lock).

I'm somewhat surprised that SPI is the point of attack for this
functionality, but if it works that's really the best case scenario
(the only downside I can see is that the various out of core pl/s have
to implement the interface individually).

merlin

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Peter Eisentraut-6
On 11/14/17 09:27, Merlin Moncure wrote:
> *) Will it be possible to do operations like this in pl/pgsql?
>
> BEGIN
>   SELECT INTO r * FROM foo;
>
>   START TRANSACTION;  -- perhaps we ought to have a special function
> for this instead (BEGIN is reserved, etc).
>   SET transaction_isololation TO serializable;
> ...

Eventually, I don't see why not.  Currently, it's not complete.

One detail in your example is that when you enter the procedure, you are
already in a transaction, so you would have to run either COMMIT or
ROLLBACK before the START TRANSACTION.

Also, you can't run SET TRANSACTION ISOLATION through SPI, so one would
have to implement a separate code path for that, but that would just be
a bit of leg work.

>  *) Will there be any negative consequences to a procedure running
> with an unbounded run time?  For example, something like:
>
> LOOP
>   SELECT check_for_stuff_to_do();
>
>   IF stuff_to_do
>   THEN
>     do_stuff();
>   ELSE
>     PERFORM pg_sleep(1);
>   END IF;
> END LOOP;

That procedure doesn't do anything with transactions, so it's just like
a long-running function.  Otherwise, it'd just be like long-running
client code managing transactions.

> *) Will pg_cancel_backend() cancel the currently executing statement
> or the procedure? (I guess probably the procedure but I'm curious)

Same as the way it currently works.  It will raise an exception, which
will travel up the stack and eventually issue an error or be caught.  If
someone knows more specific concerns here I could look into it, but I
don't see any problem.

> *) Will long running procedures be subject to statement timeout (and
> does it apply to the entire procedure)?

See previous item.

> Will they be able to control
> statement_timeout from within the procedure itself?

The statement timeout alarm is set by the top-level execution loop, so
you can't change a statement timeout that is already in progress.  But
you could change the GUC and commit it for the next top-level statement.

> *) Will pg_stat_activity show the invoking CALL or the currently
> executing statement?  I see a strong argument for showing both of
> these things. although I understand that's out of scope here.

Not different from a function execution, i.e., top-level statement.

> If these questions (especially the first two) come down the correct
> way, then it will mean that I can stop coding in other languages
> (primarily bash) for a fairly large number of cases that I really
> think belong in the database itself.  This would really simplify
> coding, some things in bash are really awkward to get right such as a
> mutex to guarantee single script invocation.  My only real dependency
> on the operation system environment at that point would be cron to
> step in to the backround daemon process (which would immediately set
> an advisory lock).

Well, some kind of scheduler around procedures would be pretty cool at
some point.

> I'm somewhat surprised that SPI is the point of attack for this
> functionality, but if it works that's really the best case scenario
> (the only downside I can see is that the various out of core pl/s have
> to implement the interface individually).

So I tried different things here, and I'll list them here to explain how
I got there.

Option zero is to not use SPI at all and implement a whole new internal
command execution system.  But that would obviously be a large amount of
work, and it would look 85% like SPI, and as it turns out it's not
necessary.

The first thing I tried out what to run transaction control statements
through SPI.  That turned out to be very complicated and confusing and
fragile, mainly because of the dichotomy between the internal
subtransaction management that the PLs do and the explicit transaction
control statements on the other hand.  It was just a giant unworkable mess.

The next thing I tried was to shut down (SPI_finish) SPI before a
transaction boundary command and restart it (SPI_connect) it afterwards.
 That would work in principle, but it would require a fair amount of
work in each PL, because they implicitly rely on SPI (or perhaps are
tangled up with SPI) for memory management.

The setup I finally arrived at was to implement the transaction boundary
commands as SPI API calls and let them internally make sure that only
the appropriate stuff is cleared away at transaction boundaries.  This
turned out to be the easiest and cleanest.  I have since the last patch
implemented the transaction control capabilities in PL/pgSQL, PL/Perl,
and PL/Tcl, and it was entirely trivial once the details were worked out
as I had shown in PL/Python.  I will post an updated patch with this soon.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Merlin Moncure-2
On Tue, Nov 14, 2017 at 12:09 PM, Peter Eisentraut
<[hidden email]> wrote:

> On 11/14/17 09:27, Merlin Moncure wrote:
>> *) Will it be possible to do operations like this in pl/pgsql?
>>
>> BEGIN
>>   SELECT INTO r * FROM foo;
>>
>>   START TRANSACTION;  -- perhaps we ought to have a special function
>> for this instead (BEGIN is reserved, etc).
>>   SET transaction_isololation TO serializable;
>> ...
>
> Eventually, I don't see why not.  Currently, it's not complete.
>
> One detail in your example is that when you enter the procedure, you are
> already in a transaction, so you would have to run either COMMIT or
> ROLLBACK before the START TRANSACTION.

Ok, that's good, but it seems a little wonky to me to have to issue
COMMIT first.  Shouldn't that be the default?  Meaning you would not
be *in* a transaction unless you specified to be in one.

> Also, you can't run SET TRANSACTION ISOLATION through SPI, so one would
> have to implement a separate code path for that, but that would just be
> a bit of leg work.

Roger -- I'm more interested in if your design generally supports this
being able to this (either now or in the future...).  I'm hammering on
this point for basically two reasons:

1) Trying to understand if the MVCC snapshot creation can be
meaningfully controlled (I think so, but I'll verify)).
2) This is an important case for databases that want to run in a mode
(typically serializeable) but lower the isolation for specific cases;
for example to loop on a special flag being set in a table.  It's
annoying to only be able to specify this on the client side; I tend to
like to abstract arcane database considerations into the database
whenever possible.

>>  *) Will there be any negative consequences to a procedure running
>> with an unbounded run time?  For example, something like:
>>
>> LOOP
>>   SELECT check_for_stuff_to_do();
>>
>>   IF stuff_to_do
>>   THEN
>>     do_stuff();
>>   ELSE
>>     PERFORM pg_sleep(1);
>>   END IF;
>> END LOOP;
>
> That procedure doesn't do anything with transactions, so it's just like
> a long-running function.  Otherwise, it'd just be like long-running
> client code managing transactions.

Can we zero in on this?  The question implied, 'can you do this
without being in a transaction'?  PERFORM do_stuff() is a implicit
transaction, so it ought to end when the function returns right?
Meaning, assuming I was not already in a transaction when hitting this
block, I would not be subject to an endless transaction duration?

>>> *) Will pg_cancel_backend() cancel the currently executing statement
>> or the procedure? (I guess probably the procedure but I'm curious)
>
> Same as the way it currently works.  It will raise an exception, which
> will travel up the stack and eventually issue an error or be caught.  If
> someone knows more specific concerns here I could look into it, but I
> don't see any problem.

Yeah, that works.

>> I'm somewhat surprised that SPI is the point of attack for this
>> functionality, but if it works that's really the best case scenario
>> (the only downside I can see is that the various out of core pl/s have
>> to implement the interface individually).
>
> So I tried different things here, and I'll list them here to explain how
> I got there.
>
> Option zero is to not use SPI at all and implement a whole new internal
> command execution system.  But that would obviously be a large amount of
> work, and it would look 85% like SPI, and as it turns out it's not
> necessary.
>
> The first thing I tried out what to run transaction control statements
> through SPI.  That turned out to be very complicated and confusing and
> fragile, mainly because of the dichotomy between the internal
> subtransaction management that the PLs do and the explicit transaction
> control statements on the other hand.  It was just a giant unworkable mess.
>
> The next thing I tried was to shut down (SPI_finish) SPI before a
> transaction boundary command and restart it (SPI_connect) it afterwards.
>  That would work in principle, but it would require a fair amount of
> work in each PL, because they implicitly rely on SPI (or perhaps are
> tangled up with SPI) for memory management.
>
> The setup I finally arrived at was to implement the transaction boundary
> commands as SPI API calls and let them internally make sure that only
> the appropriate stuff is cleared away at transaction boundaries.  This
> turned out to be the easiest and cleanest.  I have since the last patch
> implemented the transaction control capabilities in PL/pgSQL, PL/Perl,
> and PL/Tcl, and it was entirely trivial once the details were worked out
> as I had shown in PL/Python.  I will post an updated patch with this soon.

well, you've convinced me.  now that you've got pl/pgsql implemented
I'll fire it up and see if I can make qualitative assessments...

merlin

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

legrand legrand
will that kind of statement (that is permitted with Oracle but gives errors
ora-1555 snapshot too old) be permitted ?

begin
 for c in (select id from tab where cond='blabla')
 loop
    update tab set x=1 where id=c.id;
    commit;
 end loop;
end;






--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Peter Eisentraut-6
In reply to this post by Merlin Moncure-2
On 11/14/17 16:33, Merlin Moncure wrote:
>> One detail in your example is that when you enter the procedure, you are
>> already in a transaction, so you would have to run either COMMIT or
>> ROLLBACK before the START TRANSACTION.
>
> Ok, that's good, but it seems a little wonky to me to have to issue
> COMMIT first.  Shouldn't that be the default?  Meaning you would not
> be *in* a transaction unless you specified to be in one.

But that's not how this feature is defined in the SQL standard and AFAIK
other implementations.  When you enter the procedure call, you are in a
transaction.  For one thing, a procedure does not *have* to do
transaction control.  So if it's a plain old procedure like a function
that just runs a few statements, there needs to be a transaction.  We
can't know ahead of time whether the procedure will execute a
transaction control statement and then retroactively change when the
transaction should have started.  Something like an autonomous
transaction procedure might work more like that, but not this feature.

>> Also, you can't run SET TRANSACTION ISOLATION through SPI, so one would
>> have to implement a separate code path for that, but that would just be
>> a bit of leg work.
>
> Roger -- I'm more interested in if your design generally supports this
> being able to this (either now or in the future...).

Nothing in this patch really changes anything about how transactions
themselves work.  So I don't see why any of this shouldn't work.  As I
fill in the gaps in the code, I'll make sure to come back around to
this, but for the time being I can't say anything more.

> Can we zero in on this?  The question implied, 'can you do this
> without being in a transaction'?  PERFORM do_stuff() is a implicit
> transaction, so it ought to end when the function returns right?
> Meaning, assuming I was not already in a transaction when hitting this
> block, I would not be subject to an endless transaction duration?

In the server, you are always in a transaction, so that's not how this
works.  I think this also ties into my first response above.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Peter Eisentraut-6
In reply to this post by Peter Eisentraut-6
On 10/31/17 15:38, Peter Eisentraut wrote:
> Here is a patch that implements transaction control in PL/Python
> procedures.  (This patch goes on top of "SQL procedures" patch v1.)

Here is an updated patch, now on top of "SQL procedures" v2.

Relative to the previous patch v1 I added transaction control to
PL/pgSQL, PL/Perl, and PL/Tcl with relative ease.  (There is a weird
crash in one PL/Perl test that is currently commented out.  And I can't
get a proper backtrace.  Maybe something to do with recursive Perl
interpreters?)

I crash-coursed myself in PL/Perl and PL/Tcl (and Tcl).  If anyone has
more of a feel for those languages and wants to comment on the proposed
interfaces and internals, please chime in.

I also added tracking so that transaction control commands can only be
made in the proper context, currently meaning only top-level procedure
calls, not functions or other procedure calls.  This should be extended
to also allow nested CALLs without anything in between, but I need more
time to code that.

I'll spend a bit more time on tidying up a few things, and a bunch of
documentation is missing, but I currently don't see any more major
issues here.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v2-0001-Transaction-control-in-PL-procedures.patch (67K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Merlin Moncure-2
In reply to this post by Peter Eisentraut-6
On Tue, Nov 14, 2017 at 5:27 PM, Peter Eisentraut
<[hidden email]> wrote:

> On 11/14/17 16:33, Merlin Moncure wrote:
>>> One detail in your example is that when you enter the procedure, you are
>>> already in a transaction, so you would have to run either COMMIT or
>>> ROLLBACK before the START TRANSACTION.
>>
>> Ok, that's good, but it seems a little wonky to me to have to issue
>> COMMIT first.  Shouldn't that be the default?  Meaning you would not
>> be *in* a transaction unless you specified to be in one.
>
> But that's not how this feature is defined in the SQL standard and AFAIK
> other implementations.  When you enter the procedure call, you are in a
> transaction.  For one thing, a procedure does not *have* to do
> transaction control.  So if it's a plain old procedure like a function
> that just runs a few statements, there needs to be a transaction.

Hm, OK.   Well, SQL Server (which is pretty far from the SQL standard)
works that way.  See here:
http://www.4guysfromrolla.com/webtech/080305-1.shtml.  DB2, which is
very close to the SQL standard, only supports COMMIT/ROLLBACK (not
begin/start etc)
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.0/com.ibm.nz.sproc.doc/c_sproc_transaction_commits_and_rollbacks.html.
Either approach is ok I guess, and always being in a transaction
probably has some advantages. performance being an obvious one.  With
DB2, the COMMIT statement acts as kind of a flush, or a paired
'commit;begin;'.

>> Can we zero in on this?  The question implied, 'can you do this
>> without being in a transaction'?  PERFORM do_stuff() is a implicit
>> transaction, so it ought to end when the function returns right?
>> Meaning, assuming I was not already in a transaction when hitting this
>> block, I would not be subject to an endless transaction duration?
>
> In the server, you are always in a transaction, so that's not how this
> works.  I think this also ties into my first response above.

I'll try this out myself, but as long as we can have a *bounded*
transaction lifetime (basically the time to do stuff + 1 second) via
something like:
LOOP
  <do stuff>
  COMMIT;
  PERFORM pg_sleep(1);
END LOOP;

... I'm good. I'll try your patch out ASAP.  Thanks for answering all
my questions.

merlin

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Pavel Stehule


2017-11-15 14:38 GMT+01:00 Merlin Moncure <[hidden email]>:
On Tue, Nov 14, 2017 at 5:27 PM, Peter Eisentraut
<[hidden email]> wrote:
> On 11/14/17 16:33, Merlin Moncure wrote:
>>> One detail in your example is that when you enter the procedure, you are
>>> already in a transaction, so you would have to run either COMMIT or
>>> ROLLBACK before the START TRANSACTION.
>>
>> Ok, that's good, but it seems a little wonky to me to have to issue
>> COMMIT first.  Shouldn't that be the default?  Meaning you would not
>> be *in* a transaction unless you specified to be in one.
>
> But that's not how this feature is defined in the SQL standard and AFAIK
> other implementations.  When you enter the procedure call, you are in a
> transaction.  For one thing, a procedure does not *have* to do
> transaction control.  So if it's a plain old procedure like a function
> that just runs a few statements, there needs to be a transaction.

Hm, OK.   Well, SQL Server (which is pretty far from the SQL standard)
works that way.  See here:
http://www.4guysfromrolla.com/webtech/080305-1.shtml.  DB2, which is
very close to the SQL standard, only supports COMMIT/ROLLBACK (not
begin/start etc)
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.0/com.ibm.nz.sproc.doc/c_sproc_transaction_commits_and_rollbacks.html.
Either approach is ok I guess, and always being in a transaction
probably has some advantages. performance being an obvious one.  With
DB2, the COMMIT statement acts as kind of a flush, or a paired
'commit;begin;'.

same in Oracle PL/SQL


>> Can we zero in on this?  The question implied, 'can you do this
>> without being in a transaction'?  PERFORM do_stuff() is a implicit
>> transaction, so it ought to end when the function returns right?
>> Meaning, assuming I was not already in a transaction when hitting this
>> block, I would not be subject to an endless transaction duration?
>
> In the server, you are always in a transaction, so that's not how this
> works.  I think this also ties into my first response above.

I'll try this out myself, but as long as we can have a *bounded*
transaction lifetime (basically the time to do stuff + 1 second) via
something like:
LOOP
  <do stuff>
  COMMIT;
  PERFORM pg_sleep(1);
END LOOP;

... I'm good. I'll try your patch out ASAP.  Thanks for answering all
my questions.

merlin


Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Merlin Moncure-2
In reply to this post by Merlin Moncure-2
On Wed, Nov 15, 2017 at 7:38 AM, Merlin Moncure <[hidden email]> wrote:

> On Tue, Nov 14, 2017 at 5:27 PM, Peter Eisentraut
>>> Can we zero in on this?  The question implied, 'can you do this
>>> without being in a transaction'?  PERFORM do_stuff() is a implicit
>>> transaction, so it ought to end when the function returns right?
>>> Meaning, assuming I was not already in a transaction when hitting this
>>> block, I would not be subject to an endless transaction duration?
>>
>> In the server, you are always in a transaction, so that's not how this
>> works.  I think this also ties into my first response above.
>
> I'll try this out myself, but as long as we can have a *bounded*
> transaction lifetime (basically the time to do stuff + 1 second) via
> something like:
> LOOP
>   <do stuff>
>   COMMIT;
>   PERFORM pg_sleep(1);
> END LOOP;
>
> ... I'm good. I'll try your patch out ASAP.  Thanks for answering all
> my questions.


Trying this out (v2 both patches,  compiled clean, thank you!),
postgres=# CREATE OR REPLACE PROCEDURE foo() AS
$$
BEGIN
  LOOP
    PERFORM 1;
    COMMIT;
    RAISE NOTICE '%', now();
    PERFORM pg_sleep(1);
  END LOOP;
END;
$$ LANGUAGE PLPGSQL;
CREATE PROCEDURE
Time: 0.996 ms
postgres=# call foo();
NOTICE:  2017-11-15 08:52:08.936025-06
NOTICE:  2017-11-15 08:52:08.936025-06

... I noticed that:
*) now() did not advance with commit and,
*) xact_start via pg_stat_activity did not advance

Shouldn't both of those advance with the in-loop COMMIT?

merlin

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Peter Eisentraut-6
In reply to this post by legrand legrand
On 11/14/17 17:40, legrand legrand wrote:

> will that kind of statement (that is permitted with Oracle but gives errors
> ora-1555 snapshot too old) be permitted ?
>
> begin
>  for c in (select id from tab where cond='blabla')
>  loop
>     update tab set x=1 where id=c.id;
>     commit;
>  end loop;
> end;

Hmm, that currently results in

ERROR:  cannot commit while a portal is pinned

You say this fails in Oracle too.  Are we supposed to make this work
somehow?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Peter Eisentraut-6
In reply to this post by Simon Riggs
On 11/8/17 18:48, Simon Riggs wrote:
> What would happen if some of the INSERTs failed? Where would control
> go to? (Maybe this is just "no change" in this particular proc)

An exception is raised and unless the exception is caught (depending on
the PL), control leaves the procedure.  What is already committed stays.

> What happens if the procedure is updated during execution? Presumably
> it keeps executing the original version as seen in the initial
> snapshot?

correct

> Does the xmin of this session advance after each transaction, or do we
> hold the snapshot used for the procedure body open, causing us to hold
> back xmin and prevent vacuuming from being effective?
>
> What would happen if a procedure recursively called itself? And yet it
> was updated half-way through? Would that throw an error (I think it
> should).

I don't think anything special happens here.  The snapshot that is used
to read the procedure source code and other meta information is released
at a transaction boundary.

>> 3) The PL implementations themselves allocate memory in
>> transaction-bound contexts for convenience as well.  This is usually
>> easy to fix by switching to PortalContext as well.  As you see, the
>> PL/Python code part of the patch is actually very small.  Changes in
>> other PLs would be similar.
>
> Is there some kind of interlock to prevent dropping the portal half way thru?

I should probably look this up, but I don't think this is fundamentally
different from how VACUUM and CREATE INDEX CONCURRENTLY run inside a
portal and issue multiple transactions in sequence.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Peter Eisentraut-6
In reply to this post by Merlin Moncure-2
On 11/15/17 09:54, Merlin Moncure wrote:
> ... I noticed that:
> *) now() did not advance with commit and,
> *) xact_start via pg_stat_activity did not advance
>
> Shouldn't both of those advance with the in-loop COMMIT?

I think you are correct.  I'll include that in the next patch version.
It shouldn't be difficult.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

RE: Transaction control in procedures

legrand legrand
In reply to this post by Peter Eisentraut-6

We are just opening the  "close cursors on/at commit" specification ;o)

- MS SQL server: cursor_close_on_commit
- Firebird: close_cursors_at_commit
- DB2: "with hold" syntax
- ...

I think it a plus to support keeping opened cursors at commit time,
but impacts have to be checked in details ...

Oracle Ora-1555 error comes in the extreme situation where rows used inside the cursor are modified, commited, before to be fetched.


Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Merlin Moncure-2
In reply to this post by Peter Eisentraut-6
On Wed, Nov 15, 2017 at 3:42 PM, Peter Eisentraut
<[hidden email]> wrote:
> On 11/15/17 09:54, Merlin Moncure wrote:
>> ... I noticed that:
>> *) now() did not advance with commit and,
>> *) xact_start via pg_stat_activity did not advance
>>
>> Shouldn't both of those advance with the in-loop COMMIT?
>
> I think you are correct.  I'll include that in the next patch version.
> It shouldn't be difficult.

Thanks.  A couple of more things.

*) This error message is incorrect now:
postgres=# CREATE OR REPLACE PROCEDURE foo() AS
$$
BEGIN
  LOOP
    SAVEPOINT x;
  END LOOP;
END;
$$ LANGUAGE PLPGSQL;
CREATE PROCEDURE
Time: 0.912 ms
postgres=# call foo();
ERROR:  cannot begin/end transactions in PL/pgSQL
HINT:  Use a BEGIN block with an EXCEPTION clause instead.
CONTEXT:  PL/pgSQL function foo() line 5 at SQL statement

I guess there are a few places that assume pl/pgsql is always run from
a in-transaction function.

*) Exception handlers seem to override COMMITs.  The the following
procedure will not insert any rows.  I wonder if this is the correct
behavior.  I think there's a pretty good case to be made to raise an
error if a COMMIT is issued if you're in an exception block.

CREATE OR REPLACE PROCEDURE foo() AS
$$
BEGIN
  LOOP
    INSERT INTO foo DEFAULT VALUES;
    COMMIT;
    RAISE EXCEPTION 'test';
  END LOOP;
EXCEPTION
  WHEN OTHERS THEN RAISE NOTICE '%', SQLERRM;
END;
$$ LANGUAGE PLPGSQL;

*) The documentation could use some work.  Would you like some help?

merlin

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Peter Eisentraut-6
In reply to this post by legrand legrand
On 11/16/17 07:04, legrand legrand wrote:
> We are just opening the  "close cursors on/at commit" specification ;o)
>
> - MS SQL server: cursor_close_on_commit
> - Firebird: close_cursors_at_commit
> - DB2: "with hold" syntax
> - ...
>
> I think it a plus to support keeping opened cursors at commit time,
> but impacts have to be checked in details ...

I think the facilities to support this in PostgreSQL are already there.
We'd just have to tweak PL/pgSQL to make some of its internal portals
"held" and then clean them up manually at some later point.  So I think
this is a localized detail, not a fundamental problem.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Merlin Moncure-2
On Thu, Nov 16, 2017 at 12:36 PM, Peter Eisentraut
<[hidden email]> wrote:

> On 11/16/17 07:04, legrand legrand wrote:
>> We are just opening the  "close cursors on/at commit" specification ;o)
>>
>> - MS SQL server: cursor_close_on_commit
>> - Firebird: close_cursors_at_commit
>> - DB2: "with hold" syntax
>> - ...
>>
>> I think it a plus to support keeping opened cursors at commit time,
>> but impacts have to be checked in details ...
>
> I think the facilities to support this in PostgreSQL are already there.
> We'd just have to tweak PL/pgSQL to make some of its internal portals
> "held" and then clean them up manually at some later point.  So I think
> this is a localized detail, not a fundamental problem.

Automatically persisting cursors (WITH HOLD) can have some very
surprising performance considerations, except when the current code
execution depends on that particular cursor, in which case the current
behavior of raising a (hopefully better worded-) error seems
appropriate.  Cursors based on temporary tables could be exempt from
having to be closed or checked on COMMIT.

plpgsql does not have the facility to create held cursors
FWICT...automatic promotion seems pretty dubious.  It could certainly
be added, and cursors so held could be exempt from being force
closed/errored as well. In lieu of that, having users materialize data
in to temp tables for such cases seems reasonable.

merlin

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Simon Riggs
In reply to this post by Peter Eisentraut-6
On 15 November 2017 at 16:36, Peter Eisentraut
<[hidden email]> wrote:

> On 11/8/17 18:48, Simon Riggs wrote:
>> What would happen if some of the INSERTs failed? Where would control
>> go to? (Maybe this is just "no change" in this particular proc)
>
> An exception is raised and unless the exception is caught (depending on
> the PL), control leaves the procedure.  What is already committed stays.
>
>> What happens if the procedure is updated during execution? Presumably
>> it keeps executing the original version as seen in the initial
>> snapshot?
>
> correct
>
>> Does the xmin of this session advance after each transaction, or do we
>> hold the snapshot used for the procedure body open, causing us to hold
>> back xmin and prevent vacuuming from being effective?
>>
>> What would happen if a procedure recursively called itself? And yet it
>> was updated half-way through? Would that throw an error (I think it
>> should).
>
> I don't think anything special happens here.  The snapshot that is used
> to read the procedure source code and other meta information is released
> at a transaction boundary.

I think we need to document that, or at least note in README

It's quite important for VACUUM.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Transaction control in procedures

Simon Riggs
In reply to this post by Peter Eisentraut-6
On 14 November 2017 at 13:09, Peter Eisentraut
<[hidden email]> wrote:

>> *) Will pg_cancel_backend() cancel the currently executing statement
>> or the procedure? (I guess probably the procedure but I'm curious)
>
> Same as the way it currently works.  It will raise an exception, which
> will travel up the stack and eventually issue an error or be caught.  If
> someone knows more specific concerns here I could look into it, but I
> don't see any problem.
>
>> *) Will long running procedures be subject to statement timeout (and
>> does it apply to the entire procedure)?
>
> See previous item.
>
>> Will they be able to control
>> statement_timeout from within the procedure itself?
>
> The statement timeout alarm is set by the top-level execution loop, so
> you can't change a statement timeout that is already in progress.  But
> you could change the GUC and commit it for the next top-level statement.
>
>> *) Will pg_stat_activity show the invoking CALL or the currently
>> executing statement?  I see a strong argument for showing both of
>> these things. although I understand that's out of scope here.
>
> Not different from a function execution, i.e., top-level statement.

Which is the "top-level statement"? The CALL or the currently
executing statement within the proc? I think you mean former.

For the first two answers above the answer was "currently executing
statement", yet the third answer seems to be the procedure. So that is
a slight discrepancy.

ISTM we would like

1) a way to cancel execution of a procedure
2) a way to set a timeout to cancel execution of a procedure

as well as

1) a way to cancel execution of a statement that is running within a procedure
2) a way to set a timeout to cancel execution of a statement in a procedure

Visibility of what a routine is currently executing is the role of a
debugger utility/API.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

12
Previous Thread Next Thread