Transactions involving multiple postgres foreign servers, take 2

Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
On Tue, 8 Sep 2020 at 13:00, [hidden email]
<[hidden email]> wrote:

>
> From: Amit Kapila <[hidden email]>
> > I intend to say that the global-visibility work can impact this in a
> > major way and we have analyzed that to some extent during a discussion
> > on the other thread. So, I think without having a complete
> > design/solution that addresses both the 2PC and global-visibility, it
> > is not apparent what is the right way to proceed. It seems to me that
> > rather than working on individual (or smaller) parts one needs to come
> > up with a bigger picture (or overall design) and then once we have
> > figured that out correctly, it would be easier to decide which parts
> > can go first.
>
> I'm really sorry I've been getting later and later in publishing the revised scale-out design wiki to discuss the big picture!  I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late.)  Please wait a few days.
>
> But to proceed with the development, let me comment on the atomic commit and global visibility.
>
> * We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it.
>
> * I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB.  That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility.  However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes.
>
> * Or, maybe we can use the following Commitment ordering that doesn't require the timestamp or any other information to be transferred among the cluster nodes.  However, this seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance.  The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult.)  Could you anybody kindly interpret this?
>
> Commitment ordering (CO) - yoavraz2
> https://sites.google.com/site/yoavraz2/the_principle_of_co
>
>
> As for the Sawada-san's 2PC patch, which I find interesting purely as FDW enhancement, I raised the following issues to be addressed:
>
> 1. Make FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.)  I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification.

I agree that we need to verify that the new FDW APIs will be suitable
for FDWs other than postgres_fdw as well.
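
For illustration, here is a rough sketch of the shape such per-server
transaction callbacks could take; the type and function names below are
hypothetical, not the actual API of the patch:

    /* Hypothetical per-server transaction callbacks for 2PC; names are
     * illustrative only and do not come from the actual patch. */
    typedef struct FdwXactInfo
    {
        unsigned int serverid;      /* foreign server OID */
        unsigned int userid;        /* user mapping OID */
        char         gid[200];      /* global transaction identifier */
    } FdwXactInfo;

    typedef void (*PrepareForeignTransaction_function) (FdwXactInfo *fxact);
    typedef void (*CommitPreparedForeignTransaction_function) (FdwXactInfo *fxact);
    typedef void (*RollbackPreparedForeignTransaction_function) (FdwXactInfo *fxact);

    /*
     * postgres_fdw would map these to PREPARE TRANSACTION / COMMIT PREPARED /
     * ROLLBACK PREPARED, an XA-based FDW (oracle_fdw, jdbc_fdw) to
     * xa_prepare() / xa_commit() / xa_rollback(), and a MySQL FDW would have
     * to issue XA PREPARE / XA COMMIT / XA ROLLBACK as SQL statements.
     */

The exercise is to confirm that each callback can be implemented on top
of whatever 2PC interface the remote DBMS exposes, whether that is SQL
commands or C-level XA calls.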

>
> 2. 2PC processing is queued and serialized in one background worker.  That severely subdues transaction throughput.  Each backend should perform 2PC.

I'm not sure it's safe for each backend to perform PREPARE and COMMIT
PREPARED, since the current design is meant to avoid an inconsistency
between the actual transaction result and the result the user sees.
But in the future, I think we can have multiple background workers per
database for better performance.

>
> 3. postgres_fdw cannot detect remote updates when the UDF executed on a remote node updates data.

I assume that you mean pushing the UDF down to a foreign server.
If so, I think we can handle this by improving postgres_fdw. In the
current patch, registering and unregistering a foreign server with the
group of 2PC participants and marking a foreign server as updated are
the FDW's responsibility. So perhaps, if we had a way to tell
postgres_fdw that the UDF might update data on the foreign server,
postgres_fdw could mark the foreign server as updated if the UDF is
shippable.
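
For illustration only, the check could look roughly like the sketch
below; both helper names are made up here, standing in for whatever
declaration mechanism and marking interface the patch actually provides:

    /* Hypothetical sketch: if the user has declared (somehow, e.g. via a new
     * function property or an FDW option) that a shippable UDF may write on
     * the remote side, postgres_fdw marks that server as updated before
     * shipping the query. */
    #include "postgres.h"

    extern bool UdfDeclaredAsWriting(Oid funcid);                     /* hypothetical */
    extern void MarkForeignServerModified(Oid serverid, Oid userid);  /* hypothetical */

    static void
    maybe_mark_server_modified(Oid funcid, Oid serverid, Oid userid)
    {
        if (UdfDeclaredAsWriting(funcid))
            MarkForeignServerModified(serverid, userid);
    }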

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transactions involving multiple postgres foreign servers, take 2

Fujii Masao-4


On 2020/09/11 0:37, Masahiko Sawada wrote:

> On Tue, 8 Sep 2020 at 13:00, [hidden email]
> <[hidden email]> wrote:
>>
>> From: Amit Kapila <[hidden email]>
>>> I intend to say that the global-visibility work can impact this in a
>>> major way and we have analyzed that to some extent during a discussion
>>> on the other thread. So, I think without having a complete
>>> design/solution that addresses both the 2PC and global-visibility, it
>>> is not apparent what is the right way to proceed. It seems to me that
>>> rather than working on individual (or smaller) parts one needs to come
>>> up with a bigger picture (or overall design) and then once we have
>>> figured that out correctly, it would be easier to decide which parts
>>> can go first.
>>
>> I'm really sorry I've been getting later and later in publishing the revised scale-out design wiki to discuss the big picture!  I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late.)  Please wait a few days.
>>
>> But to proceed with the development, let me comment on the atomic commit and global visibility.
>>
>> * We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it.
>>
>> * I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB.  That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility.  However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes.
>>
>> * Or, maybe we can use the following Commitment ordering that doesn't require the timestamp or any other information to be transferred among the cluster nodes.  However, this seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance.  The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult.)  Could you anybody kindly interpret this?
>>
>> Commitment ordering (CO) - yoavraz2
>> https://sites.google.com/site/yoavraz2/the_principle_of_co
>>
>>
>> As for the Sawada-san's 2PC patch, which I find interesting purely as FDW enhancement, I raised the following issues to be addressed:
>>
>> 1. Make FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.)  I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification.
>
> I agree that we need to verify new FDW APIs will be suitable for other
> FDWs than postgres_fdw as well.
>
>>
>> 2. 2PC processing is queued and serialized in one background worker.  That severely subdues transaction throughput.  Each backend should perform 2PC.
>
> Not sure it's safe that each backend perform PREPARE and COMMIT
> PREPARED since the current design is for not leading an inconsistency
> between the actual transaction result and the result the user sees.

Can I check my understanding about why the resolver process is necessary?

Firstly, you think that issuing the COMMIT PREPARED command to the foreign server can cause an error, for example, because of a connection error, OOM, etc. On the other hand, merely waiting for another process to issue the command is much less likely to cause an error. Right?

If an error occurs in the backend process after the commit record is WAL-logged, the error would be reported to the client, which may mistakenly conclude that the transaction failed even though the commit record was already flushed. So you think that each backend should not issue the COMMIT PREPARED command itself, to avoid that inconsistency; instead, it's better to make another process, the resolver, issue the command and just make each backend wait for that to complete. Right?

Also, using the resolver process has another merit: when there are unresolved foreign transactions but the corresponding backend has exited, the resolver can try to resolve them. If something like this automatic resolution is necessary, a process like the resolver would be necessary. Right?

Conversely, if we don't need such automatic resolution (i.e., unresolved foreign transactions always need to be resolved manually) and we can prevent the code that issues the COMMIT PREPARED command from causing an error (not sure if that's possible, though...), we probably don't need the resolver process. Right?


> But in the future, I think we can have multiple background workers per
> database for better performance.

Yes, that's an idea.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



RE: Transactions involving multiple postgres foreign servers, take 2

tsunakawa.takay@fujitsu.com
In reply to this post by Fujii Masao-4
        From: Fujii Masao <[hidden email]>
> Originally start(), commit() and rollback() are supported as FDW interfaces.
> As far as I and Sawada-san discussed this upthread, to support MySQL,
> another type of start() would be necessary to issue "XA START id" command.
> end() might be also necessary to issue "XA END id", but that command can be
> issued via prepare() together with "XA PREPARE id".

Yeah, I think we can call xa_end and xa_prepare in the FDW's prepare function.

The issue is when to call xa_start, which requires an XID as an argument.  We don't want to call it in transactions that access only one node, do we?


> With his patch, prepare() is supported. What other interfaces need to be
> supported per XA/JTA?
>
> I'm not familiar with XA/JTA and XA transaction interfaces on other major
> DBMS. So I'd like to know what other interfaces are necessary additionally?

I think xa_start, xa_end, xa_prepare, xa_commit, xa_rollback, and xa_recover are sufficient.  The XA specification is here:

https://pubs.opengroup.org/onlinepubs/009680699/toc.pdf

You can see the function reference in Chapter 5 and the concepts in Chapter 3.  Chapter 6, if I remember correctly, shows the state transitions (the allowed function call sequences).
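
For concreteness, here is my rough reading of the per-RM call sequence,
using the xa_switch_t entry points from the spec's xa.h listing (error
handling omitted, and the flag choices are just the simplest ones):

    /* Sketch of a transaction manager driving one resource manager through
     * 2PC via the XA switch, per my reading of Chapter 5 of the spec. */
    #include "xa.h"                     /* the header listed in the XA spec */

    static void
    two_phase_commit_one_rm(struct xa_switch_t *sw, int rmid, XID *xid)
    {
        int     rc;

        sw->xa_start_entry(xid, rmid, TMNOFLAGS);    /* associate work with xid */

        /* ... run the remote SQL for this transaction branch ... */

        sw->xa_end_entry(xid, rmid, TMSUCCESS);      /* dissociate the branch */

        rc = sw->xa_prepare_entry(xid, rmid, TMNOFLAGS);
        if (rc == XA_OK)
            sw->xa_commit_entry(xid, rmid, TMNOFLAGS);
        else if (rc != XA_RDONLY)                    /* XA_RDONLY: already complete */
            sw->xa_rollback_entry(xid, rmid, TMNOFLAGS);
    }

An XID itself is just the spec's xid_t struct (formatID, gtrid_length,
bqual_length, data[]), so the global transaction identifier could embed
the local node and transaction IDs.  xa_recover_entry() is what a
resolver-like process would use after a crash to list prepared branches
that are still in doubt.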


> IMO Sawada-san's version of 2PC is less performant, but it's because his
> patch provides more functionality. For example, with his patch, WAL is written
> to automatically complete the unresolved foreign transactions in the case of
> failure. OTOH, Alexey's patch introduces no new WAL for 2PC.
> Of course, generating more WAL would cause more overhead.
> But if we need automatic resolution feature, it's inevitable to introduce new
> WAL whichever the patch we choose.

Please do not get me wrong.  I know Sawada-san is trying to ensure durability.  I just wanted to know what each patch does and at what cost in terms of disk and network I/O, and whether one patch can take something from another to reduce that cost.  I'm simply guessing (without having read the code yet) that each transaction basically does:

- two round trips (prepare, commit) to each remote node
- two WAL writes (prepare, commit) on the local node and each remote node
- one write for two-phase state file on each remote node
- one write to record participants on the local node

It is hard to assess the algorithm's efficiency from the source code alone.  As you may have seen, DBMS textbooks and papers describe disk and network I/O to evaluate algorithms.  I thought such information would be useful before going deeper into the source code.  Maybe such things can eventually be written on Sawada-san's wiki page below or in a README.

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions


Regards
Takayuki Tsunakawa





RE: Transactions involving multiple postgres foreign servers, take 2

tsunakawa.takay@fujitsu.com
In reply to this post by Masahiko Sawada-2
From: Masahiko Sawada <[hidden email]>
> On Tue, 8 Sep 2020 at 13:00, [hidden email]
> <[hidden email]> wrote:
> > 2. 2PC processing is queued and serialized in one background worker.  That
> severely subdues transaction throughput.  Each backend should perform
> 2PC.
>
> Not sure it's safe that each backend perform PREPARE and COMMIT
> PREPARED since the current design is for not leading an inconsistency
> between the actual transaction result and the result the user sees.

As Fujii-san asked, I would also like to know what situation you think is not safe.  Are you worried that the FDW's commit function might call ereport(ERROR | FATAL | PANIC)?  If so, can't we stipulate that the FDW implementor must ensure that the commit function always returns control to the caller?


> But in the future, I think we can have multiple background workers per
> database for better performance.

Does the database in "per database" mean the local database (that applications connect to), or the remote database accessed via FDW?

I'm wondering how the FDW and background worker(s) can realize parallel prepare and parallel commit.  That is, the coordinator transaction performs:

1. Issue prepare to all participant nodes, but doesn't wait for the reply for each issue.
2. Waits for replies from all participants.
3. Issue commit to all participant nodes, but doesn't wait for the reply for each issue.
4. Waits for replies from all participants.

If we just consider PostgreSQL and don't think about FDW, we can use libpq async functions -- PQsendQuery, PQconsumeInput, and PQgetResult.  pgbench uses them so that one thread can issue SQL statements on multiple connections in parallel.

But when we consider the FDW interface, plus other DBMSs, how can we achieve the parallelism?
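
For what it's worth, a minimal sketch of the libpq-only version of steps
1-4 above (postgres_fdw case, no FDW abstraction, minimal error
handling), just to show that the parallelism comes from dispatching to
every connection before waiting on any of them:

    /* Send the same command to every participant, then collect all replies. */
    #include <stdio.h>
    #include <libpq-fe.h>

    static int
    broadcast_command(PGconn *conns[], int nconns, const char *sql)
    {
        int ok = 1;

        /* Step 1/3: dispatch without waiting. */
        for (int i = 0; i < nconns; i++)
        {
            if (!PQsendQuery(conns[i], sql))
            {
                fprintf(stderr, "dispatch failed: %s", PQerrorMessage(conns[i]));
                ok = 0;
            }
        }

        /* Step 2/4: wait for every participant's reply. */
        for (int i = 0; i < nconns; i++)
        {
            PGresult *res;

            while ((res = PQgetResult(conns[i])) != NULL)
            {
                if (PQresultStatus(res) != PGRES_COMMAND_OK)
                {
                    fprintf(stderr, "command failed: %s", PQerrorMessage(conns[i]));
                    ok = 0;
                }
                PQclear(res);
            }
        }
        return ok;
    }

The coordinator would call broadcast_command(conns, n, "PREPARE
TRANSACTION '<gid>'") and, after writing its local commit record,
broadcast_command(conns, n, "COMMIT PREPARED '<gid>'").  The open
question is exactly the one above: how to expose this kind of split
send/wait step through the FDW interface for non-libpq DBMSs.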


> > 3. postgres_fdw cannot detect remote updates when the UDF executed on a
> remote node updates data.
>
> I assume that you mean the pushing the UDF down to a foreign server.
> If so, I think we can do this by improving postgres_fdw. In the current patch,
> registering and unregistering a foreign server to a group of 2PC and marking a
> foreign server as updated is FDW responsible. So perhaps if we had a way to
> tell postgres_fdw that the UDF might update the data on the foreign server,
> postgres_fdw could mark the foreign server as updated if the UDF is shippable.

Maybe we can consider VOLATILE functions update data.  That may be overreaction, though.

Another idea is to add a new value to the ReadyForQuery message in the FE/BE protocol.  Say, 'U' if in a transaction block that updated data.  Here we consider "updated" as having allocated an XID.

52.7. Message Formats
https://www.postgresql.org/docs/devel/protocol-message-formats.html
--------------------------------------------------
ReadyForQuery (B)

Byte1
Current backend transaction status indicator. Possible values are 'I' if idle (not in a transaction block); 'T' if in a transaction block; or 'E' if in a failed transaction block (queries will be rejected until block is ended).
--------------------------------------------------
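
Just to connect this to the client side: libpq already surfaces that
status byte via PQtransactionStatus(), so today postgres_fdw can only
learn what the sketch below shows; the proposed 'U' value would need a
new (currently nonexistent, hypothetical) status on top of this:

    /* What postgres_fdw can currently learn from the ReadyForQuery status. */
    #include <stdbool.h>
    #include <libpq-fe.h>

    static bool
    remote_xact_may_have_written(PGconn *conn)
    {
        switch (PQtransactionStatus(conn))
        {
            case PQTRANS_INTRANS:   /* 'T': in a transaction block */
            case PQTRANS_INERROR:   /* 'E': in a failed transaction block */
                /*
                 * Today we only know that a transaction block is open, not
                 * whether it allocated an XID; a 'U' indicator would answer
                 * that precisely instead of assuming the worst.
                 */
                return true;
            default:                /* PQTRANS_IDLE, PQTRANS_ACTIVE, PQTRANS_UNKNOWN */
                return false;
        }
    }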


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by Fujii Masao-4
On Fri, 11 Sep 2020 at 11:58, Fujii Masao <[hidden email]> wrote:

>
>
>
> On 2020/09/11 0:37, Masahiko Sawada wrote:
> > On Tue, 8 Sep 2020 at 13:00, [hidden email]
> > <[hidden email]> wrote:
> >>
> >> From: Amit Kapila <[hidden email]>
> >>> I intend to say that the global-visibility work can impact this in a
> >>> major way and we have analyzed that to some extent during a discussion
> >>> on the other thread. So, I think without having a complete
> >>> design/solution that addresses both the 2PC and global-visibility, it
> >>> is not apparent what is the right way to proceed. It seems to me that
> >>> rather than working on individual (or smaller) parts one needs to come
> >>> up with a bigger picture (or overall design) and then once we have
> >>> figured that out correctly, it would be easier to decide which parts
> >>> can go first.
> >>
> >> I'm really sorry I've been getting later and later in publishing the revised scale-out design wiki to discuss the big picture!  I don't know why I'm taking this long; I feel as if I were captive in a time prison (yes, nobody is holding me captive; I'm just late.)  Please wait a few days.
> >>
> >> But to proceed with the development, let me comment on the atomic commit and global visibility.
> >>
> >> * We have to hear from Andrey about their check on the possibility that Clock-SI could be Microsoft's patent and if we can avoid it.
> >>
> >> * I have a feeling that we can adopt the algorithm used by Spanner, CockroachDB, and YugabyteDB.  That is, 2PC for multi-node atomic commit, Paxos or Raft for replica synchronization (in the process of commit) to make 2PC more highly available, and the timestamp-based global visibility.  However, the timestamp-based approach makes the database instance shut down when the node's clock is distant from the other nodes.
> >>
> >> * Or, maybe we can use the following Commitment ordering that doesn't require the timestamp or any other information to be transferred among the cluster nodes.  However, this seems to have to track the order of read and write operations among concurrent transactions to ensure the correct commit order, so I'm not sure about the performance.  The MVCO paper seems to present the information we need, but I haven't understood it well yet (it's difficult.)  Could you anybody kindly interpret this?
> >>
> >> Commitment ordering (CO) - yoavraz2
> >> https://sites.google.com/site/yoavraz2/the_principle_of_co
> >>
> >>
> >> As for the Sawada-san's 2PC patch, which I find interesting purely as FDW enhancement, I raised the following issues to be addressed:
> >>
> >> 1. Make FDW API implementable by other FDWs than postgres_fdw (this is what Amit-san kindly pointed out.)  I think oracle_fdw and jdbc_fdw would be good examples to consider, while MySQL may not be good because it exposes the XA feature as SQL statements, not C functions as defined in the XA specification.
> >
> > I agree that we need to verify new FDW APIs will be suitable for other
> > FDWs than postgres_fdw as well.
> >
> >>
> >> 2. 2PC processing is queued and serialized in one background worker.  That severely subdues transaction throughput.  Each backend should perform 2PC.
> >
> > Not sure it's safe that each backend perform PREPARE and COMMIT
> > PREPARED since the current design is for not leading an inconsistency
> > between the actual transaction result and the result the user sees.
>
> Can I check my understanding about why the resolver process is necessary?
>
> Firstly, you think that issuing COMMIT PREPARED command to the foreign server can cause an error, for example, because of connection error, OOM, etc. On the other hand, only waiting for other process to issue the command is less likely to cause an error. Right?
>
> If an error occurs in backend process after commit record is WAL-logged, the error would be reported to the client and it may misunderstand that the transaction failed even though commit record was already flushed. So you think that each backend should not issue COMMIT PREPARED command to avoid that inconsistency. To avoid that, it's better to make other process, the resolver, issue the command and just make each backend wait for that to completed. Right?
>
> Also using the resolver process has another merit; when there are unresolved foreign transactions but the corresponding backend exits, the resolver can try to resolve them. If something like this automatic resolution is necessary, the process like the resolver would be necessary. Right?
>
> To the contrary, if we don't need such automatic resolution (i.e., unresolved foreign transactions always need to be resolved manually) and we can prevent the code to issue COMMIT PREPARED command from causing an error (not sure if that's possible, though...), probably we don't need the resolver process. Right?

Yes, I'm on the same page about all the above explanations.

The resolver process has two functions: resolving foreign
transactions automatically when the user issues COMMIT (the case you
described in the second paragraph), and resolving foreign transactions
when the corresponding backend no longer exists or when the server
crashes in the middle of 2PC (described in the third paragraph).

Considering a design without the resolver process, I think we can
easily replace the latter with manual resolution. OTOH, it's not
easy for the former. I have no better design for now, although, as
you described, if we could ensure that the process doesn't raise an
error while resolving foreign transactions after committing the local
transaction, we would not need the resolver process.

Or, a second idea would be that the backend commits only the local
transaction and then returns the acknowledgment of COMMIT to the user
without resolving foreign transactions. The user then manually
resolves the foreign transactions by, for example, using the SQL
function pg_resolve_foreign_xact() within a separate transaction. That
way, even if an error occurred while resolving foreign transactions
(i.e., executing COMMIT PREPARED), it's okay because the user is
already aware that the local transaction has been committed and can
retry resolving the unresolved foreign transactions. So we won't need
the resolver process while still avoiding such inconsistency.

But a drawback would be that the transaction commit doesn't ensure
that all foreign transactions are completed. Subsequent transactions
would need to check whether the previous distributed transaction has
completed in order to see its results. I'm not sure that's a good
design in terms of usability.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by tsunakawa.takay@fujitsu.com
On Fri, 11 Sep 2020 at 18:24, [hidden email]
<[hidden email]> wrote:

>
> From: Masahiko Sawada <[hidden email]>
> > On Tue, 8 Sep 2020 at 13:00, [hidden email]
> > <[hidden email]> wrote:
> > > 2. 2PC processing is queued and serialized in one background worker.  That
> > severely subdues transaction throughput.  Each backend should perform
> > 2PC.
> >
> > Not sure it's safe that each backend perform PREPARE and COMMIT
> > PREPARED since the current design is for not leading an inconsistency
> > between the actual transaction result and the result the user sees.
>
> As Fujii-san is asking, I also would like to know what situation you think is not safe.  Are you worried that the FDW's commit function might call ereport(ERROR | FATAL | PANIC)?

Yes.

> If so, can't we stipulate that the FDW implementor should ensure that the commit function always returns control to the caller?

How can the FDW implementor ensure that? Since even palloc could call
ereport(ERROR), I guess it's hard to require that of all FDW
implementors.

>
>
> > But in the future, I think we can have multiple background workers per
> > database for better performance.
>
> Does the database in "per database" mean the local database (that applications connect to), or the remote database accessed via FDW?

I meant the local database. In the current patch, we launch the
resolver process per local database. My idea is to allow launching
multiple resolver processes for one local database as long as the
number of workers doesn't exceed the limit.
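
For illustration, launching additional resolvers could use the ordinary
dynamic background worker machinery, roughly like the hypothetical
sketch below (FdwXactResolverMain and the limit mentioned in the comment
are placeholders, not the patch's actual names):

    /* Hypothetical: a launcher starting one more resolver for a database. */
    #include "postgres.h"
    #include "miscadmin.h"
    #include "postmaster/bgworker.h"

    static bool
    launch_resolver_for_database(Oid dbid)
    {
        BackgroundWorker        worker;
        BackgroundWorkerHandle *handle;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                           BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain"); /* hypothetical */
        snprintf(worker.bgw_name, BGW_MAXLEN, "foreign transaction resolver");
        worker.bgw_main_arg = ObjectIdGetDatum(dbid);
        worker.bgw_notify_pid = MyProcPid;

        /* The caller enforces the max-resolvers-style limit. */
        return RegisterDynamicBackgroundWorker(&worker, &handle);
    }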

>
> I'm wondering how the FDW and background worker(s) can realize parallel prepare and parallel commit.  That is, the coordinator transaction performs:
>
> 1. Issue prepare to all participant nodes, but doesn't wait for the reply for each issue.
> 2. Waits for replies from all participants.
> 3. Issue commit to all participant nodes, but doesn't wait for the reply for each issue.
> 4. Waits for replies from all participants.
>
> If we just consider PostgreSQL and don't think about FDW, we can use libpq async functions -- PQsendQuery, PQconsumeInput, and PQgetResult.  pgbench uses them so that one thread can issue SQL statements on multiple connections in parallel.
>
> But when we consider the FDW interface, plus other DBMSs, how can we achieve the parallelism?

It's still a rough idea, but I think we can use the TMASYNC flag and
xa_complete() explained in the XA specification. The core transaction
manager calls the prepare, commit, and rollback APIs with that flag,
requiring the FDW to execute the operation asynchronously and to
return a handle (e.g., a socket obtained by PQsocket() in the
postgres_fdw case) to the transaction manager. The transaction manager
then keeps polling the handle until it becomes readable, and tests for
completion with xa_complete() in no-wait mode, until all foreign
servers return OK on the xa_complete() check.
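
To make the postgres_fdw side of that concrete, here is a minimal
sketch (libpq only, no xa_complete(), ignoring timeouts and
cancellation, and using the passed-in array as scratch state) of
polling the sockets of several already-dispatched PREPARE TRANSACTION
commands until they all finish:

    /* Poll multiple in-flight libpq commands; returns 1 when all succeeded. */
    #include <sys/select.h>
    #include <libpq-fe.h>

    static int
    wait_for_all(PGconn *conns[], int nconns)
    {
        int ok = 1;

        for (;;)
        {
            fd_set  rfds;
            int     maxfd = -1;
            int     pending = 0;

            FD_ZERO(&rfds);
            for (int i = 0; i < nconns; i++)
            {
                if (conns[i] == NULL)
                    continue;                       /* already finished */
                if (PQisBusy(conns[i]))
                {
                    int fd = PQsocket(conns[i]);

                    FD_SET(fd, &rfds);
                    if (fd > maxfd)
                        maxfd = fd;
                    pending++;
                }
                else
                {
                    PGresult *res;

                    while ((res = PQgetResult(conns[i])) != NULL)
                    {
                        if (PQresultStatus(res) != PGRES_COMMAND_OK)
                            ok = 0;
                        PQclear(res);
                    }
                    conns[i] = NULL;                /* done with this server */
                }
            }

            if (pending == 0)
                return ok;

            /* Sleep until at least one pending connection has data. */
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) > 0)
                for (int i = 0; i < nconns; i++)
                    if (conns[i] && FD_ISSET(PQsocket(conns[i]), &rfds))
                        PQconsumeInput(conns[i]);
        }
    }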

>
>
> > > 3. postgres_fdw cannot detect remote updates when the UDF executed on a
> > remote node updates data.
> >
> > I assume that you mean the pushing the UDF down to a foreign server.
> > If so, I think we can do this by improving postgres_fdw. In the current patch,
> > registering and unregistering a foreign server to a group of 2PC and marking a
> > foreign server as updated is FDW responsible. So perhaps if we had a way to
> > tell postgres_fdw that the UDF might update the data on the foreign server,
> > postgres_fdw could mark the foreign server as updated if the UDF is shippable.
>
> Maybe we can consider VOLATILE functions update data.  That may be overreaction, though.

Sorry I don't understand that. The volatile functions are not pushed
down to the foreign servers in the first place, no?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transactions involving multiple postgres foreign servers, take 2

Ashutosh Bapat-2
In reply to this post by Masahiko Sawada-2
On Fri, Sep 11, 2020 at 4:37 PM Masahiko Sawada
<[hidden email]> wrote:
>
> Considering the design without the resolver process, I think we can
> easily replace the latter with the manual resolution. OTOH, it's not
> easy for the former. I have no idea about better design for now,
> although, as you described, if we could ensure that the process
> doesn't raise an error during resolving foreign transactions after
> committing the local transaction we would not need the resolver
> process.

My initial patch used the same backend to resolve foreign
transactions. But in that case, even though the user receives
confirmation that COMMIT completed, the backend doesn't accept the
next query while it is busy resolving the foreign transactions. That
might again be a usability issue if attempting to resolve all foreign
transactions takes noticeable time. If we go this route, we should try
to resolve as many foreign transactions as possible, ignoring any
errors while doing so, and somehow let the user know which
transactions couldn't be resolved. The user can then take
responsibility for resolving those.
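
A rough sketch of that "best effort, report the leftovers" loop using
the existing PG_TRY machinery, glossing over the (sub)transaction-state
cleanup a real version would need; ResolveOneForeignXact() and the
FdwXact type are placeholders for whatever the patch actually provides:

    /* Hypothetical: resolve what we can, remember what we could not. */
    #include "postgres.h"
    #include "nodes/pg_list.h"

    typedef struct FdwXact FdwXact;                      /* opaque, per patch */
    extern void ResolveOneForeignXact(FdwXact *fxact);   /* hypothetical */

    static List *
    resolve_foreign_xacts_best_effort(List *fxacts)
    {
        List     *failed = NIL;
        ListCell *lc;

        foreach(lc, fxacts)
        {
            FdwXact *fxact = (FdwXact *) lfirst(lc);

            PG_TRY();
            {
                ResolveOneForeignXact(fxact);
            }
            PG_CATCH();
            {
                /* Log it, forget the error, and move on to the next one. */
                EmitErrorReport();
                FlushErrorState();
                failed = lappend(failed, fxact);
            }
            PG_END_TRY();
        }

        /* The caller reports the 'failed' list to the user (e.g. as a NOTICE). */
        return failed;
    }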

>
> Or the second idea would be that the backend commits only the local
> transaction then returns the acknowledgment of COMMIT to the user
> without resolving foreign transactions. Then the user manually
> resolves the foreign transactions by, for example, using the SQL
> function pg_resolve_foreign_xact() within a separate transaction. That
> way, even if an error occurred during resolving foreign transactions
> (i.g., executing COMMIT PREPARED), it’s okay as the user is already
> aware of the local transaction having been committed and can retry to
> resolve the unresolved foreign transaction. So we won't need the
> resolver process while avoiding such inconsistency.
>
> But a drawback would be that the transaction commit doesn't ensure
> that all foreign transactions are completed. The subsequent
> transactions would need to check if the previous distributed
> transaction is completed to see its results. I’m not sure it’s a good
> design in terms of usability.

I agree, this won't be acceptable.

In either case, I think a solution where the local server takes
responsibility for resolving foreign transactions will be better, even
in the first cut.

--
Best Wishes,
Ashutosh Bapat



RE: Transactions involving multiple postgres foreign servers, take 2

tsunakawa.takay@fujitsu.com
In reply to this post by Masahiko Sawada-2
From: Masahiko Sawada <[hidden email]>
> > If so, can't we stipulate that the FDW implementor should ensure that the
> commit function always returns control to the caller?
>
> How can the FDW implementor ensure that? Since even palloc could call
> ereport(ERROR) I guess it's hard to require that to all FDW
> implementors.

I think what the FDW commit routine will do is just call xa_commit(), or PQexec("COMMIT PREPARED") in the case of postgres_fdw.


> It's still a rough idea but I think we can use TMASYNC flag and
> xa_complete explained in the XA specification. The core transaction
> manager call prepare, commit, rollback APIs with the flag, requiring
> to execute the operation asynchronously and to return a handler (e.g.,
> a socket taken by PQsocket in postgres_fdw case) to the transaction
> manager. Then the transaction manager continues polling the handler
> until it becomes readable and testing the completion using by
> xa_complete() with no wait, until all foreign servers return OK on
> xa_complete check.

Unfortunately, even Oracle and Db2 have not supported asynchronous XA execution for years.  Our DBMS Symfoware doesn't, either.  I don't expect other DBMSs to support it.

Hmm, I'm afraid this may be one of the FDW's intractable walls for a serious scale-out DBMS.  If we define asynchronous FDW routines for 2PC, postgres_fdw would be able to implement them by using libpq asynchronous functions.  But other DBMSs can't ...


> > Maybe we can consider VOLATILE functions update data.  That may be
> overreaction, though.
>
> Sorry I don't understand that. The volatile functions are not pushed
> down to the foreign servers in the first place, no?

Ah, you're right.  Then the choices are twofold: (1) trust the user's claim (specification) about whether their functions update data, or (2) get a notification through the FE/BE protocol that the remote transaction may have updated data.


Regards
Takayuki Tsunakawa


RE: Transactions involving multiple postgres foreign servers, take 2

tsunakawa.takay@fujitsu.com
In reply to this post by Masahiko Sawada-2
From: Masahiko Sawada <[hidden email]>

> The resolver process has two functionalities: resolving foreign
> transactions automatically when the user issues COMMIT (the case you
> described in the second paragraph), and resolving foreign transaction
> when the corresponding backend no longer exist or when the server
> crashes during in the middle of 2PC (described in the third
> paragraph).
>
> Considering the design without the resolver process, I think we can
> easily replace the latter with the manual resolution. OTOH, it's not
> easy for the former. I have no idea about better design for now,
> although, as you described, if we could ensure that the process
> doesn't raise an error during resolving foreign transactions after
> committing the local transaction we would not need the resolver
> process.

Yeah, the resolver background process -- something independent of client sessions -- is necessary, because client sessions eventually disappear.  When the server that hosts the 2PC coordinator crashes, there are no client sessions at all.  Our DBMS Symfoware also runs background threads that take care of resolving in-doubt transactions caused by a server or network failure.

Then, how does the resolver get involved in 2PC to enable parallel 2PC?  Two ideas quickly come to mind:

(1) Each client backend issues prepare and commit to multiple remote nodes asynchronously.
If the communication fails during commit, the client backend leaves the commit notification task to the resolver.
That is, the resolver lends a hand during failure recovery, and doesn't interfere with the transaction processing during normal operation.

(2) The resolver takes on some responsibility for 2PC processing during normal operation
(sending prepare and/or commit to remote nodes and getting the results.)
To avoid serial execution per transaction, the resolver bundles multiple requests, sends them in bulk, and waits for multiple replies at once.
This allows the coordinator to do its own prepare processing in parallel with that of the participants.
However, in Postgres, this requires context switches between the client backend and the resolver.


Our Symfoware takes (2).  However, it doesn't suffer from the context switches, because the server is multi-threaded and also implements or uses entities even more lightweight than threads.


> Or the second idea would be that the backend commits only the local
> transaction then returns the acknowledgment of COMMIT to the user
> without resolving foreign transactions. Then the user manually
> resolves the foreign transactions by, for example, using the SQL
> function pg_resolve_foreign_xact() within a separate transaction. That
> way, even if an error occurred during resolving foreign transactions
> (i.g., executing COMMIT PREPARED), it’s okay as the user is already
> aware of the local transaction having been committed and can retry to
> resolve the unresolved foreign transaction. So we won't need the
> resolver process while avoiding such inconsistency.
>
> But a drawback would be that the transaction commit doesn't ensure
> that all foreign transactions are completed. The subsequent
> transactions would need to check if the previous distributed
> transaction is completed to see its results. I’m not sure it’s a good
> design in terms of usability.

I don't think it's a good design either, just as you worried.  I guess that's why Postgres-XL had to create a tool called pgxc_clean and ask the user to resolve transactions with it.

pgxc_clean
https://www.postgres-xl.org/documentation/pgxcclean.html

"pgxc_clean is a Postgres-XL utility to maintain transaction status after a crash. When a Postgres-XL node crashes and recovers or fails over, the commit status of the node may be inconsistent with other nodes. pgxc_clean checks transaction commit status and corrects them."


Regards
Takayuki Tsunakawa


Re: Transactions involving multiple postgres foreign servers, take 2

Michael Paquier-2
In reply to this post by Masahiko Sawada-2
On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote:
> Thank you for letting me know. I've attached the latest version patch set.

A rebase is needed again as the CF bot is complaining.
--
Michael


Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
On Thu, 17 Sep 2020 at 14:25, Michael Paquier <[hidden email]> wrote:
>
> On Fri, Aug 21, 2020 at 03:25:29PM +0900, Masahiko Sawada wrote:
> > Thank you for letting me know. I've attached the latest version patch set.
>
> A rebase is needed again as the CF bot is complaining.

Thank you for letting me know. I'm updating the patch and splitting
it into smaller pieces, as Fujii-san suggested. I'll submit the latest
patch set early next week.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by tsunakawa.takay@fujitsu.com
On Wed, 16 Sep 2020 at 13:20, [hidden email]
<[hidden email]> wrote:

>
> From: Masahiko Sawada <[hidden email]>
> > > If so, can't we stipulate that the FDW implementor should ensure that the
> > commit function always returns control to the caller?
> >
> > How can the FDW implementor ensure that? Since even palloc could call
> > ereport(ERROR) I guess it's hard to require that to all FDW
> > implementors.
>
> I think the what FDW commit routine will do is to just call xa_commit(), or PQexec("COMMIT PREPARED") in postgres_fdw.

Yes, but it still seems hard to me to require all FDW
implementations to commit/rollback prepared transactions without the
possibility of an ERROR.

>
>
> > It's still a rough idea but I think we can use TMASYNC flag and
> > xa_complete explained in the XA specification. The core transaction
> > manager call prepare, commit, rollback APIs with the flag, requiring
> > to execute the operation asynchronously and to return a handler (e.g.,
> > a socket taken by PQsocket in postgres_fdw case) to the transaction
> > manager. Then the transaction manager continues polling the handler
> > until it becomes readable and testing the completion using by
> > xa_complete() with no wait, until all foreign servers return OK on
> > xa_complete check.
>
> Unfortunately, even Oracle and Db2 don't support XA asynchronous execution for years.  Our DBMS Symfoware doesn't, either.  I don't expect other DBMSs support it.
>
> Hmm, I'm afraid this may be one of the FDW's intractable walls for a serious scale-out DBMS.  If we define asynchronous FDW routines for 2PC, postgres_fdw would be able to implement them by using libpq asynchronous functions.  But other DBMSs can't ...

I don't think all FDW implementations necessarily need to support
xa_complete(). We can support both synchronous and asynchronous
execution of prepare/commit/rollback.

>
>
> > > Maybe we can consider VOLATILE functions update data.  That may be
> > overreaction, though.
> >
> > Sorry I don't understand that. The volatile functions are not pushed
> > down to the foreign servers in the first place, no?
>
> Ah, you're right.  Then, the choices are twofold: (1) trust users in that their functions don't update data or the user's claim (specification) about it, and (2) get notification through FE/BE protocol that the remote transaction may have updated data.
>

I'm confused about which UDF case you're concerned about.
If you're concerned that executing a UDF with something like 'SELECT
myfunc();' updates data on a foreign server, then, since the UDF
should know which foreign server it modifies data on, it should be
able to register that foreign server and mark it as modified. Or are
you concerned that a UDF in a WHERE condition is pushed down and
updates data (e.g., 'SELECT … FROM foreign_tbl WHERE id = myfunc()')?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

