Transactions involving multiple postgres foreign servers, take 2


Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <[hidden email]> wrote:

>
> > I've attached the latest version patches. I've incorporated the review
> > comments I got so far and improved locking strategy.
>
> Thanks for updating the patch!
> I have three questions about the v23 patches.
>
>
> 1. messages related to user canceling
>
> In my understanding, there are two messages
> which can be output when a user cancels the COMMIT command.
>
> A. When the prepare fails, the output shows that the transaction
>     committed locally but an error occurred.
>
> ```
> postgres=*# COMMIT;
> ^CCancel request sent
> WARNING:  canceling wait for resolving foreign transaction due to user
> request
> DETAIL:  The transaction has already committed locally, but might not
> have been committed on the foreign server.
> ERROR:  server closed the connection unexpectedly
>          This probably means the server terminated abnormally
>          before or while processing the request.
> CONTEXT:  remote SQL command: PREPARE TRANSACTION
> 'fx_1020791818_519_16399_10'
> ```
>
> B. When the prepare succeeds,
>     the output shows that the transaction committed locally.
>
> ```
> postgres=*# COMMIT;
> ^CCancel request sent
> WARNING:  canceling wait for resolving foreign transaction due to user
> request
> DETAIL:  The transaction has already committed locally, but might not
> have been committed on the foreign server.
> COMMIT
> ```
>
> In case A, I think the "committed locally" message can confuse users,
> because the message says committed although the transaction is
> actually ABORTED.
>
> I think the "committed" message means that the ABORT was committed locally.
> But is there a possibility of misunderstanding?

No, you're right. I'll fix it in the next version patch.

I think synchronous replication also has the same problem. It says
"the transaction has already committed" but it's not true when
executing ROLLBACK PREPARED.

BTW how did you test the case (A)? It says canceling wait for foreign
transaction resolution but the remote SQL command is PREPARE
TRANSACTION.

>
> In case A, wouldn't it be better to make the message more user-friendly?
>
>
> 2. typo
>
> Is "trasnactions" in fdwxact.c a typo?
>

Fixed.

>
> 3. FdwXactGetWaiter in fdwxact.c return unused value
>
> FdwXactGetWaiter is called in the FXRslvLoop function.
> It returns *waitXid_p, but FXRslvLoop doesn't seem to
> use *waitXid_p. Do we need to return it?

Removed.

I've incorporated your comments above in my local branch. I'll
post the latest version patch after incorporating the other comments soon.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transactions involving multiple postgres foreign servers, take 2

Fujii Masao-4


On 2020/07/15 15:06, Masahiko Sawada wrote:

> On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <[hidden email]> wrote:
>>
>>> I've attached the latest version patches. I've incorporated the review
>>> comments I got so far and improved locking strategy.
>>
>> Thanks for updating the patch!
>> I have three questions about the v23 patches.
>>
>>
>> 1. messages related to user canceling
>>
>> In my understanding, there are two messages
>> which can be output when a user cancels the COMMIT command.
>>
>> A. When the prepare fails, the output shows that the transaction
>>      committed locally but an error occurred.
>>
>> ```
>> postgres=*# COMMIT;
>> ^CCancel request sent
>> WARNING:  canceling wait for resolving foreign transaction due to user
>> request
>> DETAIL:  The transaction has already committed locally, but might not
>> have been committed on the foreign server.
>> ERROR:  server closed the connection unexpectedly
>>           This probably means the server terminated abnormally
>>           before or while processing the request.
>> CONTEXT:  remote SQL command: PREPARE TRANSACTION
>> 'fx_1020791818_519_16399_10'
>> ```
>>
>> B. When the prepare succeeds,
>>      the output shows that the transaction committed locally.
>>
>> ```
>> postgres=*# COMMIT;
>> ^CCancel request sent
>> WARNING:  canceling wait for resolving foreign transaction due to user
>> request
>> DETAIL:  The transaction has already committed locally, but might not
>> have been committed on the foreign server.
>> COMMIT
>> ```
>>
>> In case A, I think the "committed locally" message can confuse users,
>> because the message says committed although the transaction is
>> actually ABORTED.
>>
>> I think the "committed" message means that the ABORT was committed locally.
>> But is there a possibility of misunderstanding?
>
> No, you're right. I'll fix it in the next version patch.
>
> I think synchronous replication also has the same problem. It says
> "the transaction has already committed" but it's not true when
> executing ROLLBACK PREPARED.

Yes. Also the same message is logged when executing PREPARE TRANSACTION.
Maybe it should be changed to "the transaction has already prepared".

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiro Ikeda
In reply to this post by Masahiko Sawada-2
On 2020-07-15 15:06, Masahiko Sawada wrote:

> On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <[hidden email]>
> wrote:
>>
>> > I've attached the latest version patches. I've incorporated the review
>> > comments I got so far and improved locking strategy.
>>
>> Thanks for updating the patch!
>> I have three questions about the v23 patches.
>>
>>
>> 1. messages related to user canceling
>>
>> In my understanding, there are two messages
>> which can be output when a user cancels the COMMIT command.
>>
>> A. When the prepare fails, the output shows that the transaction
>>     committed locally but an error occurred.
>>
>> ```
>> postgres=*# COMMIT;
>> ^CCancel request sent
>> WARNING:  canceling wait for resolving foreign transaction due to user
>> request
>> DETAIL:  The transaction has already committed locally, but might not
>> have been committed on the foreign server.
>> ERROR:  server closed the connection unexpectedly
>>          This probably means the server terminated abnormally
>>          before or while processing the request.
>> CONTEXT:  remote SQL command: PREPARE TRANSACTION
>> 'fx_1020791818_519_16399_10'
>> ```
>>
>> B. When the prepare succeeds,
>>     the output shows that the transaction committed locally.
>>
>> ```
>> postgres=*# COMMIT;
>> ^CCancel request sent
>> WARNING:  canceling wait for resolving foreign transaction due to user
>> request
>> DETAIL:  The transaction has already committed locally, but might not
>> have been committed on the foreign server.
>> COMMIT
>> ```
>>
>> In case A, I think the "committed locally" message can confuse users,
>> because the message says committed although the transaction is
>> actually ABORTED.
>>
>> I think the "committed" message means that the ABORT was committed locally.
>> But is there a possibility of misunderstanding?
>
> No, you're right. I'll fix it in the next version patch.
>
> I think synchronous replication also has the same problem. It says
> "the transaction has already committed" but it's not true when
> executing ROLLBACK PREPARED.

Thanks for replying and sharing the synchronous replication problem.

> BTW how did you test the case (A)? It says canceling wait for foreign
> transaction resolution but the remote SQL command is PREPARE
> TRANSACTION.

I think the timing of failures is important for 2PC testing.
Since I don't have a good way to simulate them flexibly,
I use the GDB debugger.

The message in case (A) is emitted
after performing the following operations.

1. Attach the debugger to a backend process.
2. Set a breakpoint at PreCommit_FdwXact() in CommitTransaction().
    // Before PREPARE.
3. Execute "BEGIN" and insert data into two remote foreign tables.
4. Issue a "COMMIT" command.
5. The backend process stops at the breakpoint.
6. Stop a remote foreign server.
7. Detach the debugger.
    // The backend continues, and the prepare fails. The transaction
    // resolver tries to abort all remote transactions.
    // (It's unnecessary to resolve remote transactions whose prepare
    // failed, isn't it?)
8. Send a cancel request.


BTW, I'm concerned about how to test the 2PC patches.
There are many failure patterns, such as failure timing,
server/network failures (and unexpected recovery), and combinations of those.

Though it would be best to test those failure patterns automatically,
I have no good idea for now, so I check some patterns manually.
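The kind of automated failure-pattern coverage wished for above can at least be sketched as a toy simulation (purely illustrative Python of my own, not part of the patch or its test suite): a miniature coordinator runs the two phases against fake participants, a failure is injected at every possible prepare step, and each run asserts that no participant ends up committed while another is aborted.

```python
# Toy 2PC simulation for enumerating prepare-failure points.
# Illustrative only; real tests would drive actual servers,
# e.g. with a debugger as described above.

class Participant:
    def __init__(self):
        self.state = "active"   # active -> prepared -> committed/aborted

    def prepare(self, fail=False):
        if fail:
            self.state = "aborted"
            raise RuntimeError("prepare failed")
        self.state = "prepared"

    def commit_prepared(self):
        assert self.state == "prepared"
        self.state = "committed"

    def rollback(self):
        if self.state != "committed":
            self.state = "aborted"

def two_phase_commit(participants, fail_at=None):
    """Run 2PC, injecting a prepare failure at index fail_at."""
    try:
        for i, p in enumerate(participants):
            p.prepare(fail=(i == fail_at))
    except RuntimeError:
        for p in participants:      # phase 1 failed: abort everyone
            p.rollback()
        return "aborted"
    for p in participants:          # phase 1 succeeded: commit is now mandatory
        p.commit_prepared()
    return "committed"

# Enumerate every failure point and check atomicity each time.
for fail_at in [None, 0, 1, 2]:
    ps = [Participant() for _ in range(3)]
    two_phase_commit(ps, fail_at)
    states = {p.state for p in ps}
    assert states <= {"committed"} or states <= {"aborted"}, states
```

A real harness would of course also have to cover crashes between the phases and during recovery, which this toy model does not attempt.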


> I've incorporated the above your comments in the local branch. I'll
> post the latest version patch after incorporating other comments soon.

OK, Thanks.


Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by Masahiro Ikeda
On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <[hidden email]> wrote:

>
> > I've attached the latest version patches. I've incorporated the review
> > comments I got so far and improved locking strategy.
>
> I want to ask a question about streaming replication with 2PC.
> Are you going to support 2PC with streaming replication?
>
> I tried streaming replication using the v23 patches.
> I confirmed that 2PC works with streaming replication,
> where there are primary and standby coordinators.
>
> But, in my understanding, the WAL for "PREPARE" and
> "COMMIT/ABORT PREPARED" can't be replicated to the standby server
> synchronously.
>
> If this is right, unresolved transactions can occur.
>
> For example,
>
> 1. PREPARE is done
> 2. crash primary before the WAL related to PREPARE is
>     replicated to the standby server
> 3. promote standby server // but can't execute "ABORT PREPARED"
>
> In the above case, the remote server has an unresolved transaction.
> Can we solve this problem to support in-sync replication?
>
> But, I think some users use async replication for performance.
> Do we need to document the limitation or make another solution?
>

IIUC, with synchronous replication we can guarantee that WAL records
are written on both the primary and replicas once the client gets an
acknowledgment of commit. We don't replicate each WAL record
generated during the transaction one by one synchronously. In the case you
described, the client will get an error due to the server crash.
Therefore the user cannot expect that the WAL records generated so far
have been replicated. The same issue could also happen when the user
executes PREPARE TRANSACTION and the server crashes. To prevent this
issue, I think we would need to send each WAL record synchronously, but I'm
not sure that's reasonable behavior, and as long as we write WAL
locally and then send it to replicas we would need a smart mechanism to
prevent this situation.
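The point above can be made concrete with a toy model (my own simplification, not the patch's code): the primary writes WAL locally as it goes, and the backend only waits for the standby's acknowledgment at commit time, so a crash after writing the PREPARE record but before that wait can leave the standby without the record.

```python
# Toy model: synchronous replication waits for the standby only at
# commit acknowledgment time, not after every WAL record.
# This is a deliberate simplification of the real walsender/walreceiver flow.

class Standby:
    def __init__(self):
        self.wal = []

class Primary:
    def __init__(self, standby):
        self.wal = []
        self.standby = standby
        self.sent_upto = 0

    def write_wal(self, rec):
        self.wal.append(rec)        # written locally, not yet confirmed remote

    def wait_for_sync_rep(self):
        # Ship pending records; the backend blocks here at commit.
        self.standby.wal.extend(self.wal[self.sent_upto:])
        self.sent_upto = len(self.wal)

standby = Standby()
primary = Primary(standby)

primary.write_wal("PREPARE 'fx_1'")            # local WAL only
# ... a crash here means the client got no commit acknowledgment,
# and the standby may lack the PREPARE record:
assert "PREPARE 'fx_1'" not in standby.wal

primary.wait_for_sync_rep()                    # the commit-time wait
primary.write_wal("COMMIT PREPARED 'fx_1'")
primary.wait_for_sync_rep()
assert standby.wal == ["PREPARE 'fx_1'", "COMMIT PREPARED 'fx_1'"]
```

This is exactly the window Ikeda-san's promote-the-standby scenario exploits: after promotion, the new coordinator has no PREPARE record to resolve.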

Related to Ikeda-san's point, I realized that with the
current patch the backend waits for synchronous replication and then
waits for foreign transaction resolution, but it should be reversed.
Otherwise, it could lead to data loss even after the client got an
acknowledgment of commit. Also, when the user is using both atomic
commit and synchronous replication and wants to cancel waiting, he/she
needs to press Ctrl-C twice with the current patch, which also
should be fixed.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Transactions involving multiple postgres foreign servers, take 2

tsunakawa.takay@fujitsu.com
In reply to this post by Masahiko Sawada-2
Hi Sawada san,


I'm reviewing this patch series, and let me give some initial comments and questions.  I'm looking at this with a hope that this will be useful purely as a FDW enhancement for our new use cases, regardless of whether the FDW will be used for Postgres scale-out.

I don't think it's necessarily required to combine 2PC with global visibility.  The X/Open XA specification handles only atomic commit.  The only part of the XA specification that refers to global visibility is the following:


[Quote from XA specification]
--------------------------------------------------
2.3.2 Protocol Optimisations
・ Read-only
An RM can respond to the TM’s prepare request by asserting that the RM was not
asked to update shared resources in this transaction branch. This response
concludes the RM’s involvement in the transaction; the Phase 2 dialogue between
the TM and this RM does not occur. The TM need not stably record, in its list of
participating RMs, an RM that asserts a read-only role in the global transaction.

However, if the RM returns the read-only optimisation before all work on the global
transaction is prepared, global serialisability1 cannot be guaranteed. This is because
the RM may release transaction context, such as read locks, before all application
activity for that global transaction is finished.

1.
Serialisability is a property of a set of concurrent transactions. For a serialisable set of transactions, at least one
serial sequence of the transactions exists that produces identical results, with respect to shared resources, as does
concurrent execution of the transaction.
--------------------------------------------------


(1)
Do other popular DBMSs (Oracle, MySQL, etc.)  provide concrete functions that can be used for the new FDW commit/rollback/prepare API?  I'm asking this to confirm that we really need to provide these functions, not as the transaction callbacks for postgres_fdw.


(2)
How are data modifications tracked in local and remote transactions?  0001 seems to handle local INSERT/DELETE/UPDATE.  Especially:

* COPY FROM to local/remote tables/views.

* User-defined function calls that modify data, e.g. SELECT func1() WHERE col = func2()


(3)
Does the 2PC processing always go through the background worker?
Is the group commit effective on the remote server? That is, PREPARE and COMMIT PREPARED issued from multiple remote sessions are written to WAL in batch?


Regards
Takayuki Tsunakawa


Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by Fujii Masao-4
On Tue, 14 Jul 2020 at 11:19, Fujii Masao <[hidden email]> wrote:

>
>
>
> On 2020/07/14 9:08, Masahiro Ikeda wrote:
> >> I've attached the latest version patches. I've incorporated the review
> >> comments I got so far and improved locking strategy.
> >
> > Thanks for updating the patch!
>
> +1
> I'm interested in these patches and now studying them. While checking
> the behaviors of the patched PostgreSQL, I got three comments.

Thank you for testing this patch!

>
> 1. We can access to the foreign table even during recovery in the HEAD.
> But in the patched version, when I did that, I got the following error.
> Is this intentional?
>
> ERROR:  cannot assign TransactionIds during recovery

No, it should be fixed. I'm going to fix this by not collecting
participants for atomic commit during recovery.

>
> 2. With the patch, when INSERT/UPDATE/DELETE are executed both in
> local and remote servers, 2PC is executed at the commit phase. But
> when write SQL (e.g., TRUNCATE) except INSERT/UPDATE/DELETE are
> executed in local and INSERT/UPDATE/DELETE are executed in remote,
> 2PC is NOT executed. Is this safe?

Hmm, you're right. I think atomic commit must be used also when the
user executes other write SQLs such as TRUNCATE, COPY, CLUSTER, and
CREATE TABLE on the local node.

>
> 3. XACT_FLAGS_WROTENONTEMPREL is set when INSERT/UPDATE/DELETE
> are executed. But it's not reset even when those queries are canceled by
> ROLLBACK TO SAVEPOINT. This may cause unnecessary 2PC at the commit phase.

Will fix.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiro Ikeda
In reply to this post by Masahiko Sawada-2
On 2020-07-16 13:16, Masahiko Sawada wrote:

> On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <[hidden email]>
> wrote:
>>
>> > I've attached the latest version patches. I've incorporated the review
>> > comments I got so far and improved locking strategy.
>>
>> I want to ask a question about streaming replication with 2PC.
>> Are you going to support 2PC with streaming replication?
>>
>> I tried streaming replication using v23 patches.
>> I confirm that 2PC works with streaming replication,
>> which there are primary/standby coordinator.
>>
>> But, in my understanding, the WAL of "PREPARE" and
>> "COMMIT/ABORT PREPARED" can't be replicated to the standby server in
>> sync.
>>
>> If this is right, the unresolved transaction can be occurred.
>>
>> For example,
>>
>> 1. PREPARE is done
>> 2. crash primary before the WAL related to PREPARE is
>>     replicated to the standby server
>> 3. promote standby server // but can't execute "ABORT PREPARED"
>>
>> In above case, the remote server has the unresolved transaction.
>> Can we solve this problem to support in-sync replication?
>>
>> But, I think some users use async replication for performance.
>> Do we need to document the limitation or make another solution?
>>
>
> IIUC, with synchronous replication we can guarantee that WAL records
> are written on both the primary and replicas once the client gets an
> acknowledgment of commit. We don't replicate each WAL record
> generated during the transaction one by one synchronously. In the case you
> described, the client will get an error due to the server crash.
> Therefore the user cannot expect that the WAL records generated so far
> have been replicated. The same issue could also happen when the user
> executes PREPARE TRANSACTION and the server crashes.

Thanks! I hadn't noticed that the behavior when a user executes PREPARE
TRANSACTION is the same.

IIUC, there is a difference between (1) PREPARE TRANSACTION and (2) 2PC:
whether the client can know, when the server crashes, the global
transaction id.

If (1) PREPARE TRANSACTION fails, it's OK for the client to execute the
same command again, because if the remote server has already prepared
the transaction the command will be ignored.

But if (2) 2PC fails with a coordinator crash, the client can't know
what operations should be done.

If the old coordinator already executed PREPARE, there are some
transactions which should be ABORT PREPARED.
But if the PREPARE WAL record was not sent to the standby, the new
coordinator can't execute ABORT PREPARED.
And the client can't know either which remote servers have PREPARED
transactions that should be ABORTED.

Even if the client can know that, only the old coordinator knows the
global transaction id.
Only the database administrator can analyze the old coordinator's log
and then execute the appropriate commands manually, right?


> To prevent this
> issue, I think we would need to send each WAL record synchronously, but I'm
> not sure that's reasonable behavior, and as long as we write WAL
> locally and then send it to replicas we would need a smart mechanism to
> prevent this situation.

I agree. Sending each 2PC WAL record synchronously would have a large
performance impact.
At least, we need to document the limitation and how to handle this
situation.


> Related to Ikeda-san's point, I realized that with the
> current patch the backend waits for synchronous replication and then
> waits for foreign transaction resolution, but it should be reversed.
> Otherwise, it could lead to data loss even after the client got an
> acknowledgment of commit. Also, when the user is using both atomic
> commit and synchronous replication and wants to cancel waiting, he/she
> needs to press Ctrl-C twice with the current patch, which also
> should be fixed.

I'm sorry, but I couldn't understand that.

In my understanding, if the COMMIT WAL record is replicated to the standby
synchronously, the standby server can resolve the transaction after crash
recovery, once promoted.

If reversed, there are some situations in which atomic commit can't be
guaranteed.
In the case that some foreign transaction resolutions succeed but others
fail (and the COMMIT WAL record is not replicated),
the standby must ABORT PREPARED because the COMMIT WAL record was not
replicated.
This means that some foreign transactions get COMMIT PREPARED executed
by the primary coordinator, while
other foreign transactions get ABORT PREPARED executed by the standby
coordinator.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by tsunakawa.takay@fujitsu.com
On Thu, 16 Jul 2020 at 13:53, [hidden email]
<[hidden email]> wrote:
>
> Hi Sawada san,
>
>
> I'm reviewing this patch series, and let me give some initial comments and questions.  I'm looking at this with a hope that this will be useful purely as a FDW enhancement for our new use cases, regardless of whether the FDW will be used for Postgres scale-out.

Thank you for reviewing this patch!

Yes, this patch is trying to resolve the generic atomic commit problem
w.r.t. FDW, and will be useful also for Postgres scale-out.

>
> I don't think it's necessarily required to combine 2PC with global visibility.  The X/Open XA specification handles only atomic commit.  The only part of the XA specification that refers to global visibility is the following:
>
>
> [Quote from XA specification]
> --------------------------------------------------
> 2.3.2 Protocol Optimisations
> ・ Read-only
> An RM can respond to the TM’s prepare request by asserting that the RM was not
> asked to update shared resources in this transaction branch. This response
> concludes the RM’s involvement in the transaction; the Phase 2 dialogue between
> the TM and this RM does not occur. The TM need not stably record, in its list of
> participating RMs, an RM that asserts a read-only role in the global transaction.
>
> However, if the RM returns the read-only optimisation before all work on the global
> transaction is prepared, global serialisability1 cannot be guaranteed. This is because
> the RM may release transaction context, such as read locks, before all application
> activity for that global transaction is finished.
>
> 1.
> Serialisability is a property of a set of concurrent transactions. For a serialisable set of transactions, at least one
> serial sequence of the transactions exists that produces identical results, with respect to shared resources, as does
> concurrent execution of the transaction.
> --------------------------------------------------
>

Agreed.

>
> (1)
> Do other popular DBMSs (Oracle, MySQL, etc.)  provide concrete functions that can be used for the new FDW commit/rollback/prepare API?  I'm asking this to confirm that we really need to provide these functions, not as the transaction callbacks for postgres_fdw.
>

I have briefly checked only oracle_fdw, but in general I think that
if an existing FDW supports transaction begin, commit, and rollback,
these can be ported to the new FDW transaction APIs easily.

Regarding the comparison between FDW transaction APIs and transaction
callbacks, I think one of the benefits of providing FDW transaction
APIs is that the core is able to manage the status of foreign
transactions. We need to track the status of individual foreign
transactions to support atomic commit. If we use the transaction callbacks
(XactCallback) that many FDWs are using, I think we will end up
calling the transaction callback and leaving the transaction work to
FDWs, with the result that the core is not able to know the return value of
PREPARE TRANSACTION, for example. We could add more arguments passed to
transaction callbacks to get the return value from FDWs, but I don't
think it's a good idea as transaction callbacks are used not only by
FDWs but also by other external modules.

>
> (2)
> How are data modifications tracked in local and remote transactions?  0001 seems to handle local INSERT/DELETE/UPDATE.  Especially:
>
> * COPY FROM to local/remote tables/views.
>
> * User-defined function calls that modify data, e.g. SELECT func1() WHERE col = func2()
>

The current version patch (v23) supports only
INSERT/DELETE/UPDATE. But I'm going to change the patch so that it
supports other write SQLs, as Fujii-san also pointed out.

>
> (3)
> Does the 2PC processing always go through the background worker?
> Is the group commit effective on the remote server? That is, PREPARE and COMMIT PREPARED issued from multiple remote sessions are written to WAL in batch?

No, in the current design, the backend that received the query from the
client does PREPARE, and then the transaction resolver process, a
background worker, does COMMIT PREPARED.
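The division of labor described here can be sketched as a toy model (a hypothetical illustration of the design, not the patch's actual code): the backend performs phase 1 against each participant, then hands the prepared foreign transactions to a queue that a background resolver drains, issuing phase 2.

```python
# Hypothetical sketch of the backend/resolver split described above.
import queue
import threading

resolve_queue = queue.Queue()
resolved = []

def backend_commit(participants):
    """Runs in the client-serving backend: phase 1 only."""
    for p in participants:
        p["state"] = "prepared"       # stands in for PREPARE TRANSACTION
    # The local commit would happen here; resolution is then handed off.
    for p in participants:
        resolve_queue.put(p)

def resolver_loop():
    """Runs in a background worker: phase 2, one entry at a time."""
    while True:
        p = resolve_queue.get()
        if p is None:                 # shutdown sentinel
            return
        p["state"] = "committed"      # stands in for COMMIT PREPARED
        resolved.append(p["id"])

worker = threading.Thread(target=resolver_loop)
worker.start()

parts = [{"id": "srv1"}, {"id": "srv2"}]
backend_commit(parts)                 # backend returns before phase 2 finishes
resolve_queue.put(None)
worker.join()

assert sorted(resolved) == ["srv1", "srv2"]
assert all(p["state"] == "committed" for p in parts)
```

One consequence of this structure, raised in the follow-up below, is throughput: a single resolver draining the queue one entry at a time serializes phase 2 across all concurrent transactions.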

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Transactions involving multiple postgres foreign servers, take 2

tsunakawa.takay@fujitsu.com
From: Masahiko Sawada <[hidden email]>
> I have briefly checked only oracle_fdw, but in general I think that
> if an existing FDW supports transaction begin, commit, and rollback,
> these can be ported to the new FDW transaction APIs easily.

Does oracle_fdw support begin, commit and rollback?

And most importantly, do other major DBMSs, including Oracle, provide an API for preparing a transaction?  In other words, will FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing?  I think we need to confirm that there are concrete examples.

What I'm worried about is that if only postgres_fdw can implement the prepare function, it's a sign that the FDW interface will be riddled with functions only for Postgres.  That is, the FDW interface is getting away from its original purpose, "access external data as a relation", and becoming complex.  Tomas Vondra expressed this concern as follows:

Horizontal scalability/sharding
https://www.postgresql.org/message-id/flat/CANP8%2BjK%3D%2B3zVYDFY0oMAQKQVJ%2BqReDHr1UPdyFEELO82yVfb9A%40mail.gmail.com#2c45f0ee97855449f1f7fedcef1d5e11


[Tomas Vondra's remarks]
--------------------------------------------------
> This strikes me as a bit of a conflict of interest with FDW which
> seems to want to hide the fact that it's foreign; the FDW
> implementation makes it's own optimization decisions which might
> make sense for single table queries but breaks down in the face of
> joins.

+1 to these concerns

In my mind, FDW is a wonderful tool to integrate PostgreSQL with
external data sources, and it's nicely shaped for this purpose, which
implies the abstractions and assumptions in the code.

The truth however is that many current uses of the FDW API are actually
using it for different purposes because there's no other way to do that,
not because FDWs are the "right way". And this includes the attempts to
build sharding on FDW, I think.

Situations like this result in "improvements" of the API that seem to
improve the API for the second group, but make the life harder for the
original FDW API audience by making the API needlessly complex. And I
say "seem to improve" because the second group eventually runs into the
fundamental abstractions and assumptions the API is based on anyway.

And based on the discussions at pgcon, I think this is the main reason
why people cringe when they hear "FDW" and "sharding" in the same sentence.

...
My other worry is that we'll eventually mess the FDW infrastructure,
making it harder to use for the original purpose. Granted, most of the
improvements proposed so far look sane and useful for FDWs in general,
but sooner or later that ceases to be the case - there will be changes
needed merely for the sharding. Those will be tough decisions.
--------------------------------------------------


> Regarding the comparison between FDW transaction APIs and transaction
> callbacks, I think one of the benefits of providing FDW transaction
> APIs is that the core is able to manage the status of foreign
> transactions. We need to track the status of individual foreign
> transactions to support atomic commit. If we use the transaction callbacks
> (XactCallback) that many FDWs are using, I think we will end up
> calling the transaction callback and leaving the transaction work to
> FDWs, with the result that the core is not able to know the return value of
> PREPARE TRANSACTION, for example. We could add more arguments passed to
> transaction callbacks to get the return value from FDWs, but I don't
> think it's a good idea as transaction callbacks are used not only by
> FDWs but also by other external modules.

To track the foreign transaction status, we can add GetTransactionStatus() to the FDW interface as an alternative, can't we?


> The current version patch (v23) supports only
> INSERT/DELETE/UPDATE. But I'm going to change the patch so that it
> supports other write SQLs, as Fujii-san also pointed out.

OK.  I've just read that Fujii-san already pointed out a similar thing.  But I wonder whether we can know that a UDF executed on the foreign server has updated data.  Maybe we can know or guess it by calling txid_current_if_any() or checking the transaction status in the FE/BE protocol, but can we deal with FDWs other than postgres_fdw?


> No, in the current design, the backend that received the query from the
> client does PREPARE, and then the transaction resolver process, a
> background worker, does COMMIT PREPARED.

This "No" means the current implementation cannot group commits from multiple transactions?
Does the transaction resolver send COMMIT PREPARED and wait for its response for each transaction one by one?  For example:

[local server]
Transactions T1 and T2 perform 2PC at the same time.
The transaction resolver sends COMMIT PREPARED for T1 and then waits for the response.
T1 writes the COMMIT PREPARED record locally and syncs the WAL.
The transaction resolver sends COMMIT PREPARED for T2 and then waits for the response.
T2 writes the COMMIT PREPARED record locally and syncs the WAL.

[foreign server]
T1 writes the COMMIT PREPARED record locally and syncs the WAL.
T2 writes the COMMIT PREPARED record locally and syncs the WAL.

If the WAL records of multiple concurrent transactions are written and synced separately, i.e. group commit doesn't take effect, then OLTP transaction performance will be unacceptable.
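The cost difference behind this concern can be shown with a toy model (illustrative only, with made-up numbers rather than measured behavior): if the foreign server flushes WAL once per COMMIT PREPARED, N concurrent transactions cost N flushes, whereas grouping the records before a single flush costs one.

```python
# Toy model of the group-commit concern: count WAL flushes on the
# foreign server when COMMIT PREPARED records are synced one by one
# versus batched under one sync.

def commit_prepared_serially(txns):
    """One WAL sync per COMMIT PREPARED record."""
    wal, flushes = [], 0
    for t in txns:
        wal.append(f"COMMIT PREPARED '{t}'")
        flushes += 1                  # sync after every record
    return flushes

def commit_prepared_batched(txns):
    """All records appended first, then a single WAL sync for the batch."""
    wal = [f"COMMIT PREPARED '{t}'" for t in txns]
    flushes = 1                       # one sync covers the whole batch
    return flushes

txns = [f"fx_{i}" for i in range(100)]
assert commit_prepared_serially(txns) == 100
assert commit_prepared_batched(txns) == 1
```

Whether the real foreign server batches these flushes depends on its own group-commit machinery; the question above is whether concurrent COMMIT PREPAREDs arrive close enough together for that machinery to take effect.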


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

Laurenz Albe
On Fri, 2020-07-17 at 05:21 +0000, [hidden email] wrote:
> From: Masahiko Sawada <[hidden email]>
> > I have briefly checked only oracle_fdw, but in general I think that
> > if an existing FDW supports transaction begin, commit, and rollback,
> > these can be ported to new FDW transaction APIs easily.
>
> Does oracle_fdw support begin, commit and rollback?

Yes.

> And most importantly, do other major DBMSs, including Oracle, provide the API for
> preparing a transaction?  In other words, will the FDWs other than postgres_fdw
> really be able to take advantage of the new FDW functions to join the 2PC processing?
> I think we need to confirm that there are concrete examples.

I bet they do.  There is even a standard for that.

I am not looking forward to adapting oracle_fdw, and I didn't read the patch.

But using distributed transactions is certainly a good thing if it is done right.

The trade-off is the need for a transaction manager, and implementing that
correctly is a high price to pay.

Yours,
Laurenz Albe




Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by Masahiro Ikeda
On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda <[hidden email]> wrote:

>
> On 2020-07-16 13:16, Masahiko Sawada wrote:
> > On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <[hidden email]>
> > wrote:
> >>
> >> > I've attached the latest version patches. I've incorporated the review
> >> > comments I got so far and improved locking strategy.
> >>
> >> I want to ask a question about streaming replication with 2PC.
> >> Are you going to support 2PC with streaming replication?
> >>
> >> I tried streaming replication using the v23 patches.
> >> I confirmed that 2PC works with streaming replication,
> >> in which there are primary and standby coordinators.
> >>
> >> But, in my understanding, the WAL of "PREPARE" and
> >> "COMMIT/ABORT PREPARED" can't be replicated to the standby server in
> >> sync.
> >>
> >> If this is right, an unresolved transaction can occur.
> >>
> >> For example,
> >>
> >> 1. PREPARE is done
> >> 2. crash primary before the WAL related to PREPARE is
> >>     replicated to the standby server
> >> 3. promote standby server // but can't execute "ABORT PREPARED"
> >>
> >> In the above case, the remote server has an unresolved transaction.
> >> Can we solve this problem to support in-sync replication?
> >>
> >> But, I think some users use async replication for performance.
> >> Do we need to document the limitation or make another solution?
> >>
> >
> > IIUC with synchronous replication, we can guarantee that WAL records
> > are written on both the primary and replicas when the client gets an
> > acknowledgment of commit. We don't replicate each WAL record
> > generated during a transaction one by one in sync. In the case you
> > described, the client will get an error due to the server crash.
> > Therefore I think the user cannot expect WAL records generated so far
> > to have been replicated. The same issue could happen also when the user
> > executes PREPARE TRANSACTION and the server crashes.
>
> Thanks! I hadn't noticed that the behavior when a user executes PREPARE
> TRANSACTION is the same.
>
> IIUC with 2PC, there is a difference between (1) PREPARE TRANSACTION
> and (2) 2PC.
> The difference is whether the client can know when the server crashed
> and its global tx id.
>
> If (1) PREPARE TRANSACTION fails, it's OK for the client to execute the
> same command again,
> because if the remote server has already prepared the transaction the
> command will be ignored.
>
> But if (2) 2PC fails due to a coordinator crash, the client can't know
> what operations should be done.
>
> If the old coordinator already executed PREPARE, there are some
> transactions which should be ABORT PREPARED.
> But if the PREPARED WAL is not sent to the standby, the new coordinator
> can't execute ABORT PREPARED.
> And the client can't know which remote servers have PREPARED
> transactions which should be ABORTED either.
>
> Even if the client can know that, only the old coordinator knows its
> global transaction id.
> Only the database administrator can analyze the old coordinator's log
> and then execute the appropriate commands manually, right?

I think that's right. In the case of a coordinator crash, the user
can look for orphaned foreign prepared transactions by checking the
'identifier' column of pg_foreign_xacts on the new standby server and
the prepared transactions on the remote servers.
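To make that recovery procedure concrete, here is a small Python sketch (not code from the patch; the identifier strings and catalog contents are made up) of how an administrator's tooling might cross-check the new primary's pg_foreign_xacts identifiers against a remote server's prepared transactions:

```python
# Hypothetical reconciliation after failover: compare the foreign-transaction
# identifiers the new primary knows about (pg_foreign_xacts) with the prepared
# transactions actually present on one remote server (pg_prepared_xacts).
# All identifiers below are invented for illustration.

coordinator_fdwxacts = {"fx_1020791818_519_16399_10", "fx_1020791818_520_16399_10"}
remote_prepared = {"fx_1020791818_519_16399_10", "fx_999_1_1_1"}

# Entries the new coordinator can resolve automatically.
resolvable = coordinator_fdwxacts & remote_prepared

# Prepared transactions on the remote server with no matching FdwXact entry:
# these are the orphans an administrator must inspect manually.
orphaned = remote_prepared - coordinator_fdwxacts

print(sorted(resolvable))  # → ['fx_1020791818_519_16399_10']
print(sorted(orphaned))    # → ['fx_999_1_1_1']
```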

>
>
> > To prevent this
> > issue, I think we would need to send each WAL record in sync but I'm
> > not sure it's reasonable behavior, and as long as we write WAL in the
> > local and then send it to replicas we would need a smart mechanism to
> > prevent this situation.
>
> I agree. Sending each 2PC WAL record in sync would have a large
> performance impact.
> At least, we need to document the limitation and how to handle this
> situation.

Ok. I'll add it.

>
>
> > Related to the pointing out by Ikeda-san, I realized that with the
> > current patch the backend waits for synchronous replication and then
> > waits for foreign transaction resolution. But it should be reversed.
> > Otherwise, it could lead to data loss even when the client got an
> > acknowledgment of commit. Also, when the user is using both atomic
> > commit and synchronous replication and wants to cancel waiting, he/she
> > will need to press ctrl-c twice with the current patch, which also
> > should be fixed.
>
> I'm sorry, but I couldn't understand that.
>
> In my understanding, if the COMMIT WAL is replicated to the standby in sync,
> the standby server can resolve the transaction after crash recovery in
> the promoted phase.
>
> If reversed, there are some situations which can't guarantee atomic
> commit.
> In the case that some foreign transaction resolutions succeed but others
> fail (and the COMMIT WAL is not replicated),
> the standby must ABORT PREPARED because the COMMIT WAL is not
> replicated.
> This means that some foreign transactions are COMMIT PREPARED by the
> primary coordinator,
> while other foreign transactions can be ABORT PREPARED by the secondary
> coordinator.

You're right. Thank you for pointing out!

If the coordinator crashes after the client gets acknowledgment of the
successful commit of the transaction but before sending the
XLOG_FDWXACT_REMOVE record to the replicas, the FdwXact entries are
left on the replicas even after failover. But since we require the FDW to
tolerate the error of undefined prepared transactions in
COMMIT/ROLLBACK PREPARED, it won't be a critical problem.
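The reason such leftover FdwXact entries are harmless can be sketched as a toy Python model (not the patch's code): treating an "undefined prepared transaction" error as success makes resolution idempotent, so replaying it after failover does no harm.

```python
class UndefinedPreparedTransaction(Exception):
    pass

def commit_prepared(remote_gids, gid):
    """Simulate COMMIT PREPARED on a remote server."""
    if gid not in remote_gids:
        raise UndefinedPreparedTransaction(gid)
    remote_gids.remove(gid)

def resolve(remote_gids, gid):
    # A leftover FdwXact entry may refer to a transaction that was already
    # resolved before the crash; swallowing "undefined" makes the resolver
    # safe to re-run.
    try:
        commit_prepared(remote_gids, gid)
    except UndefinedPreparedTransaction:
        pass  # already resolved: nothing to do
    return True

remote = {"fx_1"}
assert resolve(remote, "fx_1")   # normal resolution
assert resolve(remote, "fx_1")   # replayed after failover: still succeeds
```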

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Transactions involving multiple postgres foreign servers, take 2

tsunakawa.takay@fujitsu.com
In reply to this post by Laurenz Albe
From: Laurenz Albe <[hidden email]>

> On Fri, 2020-07-17 at 05:21 +0000, [hidden email] wrote:
> > And most importantly, do other major DBMSs, including Oracle, provide the
> API for
> > preparing a transaction?  In other words, will the FDWs other than
> postgres_fdw
> > really be able to take advantage of the new FDW functions to join the 2PC
> processing?
> > I think we need to confirm that there are concrete examples.
>
> I bet they do.  There is even a standard for that.

If you're thinking of xa_prepare() defined in the X/Open XA specification, we need to be sure that other FDWs can really utilize this new 2PC mechanism.  What I'm especially wondering is when the FDW can call xa_start().


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by tsunakawa.takay@fujitsu.com
On Fri, 17 Jul 2020 at 14:22, [hidden email]
<[hidden email]> wrote:
>
> From: Masahiko Sawada <[hidden email]>
> I have briefly checked the only oracle_fdw but in general I think that
> > if an existing FDW supports transaction begin, commit, and rollback,
> > these can be ported to new FDW transaction APIs easily.
>
> Does oracle_fdw support begin, commit and rollback?
>
> And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction?  In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing?  I think we need to confirm that there are concrete examples.

I also believe they do. But I'm concerned that some FDWs need to start
a transaction differently when using 2PC. For instance, IIUC MySQL
also supports 2PC, but the transaction needs to be started with "XA
START id" when the transaction needs to be prepared. A transaction
started with XA START can be closed by XA END followed by XA PREPARE
or XA COMMIT ONE PHASE. It means that when starting a new transaction,
the transaction identifier must already be chosen and it must be known
that 2PC might be used. That's quite different from PostgreSQL, where
we can start a transaction with BEGIN and end it with PREPARE
TRANSACTION, COMMIT, or ROLLBACK; the transaction identifier is
required only at PREPARE TRANSACTION time.

With MySQL, I guess the FDW needs a way to be told that the (next)
transaction needs to be started with XA START so it can be prepared. It
could be a custom GUC or an SQL function. Then, when starting a new
transaction on the MySQL server, the FDW can generate a transaction
identifier and store it somewhere alongside the connection. At the
prepare phase, it passes the transaction identifier to the core via the
GetPrepareId() API.

I haven't tested the above yet; it's just a desk plan. It's definitely
a good idea to try integrating this 2PC feature into FDWs other than
postgres_fdw to see whether the design and interfaces are sufficiently
well thought out.
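The difference between the two protocols can be sketched as toy state machines (illustrative Python, not FDW code; the state names are made up). The key contrast: the XA style requires the identifier at start time, the PostgreSQL style only at prepare time.

```python
class XAStyleTxn:
    """MySQL/XA style: the xid is required up front (XA START xid)."""
    def __init__(self, xid):
        self.xid, self.state = xid, "active"   # XA START xid
    def end(self):
        assert self.state == "active"
        self.state = "ended"                    # XA END xid
    def prepare(self):
        assert self.state == "ended"
        self.state = "prepared"                 # XA PREPARE xid
        return self.xid

class PGStyleTxn:
    """PostgreSQL style: plain BEGIN, identifier chosen at PREPARE time."""
    def __init__(self):
        self.state = "active"                   # BEGIN
    def prepare(self, gid):
        assert self.state == "active"
        self.state, self.gid = "prepared", gid  # PREPARE TRANSACTION 'gid'
        return gid

xa = XAStyleTxn("xid-42")   # identifier needed before any work
xa.end()
pg = PGStyleTxn()           # no identifier needed yet
print(xa.prepare(), pg.prepare("gid-42"))  # → xid-42 gid-42
```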

>
> What I'm worried is that if only postgres_fdw can implement the prepare function, it's a sign that FDW interface will be riddled with functions only for Postgres.  That is, the FDW interface is getting away from its original purpose "access external data as a relation" and complex.  Tomas Vondra showed this concern as follows:
>
> Horizontal scalability/sharding
> https://www.postgresql.org/message-id/flat/CANP8%2BjK%3D%2B3zVYDFY0oMAQKQVJ%2BqReDHr1UPdyFEELO82yVfb9A%40mail.gmail.com#2c45f0ee97855449f1f7fedcef1d5e11
>
>
> [Tomas Vondra's remarks]
> --------------------------------------------------
> > This strikes me as a bit of a conflict of interest with FDW which
> > seems to want to hide the fact that it's foreign; the FDW
> > implementation makes it's own optimization decisions which might
> > make sense for single table queries but breaks down in the face of
> > joins.
>
> +1 to these concerns
>
> In my mind, FDW is a wonderful tool to integrate PostgreSQL with
> external data sources, and it's nicely shaped for this purpose, which
> implies the abstractions and assumptions in the code.
>
> The truth however is that many current uses of the FDW API are actually
> using it for different purposes because there's no other way to do that,
> not because FDWs are the "right way". And this includes the attempts to
> build sharding on FDW, I think.
>
> Situations like this result in "improvements" of the API that seem to
> improve the API for the second group, but make the life harder for the
> original FDW API audience by making the API needlessly complex. And I
> say "seem to improve" because the second group eventually runs into the
> fundamental abstractions and assumptions the API is based on anyway.
>
> And based on the discussions at pgcon, I think this is the main reason
> why people cringe when they hear "FDW" and "sharding" in the same sentence.
>
> ...
> My other worry is that we'll eventually mess the FDW infrastructure,
> making it harder to use for the original purpose. Granted, most of the
> improvements proposed so far look sane and useful for FDWs in general,
> but sooner or later that ceases to be the case - there sill be changes
> needed merely for the sharding. Those will be tough decisions.
> --------------------------------------------------
>
>
> > Regarding the comparison between FDW transaction APIs and transaction
> > callbacks, I think one of the benefits of providing FDW transaction
> > APIs is that the core is able to manage the status of foreign
> > transactions. We need to track the status of individual foreign
> > transactions to support atomic commit. If we use transaction callbacks
> > (XactCallback) that many FDWs are using, I think we will end up
> > calling the transaction callback and leave the transaction work to
> > FDWs, leading that the core is not able to know the return values of
> > PREPARE TRANSACTION for example. We can add more arguments passed to
> > transaction callbacks to get the return value from FDWs but I don’t
> > think it’s a good idea as transaction callbacks are used not only by
> > FDW but also other external modules.
>
> To track the foreign transaction status, we can add GetTransactionStatus() to the FDW interface as an alternative, can't we?

I haven't thought about such an interface, but it sounds like the
transaction status would then be managed by both the core and FDWs.
Could you elaborate on that?

>
>
> > With the current version patch (v23), it supports only
> > INSERT/DELETE/UPDATE. But I'm going to change the patch so that it
> > supports other writes SQLs as Fujii-san also pointed out.
>
> OK.  I've just read that Fujii san already pointed out a similar thing.  But I wonder if we can know that the UDF executed on the foreign server has updated data.  Maybe we can know or guess it by calling txid_current_if_any() or checking the transaction status in FE/BE protocol, but can we deal with other FDWs other than postgres_fdw?

Ah, my answer was not enough. It was only about tracking local writes.

Regarding tracking of writes on the foreign server, I think there are
restrictions. Currently, the executor registers a foreign server as a
participant of 2PC before calling BeginForeignInsert(),
BeginForeignModify(), BeginForeignScan() etc. with a flag
indicating whether writes are going to happen on the foreign server. So
even if a UDF in a SELECT statement that could update data were to be
pushed down to the foreign server, the foreign server would be marked
as *not* modified. I've not tested it yet, but I guess that since the
FDW is also allowed to register the foreign server along with that flag
anytime before commit, the FDW is able to forcibly change that flag if
it knows the SELECT query is going to modify data on the remote server.
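That registration behavior could be modeled like this (a Python sketch with hypothetical names, not the patch's actual registration code); the key design point is that re-registration may strengthen the modified flag but never clear it:

```python
# Toy participant registry for atomic commit. Each foreign server is
# registered with a "modified" flag; the executor registers first (e.g. with
# modified=False for a pushed-down SELECT), and the FDW may re-register
# later to upgrade the flag if it learns the query writes remotely.

participants = {}

def register_fdwxact(server, modified):
    # OR-ing the flags means registration can only strengthen "modified".
    participants[server] = participants.get(server, False) or modified

register_fdwxact("remote1", modified=False)  # plain SELECT pushed down
register_fdwxact("remote1", modified=True)   # FDW learned a UDF writes there
register_fdwxact("remote2", modified=False)

print(participants)  # → {'remote1': True, 'remote2': False}
```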

>
>
> > No, in the current design, the backend who received a query from the
> > client does PREPARE, and then the transaction resolver process, a
> > background worker, does COMMIT PREPARED.
>
> This "No" means the current implementation cannot group commits from multiple transactions?

Yes.

> Does the transaction resolver send COMMIT PREPARED and wait for its response for each transaction one by one?  For example,
>
> [local server]
> Transactions T1 and T2 perform 2PC at the same time.
> The transaction resolver sends COMMIT PREPARED for T1 and then waits for the response.
> T1 writes a COMMIT PREPARED record locally and syncs the WAL.
> The transaction resolver sends COMMIT PREPARED for T2 and then waits for the response.
> T2 writes a COMMIT PREPARED record locally and syncs the WAL.
>
> [foreign server]
> T1 writes a COMMIT PREPARED record locally and syncs the WAL.
> T2 writes a COMMIT PREPARED record locally and syncs the WAL.

Just to be clear, the transaction resolver writes FDWXACT_REMOVE
records instead of COMMIT PREPARED records to remove foreign
transaction entries. But, yes, the transaction resolver works like
you explained above.

> If the WAL records of multiple concurrent transactions are written and synced separately, i.e. group commit doesn't take effect, then the OLTP transaction performance will be unacceptable.

I agree that it'll be a large performance penalty. I'd like to have it,
but I'm not sure we should have it in the first version, from the
perspective of complexity. Since 2PC is inherently high cost, in my
opinion the user should avoid it as much as possible for performance
reasons. Especially in OLTP, its cost will directly affect the latency.
I'd suggest designing the database schema so that a transaction touches
only one foreign server; but do you have a concrete OLTP use case that
normally requires 2PC, and how many servers are involved within a
distributed transaction?
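For a rough feel of the cost being discussed, here is a back-of-the-envelope Python model (arbitrary cost units, purely illustrative, not a benchmark of the patch) comparing per-transaction resolution with a hypothetical grouped resolver that batches the local WAL syncs:

```python
import math

ROUND_TRIP, WAL_SYNC = 1.0, 0.5  # invented cost units

def one_by_one(n):
    # Current behavior in the discussion: each COMMIT PREPARED waits for
    # its remote response and then syncs a local removal record.
    return n * (ROUND_TRIP + WAL_SYNC)

def grouped(n, batch):
    # Hypothetical group commit: one local sync per batch of resolutions.
    batches = math.ceil(n / batch)
    return n * ROUND_TRIP + batches * WAL_SYNC

print(one_by_one(100))    # → 150.0
print(grouped(100, 10))   # → 105.0
```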

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transactions involving multiple postgres foreign servers, take 2

Fujii Masao-4


On 2020/07/17 20:04, Masahiko Sawada wrote:

> On Fri, 17 Jul 2020 at 14:22, [hidden email]
> <[hidden email]> wrote:
>>
>> From: Masahiko Sawada <[hidden email]>
>> I have briefly checked the only oracle_fdw but in general I think that
>>> if an existing FDW supports transaction begin, commit, and rollback,
>>> these can be ported to new FDW transaction APIs easily.
>>
>> Does oracle_fdw support begin, commit and rollback?
>>
>> And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction?  In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing?  I think we need to confirm that there are concrete examples.
>
> I also believe they do. But I'm concerned that some FDW needs to start
> a transaction differently when using 2PC. For instance, IIUC MySQL
> also supports 2PC but the transaction needs to be started with "XA
> START id” when the transaction needs to be prepared. The transaction
> started with XA START can be closed by XA END followed by XA PREPARE
> or XA COMMIT ONE PHASE.

This means that FDW should also provide an API for xa_end()?
Maybe we need to reconsider which APIs we should provide in FDW,
based on the XA specification?


> It means that when starts a new transaction
> the transaction needs to prepare the transaction identifier and to
> know that 2PC might be used. It’s quite different from PostgreSQL. In
> PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE
> TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
> required when PREPARE TRANSACTION.
>
> With MySQL, I guess FDW needs a way to tell the (next) transaction
> needs to be started with XA START so it can be prepared. It could be a
> custom GUC or an SQL function. Then when starts a new transaction on
> MySQL server, FDW can generate and store a transaction identifier into
> somewhere alongside the connection. At the prepare phase, it passes
> the transaction identifier via GetPrepareId() API to the core.
>
> I haven’t tested the above yet and it’s just a desk plan. it's
> definitely a good idea to try integrating this 2PC feature to FDWs
> other than postgres_fdw to see if design and interfaces are
> implemented sophisticatedly.

With the current patch, we track whether write queries are executed
on each server. Then, if the number of servers that execute write queries
is less than two, 2PC is skipped. This "optimization" is not necessary
(cannot be applied) when using mysql_fdw because the transaction starts
with XA START. Right?

If that's an "optimization" only for postgres_fdw, maybe it's better to
get rid of it from the first patch, to make the patch simpler.
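The "optimization" in question amounts to something like the following Python sketch (hypothetical names; whether the local server also counts as a writer is not shown here):

```python
def choose_protocol(participants):
    """participants: dict of server name -> did-write flag.
    Use 2PC only when at least two participants performed writes;
    otherwise a plain one-phase commit is sufficient."""
    writers = [s for s, wrote in participants.items() if wrote]
    return "2PC" if len(writers) >= 2 else "1PC"

print(choose_protocol({"r1": True, "r2": False}))  # → 1PC (only one writer)
print(choose_protocol({"r1": True, "r2": True}))   # → 2PC (atomic commit needed)
```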

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Transactions involving multiple postgres foreign servers, take 2

Fujii Masao-4
In reply to this post by Masahiko Sawada-2


On 2020/07/16 14:47, Masahiko Sawada wrote:

> On Tue, 14 Jul 2020 at 11:19, Fujii Masao <[hidden email]> wrote:
>>
>>
>>
>> On 2020/07/14 9:08, Masahiro Ikeda wrote:
>>>> I've attached the latest version patches. I've incorporated the review
>>>> comments I got so far and improved locking strategy.
>>>
>>> Thanks for updating the patch!
>>
>> +1
>> I'm interested in these patches and now studying them. While checking
>> the behaviors of the patched PostgreSQL, I got three comments.
>
> Thank you for testing this patch!
>
>>
>> 1. We can access to the foreign table even during recovery in the HEAD.
>> But in the patched version, when I did that, I got the following error.
>> Is this intentional?
>>
>> ERROR:  cannot assign TransactionIds during recovery
>
> No, it should be fixed. I'm going to fix this by not collecting
> participants for atomic commit during recovery.

Thanks for trying to fix the issues!

I'd like to report one more issue. When I started new transaction
in the local server, executed INSERT in the remote server via
postgres_fdw and then quit psql, I got the following assertion failure.

TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
0   postgres                            0x000000010d52f3c0 ExceptionalCondition + 160
1   postgres                            0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
2   postgres                            0x000000010cefff14 AtProcExit_FdwXact + 20
3   postgres                            0x000000010d313fe3 shmem_exit + 179
4   postgres                            0x000000010d313e7a proc_exit_prepare + 122
5   postgres                            0x000000010d313da3 proc_exit + 19
6   postgres                            0x000000010d35112f PostgresMain + 3711
7   postgres                            0x000000010d27bb3a BackendRun + 570
8   postgres                            0x000000010d27af6b BackendStartup + 475
9   postgres                            0x000000010d279ed1 ServerLoop + 593
10  postgres                            0x000000010d277940 PostmasterMain + 6016
11  postgres                            0x000000010d1597b9 main + 761
12  libdyld.dylib                       0x00007fff7161e3d5 start + 1
13  ???                                 0x0000000000000003 0x0 + 3

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiro Ikeda
In reply to this post by Masahiko Sawada-2
On 2020-07-17 15:55, Masahiko Sawada wrote:

> On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda <[hidden email]>
> wrote:
>>
>> On 2020-07-16 13:16, Masahiko Sawada wrote:
>> > On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <[hidden email]>
>> > wrote:
>> >>
>> >> > I've attached the latest version patches. I've incorporated the review
>> >> > comments I got so far and improved locking strategy.
>> >>
>> >> I want to ask a question about streaming replication with 2PC.
>> >> Are you going to support 2PC with streaming replication?
>> >>
>> >> I tried streaming replication using v23 patches.
>> >> I confirm that 2PC works with streaming replication,
>> >> which there are primary/standby coordinator.
>> >>
>> >> But, in my understanding, the WAL of "PREPARE" and
>> >> "COMMIT/ABORT PREPARED" can't be replicated to the standby server in
>> >> sync.
>> >>
>> >> If this is right, the unresolved transaction can be occurred.
>> >>
>> >> For example,
>> >>
>> >> 1. PREPARE is done
>> >> 2. crash primary before the WAL related to PREPARE is
>> >>     replicated to the standby server
>> >> 3. promote standby server // but can't execute "ABORT PREPARED"
>> >>
>> >> In above case, the remote server has the unresolved transaction.
>> >> Can we solve this problem to support in-sync replication?
>> >>
>> >> But, I think some users use async replication for performance.
>> >> Do we need to document the limitation or make another solution?
>> >>
>> >
>> > IIUC with synchronous replication, we can guarantee that WAL records
>> > are written on both primary and replicas when the client got an
>> > acknowledgment of commit. We don't replicate each WAL records
>> > generated during transaction one by one in sync. In the case you
>> > described, the client will get an error due to the server crash.
>> > Therefore I think the user cannot expect WAL records generated so far
>> > has been replicated. The same issue could happen also when the user
>> > executes PREPARE TRANSACTION and the server crashes.
>>
>> Thanks! I hadn't noticed that the behavior when a user executes PREPARE
>> TRANSACTION is the same.
>>
>> IIUC with 2PC, there is a different point between (1)PREPARE
>> TRANSACTION
>> and (2)2PC.
>> The point is that whether the client can know when the server crashed
>> and it's global tx id.
>>
>> If (1)PREPARE TRANSACTION is failed, it's ok the client execute same
>> command
>> because if the remote server is already prepared the command will be
>> ignored.
>>
>> But, if (2)2PC is failed with coordinator crash, the client can't know
>> what operations should be done.
>>
>> If the old coordinator already executed PREPARED, there are some
>> transaction which should be ABORT PREPARED.
>> But if the PREPARED WAL is not sent to the standby, the new
>> coordinator
>> can't execute ABORT PREPARED.
>> And the client can't know which remote servers have PREPARED
>> transactions which should be ABORTED either.
>>
>> Even if the client can know that, only the old coordinator knows its
>> global transaction id.
>> Only the database administrator can analyze the old coordinator's log
>> and then execute the appropriate commands manually, right?
>
> I think that's right. In the case of the coordinator crash, the user
> can look orphaned foreign prepared transactions by checking the
> 'identifier' column of pg_foreign_xacts on the new standby server and
> the prepared transactions on the remote servers.

I think there is a case where we can't see orphaned foreign
prepared transactions in the pg_foreign_xacts view on the new standby server.
This confuses users and database administrators.

If the primary coordinator crashes after preparing a foreign transaction,
but before sending the XLOG_FDWXACT_INSERT record to the standby server,
the standby server can't restore the transaction status and the
pg_foreign_xacts view doesn't show the prepared foreign transaction.

Sending XLOG_FDWXACT_INSERT records asynchronously leads to this problem.

>> > To prevent this
>> > issue, I think we would need to send each WAL records in sync but I'm
>> > not sure it's reasonable behavior, and as long as we write WAL in the
>> > local and then send it to replicas we would need a smart mechanism to
>> > prevent this situation.
>>
>> I agree. To send each 2PC WAL records  in sync must be with a large
>> performance impact.
>> At least, we need to document the limitation and how to handle this
>> situation.
>
> Ok. I'll add it.

Thanks a lot.

>> > Related to the pointing out by Ikeda-san, I realized that with the
>> > current patch the backend waits for synchronous replication and then
>> > waits for foreign transaction resolution. But it should be reversed.
>> > Otherwise, it could lead to data loss even when the client got an
>> > acknowledgment of commit. Also, when the user is using both atomic
>> > commit and synchronous replication and wants to cancel waiting, he/she
>> > will need to press ctl-c twice with the current patch, which also
>> > should be fixed.
>>
>> I'm sorry that I can't understood.
>>
>> In my understanding, if COMMIT WAL is replicated to the standby in
>> sync,
>> the standby server can resolve the transaction after crash recovery in
>> promoted phase.
>>
>> If reversed, there are some situations which can't guarantee atomic
>> commit.
>> In the case that some foreign transaction resolutions succeed but
>> others
>> fail (and the COMMIT WAL is not replicated),
>> the standby must ABORT PREPARED because the COMMIT WAL is not
>> replicated.
>> This means that some foreign transactions are COMMIT PREPARED
>> executed
>> by the primary coordinator,
>> while other foreign transactions can be ABORT PREPARED executed by the
>> secondary coordinator.
>
> You're right. Thank you for pointing out!
>
> If the coordinator crashes after the client gets acknowledgment of the
> successful commit of the transaction but before sending
> XLOG_FDWXACT_REMOVE record to the replicas, the FdwXact entries are
> left on the replicas even after failover. But since we require FDW to
> tolerate the error of undefined prepared transactions in
> COMMIT/ROLLBACK PREPARED it won’t be a critical problem.

I agree. It's ok that the primary coordinator sends
XLOG_FDWXACT_REMOVE records asynchronously.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION



Re: Transactions involving multiple postgres foreign servers, take 2

akapila
In reply to this post by Masahiko Sawada-2
On Fri, Jul 17, 2020 at 8:38 AM Masahiko Sawada
<[hidden email]> wrote:

>
> On Thu, 16 Jul 2020 at 13:53, [hidden email]
> <[hidden email]> wrote:
> >
> > Hi Sawada san,
> >
> >
> > I'm reviewing this patch series, and let me give some initial comments and questions.  I'm looking at this with a hope that this will be useful purely as a FDW enhancement for our new use cases, regardless of whether the FDW will be used for Postgres scale-out.
>
> Thank you for reviewing this patch!
>
> Yes, this patch is trying to resolve the generic atomic commit problem
> w.r.t. FDW, and will be useful also for Postgres scale-out.
>

I think it is important to get a consensus on this point.  If I
understand correctly, Tsunakawa-san doesn't sound convinced that
FDW can be used for postgres scale-out, and we are trying to paint this
feature as a step forward in the scale-out direction.  As per my
understanding, we don't have a very clear vision of whether we will be
able to achieve the other important aspects of the scale-out feature,
like global visibility, if we go in this direction, and that is the
reason I have insisted in this and the other related thread [1] that we
at least have a high-level idea of the same before going too far with
this patch. It is quite possible that after spending months of effort to
straighten out this patch/feature, we come to the conclusion that it
needs to be re-designed or requires a lot of rework to ensure that it
can be extended for global visibility.  It is better to spend some
effort up front to see whether the proposed patch is a stepping stone
for achieving what we want w.r.t. postgres scale-out.


[1] - https://www.postgresql.org/message-id/07b2c899-4ed0-4c87-1327-23c750311248%40postgrespro.ru

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



RE: Transactions involving multiple postgres foreign servers, take 2

tsunakawa.takay@fujitsu.com
In reply to this post by Masahiko Sawada-2
From: Masahiko Sawada <[hidden email]>

> I also believe they do. But I'm concerned that some FDW needs to start
> a transaction differently when using 2PC. For instance, IIUC MySQL
> also supports 2PC but the transaction needs to be started with "XA
> START id” when the transaction needs to be prepared. The transaction
> started with XA START can be closed by XA END followed by XA PREPARE
> or XA COMMIT ONE PHASE. It means that when starts a new transaction
> the transaction needs to prepare the transaction identifier and to
> know that 2PC might be used. It’s quite different from PostgreSQL. In
> PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE
> TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
> required when PREPARE TRANSACTION.

I guess Postgres is rather a minority in this regard.  All I know is XA and its Java counterpart (Java Transaction API: JTA).  In XA, the connection needs to be associated with an XID before its transaction work is performed.
If some transaction work has already been done before associating with an XID, xa_start() returns an error like this:

[XA specification]
--------------------------------------------------
[XAER_OUTSIDE]
The resource manager is doing work outside any global transaction on behalf of
the application.
--------------------------------------------------


[Java Transaction API (JTA)]
--------------------------------------------------
void start(Xid xid, int flags) throws XAException

This method starts work on behalf of a transaction branch.
...

3.4.7 Local and Global Transactions
The resource adapter is encouraged to support the usage of both local and global
transactions within the same transactional connection. Local transactions are
transactions that are started and coordinated by the resource manager internally. The
XAResource interface is not used for local transactions.

When using the same connection to perform both local and global transactions, the
following rules apply:

. The local transaction must be committed (or rolled back) before starting a
global transaction in the connection.
. The global transaction must be disassociated from the connection before any
local transaction is started.
--------------------------------------------------


(FWIW, would jdbc_fdw be expected to use JTA for this FDW 2PC?)



> I haven’t tested the above yet and it’s just a desk plan. it's
> definitely a good idea to try integrating this 2PC feature to FDWs
> other than postgres_fdw to see if design and interfaces are
> implemented sophisticatedly.

Yes, if we address this 2PC feature as an FDW enhancement, we need to make sure that at least some well-known DBMSs are able to implement the new interface.  The following part may help devise the interface:


[References from XA specification]
--------------------------------------------------
The primary use of xa_start() is to register a new transaction branch with the RM.
This marks the start of the branch. Subsequently, the AP, using the same thread of
control, uses the RM’s native interface to do useful work. All requests for service
made by the same thread are part of the same branch until the thread dissociates
from the branch (see below).

3.3.1 Registration of Resource Managers
Normally, a TM involves all associated RMs in a transaction branch. (The TM’s set of
RM switches, described in Section 4.3 on page 21 tells the TM which RMs are
associated with it.) The TM calls all these RMs with xa_start(), xa_end(), and
xa_prepare (), although an RM that is not active in a branch need not participate further
(see Section 2.3.2 on page 8). A technique to reduce overhead for infrequently-used
RMs is discussed below.

Dynamic Registration

Certain RMs, especially those involved in relatively few global transactions, may ask
the TM to assume they are not involved in a transaction. These RMs must register with
the TM before they do application work, to see whether the work is part of a global
transaction. The TM never calls these RMs with any form of xa_start(). An RM
declares dynamic registration in its switch (see Section 4.3 on page 21). An RM can
make this declaration only on its own behalf, and doing so does not change the TM’s
behaviour with respect to other RMs.

When an AP requests work from such an RM, before doing any work, the RM contacts
the TM by calling ax_reg(). The RM must call ax_reg() from the same thread of control
that the AP would use if it called ax_reg() directly. The TM returns to the RM the
appropriate XID if the AP is in a global transaction.

The implications of dynamically registering are as follows: when a thread of control
begins working on behalf of a transaction branch, the transaction manager calls
xa_start() for all resource managers known to the thread except those having
TMREGISTER set in their xa_switch_t structure. Thus, those resource managers with
this flag set must explicitly join a branch with ax_reg(). Secondly, when a thread of
control is working on behalf of a branch, a transaction manager calls xa_end() for all
resource managers known to the thread that either do not have TMREGISTER set in
their xa_switch_t structure or have dynamically registered with ax_reg().


int
xa_start(XID *xid, int rmid, long flags)

DESCRIPTION
A transaction manager calls xa_start() to inform a resource manager that an application
may do work on behalf of a transaction branch.
...
A transaction manager calls xa_start() only for those resource managers that do not
have TMREGISTER set in the flags element of their xa_switch_t structure. Resource
managers with TMREGISTER set must use ax_reg() to join a transaction branch (see
ax_reg() for details).
--------------------------------------------------


> > To track the foreign transaction status, we can add GetTransactionStatus() to
> the FDW interface as an alternative, can't we?
>
> I haven't thought such an interface but it sounds like the transaction
> status is managed on both the core and FDWs. Could you elaborate on
> that?

I don't have such a deep analysis.  I just thought that the core could keep track of the local transaction status and ask each participant FDW about its transaction status to determine an action.


> > If the WAL records of multiple concurrent transactions are written and
> synced separately, i.e. group commit doesn't take effect, then the OLTP
> transaction performance will be unacceptable.
>
> I agree that it'll be a large performance penalty. I'd like to have it
> but I’m not sure we should have it in the first version from the
> perspective of complexity.

I think at least we should have a rough image of how we can reach the goal.  Otherwise, the current design/implementation may have to be overhauled with great effort in the near future.  Apart from that, I feel it's unnatural that commit processing is serialized at the transaction resolver while the DML processing of multiple foreign transactions can be performed in parallel.


> Since the procedure of 2PC is inherently
> high cost, in my opinion, the user should avoid using it as much as
> possible in terms of performance. Especially in OLTP, its cost will
> directly affect the latency. I’d suggest designing the database schema
> so that a transaction touches only one foreign server, but do you have
> a concrete OLTP use case that normally requires 2PC, and how many
> servers are involved within a distributed transaction?

I can't share the details, but some of our customers show interest in Postgres scale-out or FDW 2PC for the following use cases:

* Multitenant OLTP where the data specific to one tenant is stored on one database server.  On the other hand, some data is shared among all tenants and stored on a separate server.  The shared data and the tenant-specific data are updated in the same transaction (I don't know the frequency of such transactions.)

* An IoT use case where each edge database server monitors and tracks the movement of objects in one area.  Those edge database servers store the records of objects they manage.  When an object gets out of one area and moves to another, the record for the object is moved between the two edge database servers using an atomic distributed transaction.

(I wonder if TPC-C or TPC-E needs distributed transactions...)


Regards
Takayuki Tsunakawa





Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by Fujii Masao-4
On Sat, 18 Jul 2020 at 01:45, Fujii Masao <[hidden email]> wrote:

>
>
>
> On 2020/07/17 20:04, Masahiko Sawada wrote:
> > On Fri, 17 Jul 2020 at 14:22, [hidden email]
> > <[hidden email]> wrote:
> >>
> >> From: Masahiko Sawada <[hidden email]>
> >> I have briefly checked the only oracle_fdw but in general I think that
> >>> if an existing FDW supports transaction begin, commit, and rollback,
> >>> these can be ported to new FDW transaction APIs easily.
> >>
> >> Does oracle_fdw support begin, commit and rollback?
> >>
> >> And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction?  In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing?  I think we need to confirm that there are concrete examples.
> >
> > I also believe they do. But I'm concerned that some FDW needs to start
> > a transaction differently when using 2PC. For instance, IIUC MySQL
> > also supports 2PC but the transaction needs to be started with "XA
> > START id” when the transaction needs to be prepared. The transaction
> > started with XA START can be closed by XA END followed by XA PREPARE
> > or XA COMMIT ONE PHASE.
>
> This means that FDW should also provide the API for xa_end()?
> Maybe we need to consider again which API we should provide in FDW,
> based on XA specification?

Not sure that we really need the API for xa_end(). It's not necessary
at least in MySQL's case. mysql_fdw can execute either XA END followed
by XA PREPARE when the FDW prepare API is called, or XA END followed by
XA COMMIT ONE PHASE when the FDW commit API is called with
FDWXACT_FLAG_ONEPHASE.

>
>
> > It means that when starting a new transaction
> > the transaction needs to prepare the transaction identifier and to
> > know that 2PC might be used. It’s quite different from PostgreSQL. In
> > PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE
> > TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
> > required when PREPARE TRANSACTION.
> >
> > With MySQL, I guess FDW needs a way to tell the (next) transaction
> > needs to be started with XA START so it can be prepared. It could be a
> > custom GUC or an SQL function. Then when starts a new transaction on
> > MySQL server, FDW can generate and store a transaction identifier into
> > somewhere alongside the connection. At the prepare phase, it passes
> > the transaction identifier via GetPrepareId() API to the core.
> >
> > I haven’t tested the above yet and it’s just a desk plan. it's
> > definitely a good idea to try integrating this 2PC feature to FDWs
> > other than postgres_fdw to see if design and interfaces are
> > implemented sophisticatedly.
>
> With the current patch, we track whether write queries are executed
> in each server. Then, if the number of servers that execute write queries
> is less than two, 2PC is skipped. This "optimization" is not necessary
> (cannot be applied) when using mysql_fdw because the transaction starts
> with XA START. Right?

I think we can use XA COMMIT ONE PHASE in MySQL, which both prepares
and commits the transaction. If the number of servers that executed
write queries is less than two, the core transaction manager calls the
CommitForeignTransaction API with the flag FDWXACT_FLAG_ONEPHASE. That
way, mysql_fdw can execute XA END followed by XA COMMIT ONE PHASE
instead of XA PREPARE. On the other hand, when the number of such
servers is greater than or equal to two, the core transaction manager
calls the PrepareForeignTransaction API and then the
CommitForeignTransaction API without that flag. In this case, mysql_fdw
can execute XA END and XA PREPARE in the PrepareForeignTransaction API
call, and then XA COMMIT in the CommitForeignTransaction API call.
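
As a sketch, the mapping above in terms of the commands mysql_fdw would issue (the xid 'fx_1' is illustrative):

```sql
-- Case 1: fewer than two servers wrote.  The core calls
-- CommitForeignTransaction with FDWXACT_FLAG_ONEPHASE,
-- and mysql_fdw can issue:
XA END 'fx_1';
XA COMMIT 'fx_1' ONE PHASE;

-- Case 2: two or more servers wrote.  The core calls
-- PrepareForeignTransaction, then (once all participants
-- are prepared) CommitForeignTransaction without the flag:
XA END 'fx_1';
XA PREPARE 'fx_1';
-- ... commit phase ...
XA COMMIT 'fx_1';
```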

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada-2
In reply to this post by Fujii Masao-4
On Sat, 18 Jul 2020 at 01:55, Fujii Masao <[hidden email]> wrote:

>
>
>
> On 2020/07/16 14:47, Masahiko Sawada wrote:
> > On Tue, 14 Jul 2020 at 11:19, Fujii Masao <[hidden email]> wrote:
> >>
> >>
> >>
> >> On 2020/07/14 9:08, Masahiro Ikeda wrote:
> >>>> I've attached the latest version patches. I've incorporated the review
> >>>> comments I got so far and improved locking strategy.
> >>>
> >>> Thanks for updating the patch!
> >>
> >> +1
> >> I'm interested in these patches and now studying them. While checking
> >> the behaviors of the patched PostgreSQL, I got three comments.
> >
> > Thank you for testing this patch!
> >
> >>
> >> 1. We can access to the foreign table even during recovery in the HEAD.
> >> But in the patched version, when I did that, I got the following error.
> >> Is this intentional?
> >>
> >> ERROR:  cannot assign TransactionIds during recovery
> >
> > No, it should be fixed. I'm going to fix this by not collecting
> > participants for atomic commit during recovery.
>
> Thanks for trying to fix the issues!
>
> I'd like to report one more issue. When I started new transaction
> in the local server, executed INSERT in the remote server via
> postgres_fdw and then quit psql, I got the following assertion failure.
>
> TRAP: FailedAssertion("fdwxact", File: "fdwxact.c", Line: 1570)
> 0   postgres                            0x000000010d52f3c0 ExceptionalCondition + 160
> 1   postgres                            0x000000010cefbc49 ForgetAllFdwXactParticipants + 313
> 2   postgres                            0x000000010cefff14 AtProcExit_FdwXact + 20
> 3   postgres                            0x000000010d313fe3 shmem_exit + 179
> 4   postgres                            0x000000010d313e7a proc_exit_prepare + 122
> 5   postgres                            0x000000010d313da3 proc_exit + 19
> 6   postgres                            0x000000010d35112f PostgresMain + 3711
> 7   postgres                            0x000000010d27bb3a BackendRun + 570
> 8   postgres                            0x000000010d27af6b BackendStartup + 475
> 9   postgres                            0x000000010d279ed1 ServerLoop + 593
> 10  postgres                            0x000000010d277940 PostmasterMain + 6016
> 11  postgres                            0x000000010d1597b9 main + 761
> 12  libdyld.dylib                       0x00007fff7161e3d5 start + 1
> 13  ???                                 0x0000000000000003 0x0 + 3
>
Thank you for reporting the issue!

I've attached the latest version patch that incorporated all comments
I got so far. I've removed the patch adding the 'prefer' mode of
foreign_twophase_commit to keep the patch set simple.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v24-0005-Add-regression-tests-for-foreign-twophase-commit.patch (62K) Download Attachment
v24-0001-Recreate-RemoveForeignServerById.patch (3K) Download Attachment
v24-0003-Documentation-update.patch (55K) Download Attachment
v24-0004-postgres_fdw-supports-atomic-commit-APIs.patch (63K) Download Attachment
v24-0002-Support-atomic-commit-among-multiple-foreign-ser.patch (257K) Download Attachment