Re: Global snapshots


Re: Global snapshots

Andrey V. Lepikhov
Rebased onto current master (fb544735f1).

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

0001-GlobalCSNLog-SLRU-v3.patch (24K)
0002-Global-snapshots-v3.patch (65K)
0003-postgres_fdw-support-for-global-snapshots-v3.patch (32K)

Re: Global snapshots

Fujii Masao-4


On 2020/05/12 19:24, Andrey Lepikhov wrote:
> Rebased onto current master (fb544735f1).

Thanks for the patches!

These patches no longer apply cleanly and cause a compilation failure.
Could you rebase and update them?

The patches seem not to be registered in CommitFest yet.
Are you planning to do that?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Global snapshots

Andrey V. Lepikhov

On 09.06.2020 11:41, Fujii Masao wrote:

>
>
> On 2020/05/12 19:24, Andrey Lepikhov wrote:
>> Rebased onto current master (fb544735f1).
>
> Thanks for the patches!
>
> These patches are no longer applied cleanly and caused the compilation
> failure.
> So could you rebase and update them?
Rebased onto 57cb806308 (see attachment).
>
> The patches seem not to be registered in CommitFest yet.
> Are you planning to do that?
Not now. It is a sharding-related feature, and I'm not sure that this
approach is fully consistent with the current sharding direction.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com


0001-GlobalCSNLog-SLRU.patch (24K)
0002-Global-snapshots.patch (65K)
0003-postgres_fdw-support-for-global-snapshots.patch (32K)

Re: Global snapshots

akapila
On Wed, Jun 10, 2020 at 8:36 AM Andrey V. Lepikhov
<[hidden email]> wrote:

>
>
> On 09.06.2020 11:41, Fujii Masao wrote:
> >
> >
> > The patches seem not to be registered in CommitFest yet.
> > Are you planning to do that?
> Not now. It is a sharding-related feature. I'm not sure that this
> approach is fully consistent with the sharding way now.
>

Can you please explain in detail why you think so?  There is no
commit message explaining what each patch does, so it is difficult to
understand why you said so.  Also, can you let us know if this
supports 2PC in some way, and if so, how is it different from what the
other thread on the same topic [1] is trying to achieve?  Also, I
would like to know if the patch related to CSN-based snapshots [2] is
a precursor for this; if not, is it in any way related to this patch?
I see the latest reply on that thread [2], which says it is
infrastructure for a sharding feature, but I don't completely
understand whether these patches are related.

Basically, there seem to be three threads: first, this one, and then
[1] and [2], which seem to be doing work for a sharding feature, but
there is no clear explanation anywhere of whether these are in any way
related, or whether by combining all three we are aiming for a
solution for atomic commit and atomic visibility.

I am not sure if you know the answers to all these questions, so I
added the people who seem to be working on the other two patches.  I
am also afraid that there may be duplicate or conflicting work going
on in these threads, so we should try to find that out as well.


[1] - https://www.postgresql.org/message-id/CA%2Bfd4k4v%2BKdofMyN%2BjnOia8-7rto8tsh9Zs3dd7kncvHp12WYw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/2020061911294657960322%40highgo.ca

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Global snapshots

Andrey V. Lepikhov
On 6/19/20 11:48 AM, Amit Kapila wrote:

> On Wed, Jun 10, 2020 at 8:36 AM Andrey V. Lepikhov
> <[hidden email]> wrote:
>> On 09.06.2020 11:41, Fujii Masao wrote:
>>> The patches seem not to be registered in CommitFest yet.
>>> Are you planning to do that?
>> Not now. It is a sharding-related feature. I'm not sure that this
>> approach is fully consistent with the sharding way now.
> Can you please explain in detail, why you think so?  There is no
> commit message explaining what each patch does so it is difficult to
> understand why you said so?
For now I have used this patch set to provide correct visibility when
a table with foreign partitions is accessed from many nodes in
parallel. So I saw this patch set as a sharding-related feature, but
[1] shows another useful application.
The CSN-based approach has weak points such as:
1. Dependency on clock synchronization.
2. It needs guarantees that the CSN increases monotonically across an
instance restart/crash, etc.
3. We need to delay advancing OldestXmin because it can still be
needed for a transaction snapshot at another node.
So I do not have full conviction that it will be better than a single
distributed transaction manager.
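A minimal sketch of how point 2 above might be addressed: clamp the clock-derived value to a persisted maximum so the CSN sequence stays monotonic even if the clock regresses after a restart. The names here are illustrative, not taken from the patch set.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t CSN;

/*
 * Highest CSN ever issued.  In a real implementation this would live in
 * shared memory and be persisted (WAL/checkpoint) so the guarantee
 * survives a crash; here it is just a static for illustration.
 */
static CSN csn_last = 0;

static CSN
generate_csn(CSN clock_now)
{
    /*
     * If the clock-derived value went backwards (instance restart, NTP
     * step), advance past the last issued CSN instead of trusting the
     * clock, keeping the sequence strictly monotonic.
     */
    CSN next = (clock_now > csn_last) ? clock_now : csn_last + 1;

    csn_last = next;
    return next;
}
```

Persisting csn_last through WAL is what would make this crash-safe, which is essentially what the CSN thread [2] proposes.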
>    Also, can you let us know if this
> supports 2PC in some way and if so how is it different from what the
> other thread on the same topic [1] is trying to achieve?
Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
the 2PC machinery. Right now I would not judge which approach is better.
>   Also, I
> would like to know if the patch related to CSN based snapshot [2] is a
> precursor for this, if not, then is it any way related to this patch
> because I see the latest reply on that thread [2] which says it is an
> infrastructure of sharding feature but I don't understand completely
> whether these patches are related?
I need some time to study this patch. At first sight it is different.
>
> Basically, there seem to be three threads, first, this one and then
> [1] and [2] which seems to be doing the work for sharding feature but
> there is no clear explanation anywhere if these are anyway related or
> whether combining all these three we are aiming for a solution for
> atomic commit and atomic visibility.
It can be useful to study all approaches.
>
> I am not sure if you know answers to all these questions so I added
> the people who seem to be working on the other two patches.  I am also
> afraid that if there is any duplicate or conflicting work going on in
> these threads so we should try to find that as well.
Ok
>
>
> [1] - https://www.postgresql.org/message-id/CA%2Bfd4k4v%2BKdofMyN%2BjnOia8-7rto8tsh9Zs3dd7kncvHp12WYw%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/2020061911294657960322%40highgo.ca
>

[1]
https://www.postgresql.org/message-id/flat/20200301083601.ews6hz5dduc3w2se%40alap3.anarazel.de

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com



Re: Global snapshots

movead.li@highgo.ca

>> would like to know if the patch related to CSN based snapshot [2] is a
>> precursor for this, if not, then is it any way related to this patch
>> because I see the latest reply on that thread [2] which says it is an
>> infrastructure of sharding feature but I don't understand completely
>> whether these patches are related?
>I need some time to study this patch.. At first sight it is different.

Patch [2] is largely based on [3]. Since [1] is talking about 2PC
and FDW, this patch focuses on CSN only; I detached the global snapshot
part and the FDW part from the [1] patch.

I noticed that the CSN does not survive a restart in the [1] patch. I
think that may not be the right way; it may be what the last mail called
"Needs guarantees of monotonically increasing of the CSN in the case of
an instance restart/crash etc", so I tried to add WAL support for the
CSN in this patch.

That is why this thread exists.


Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Re: Global snapshots

Bruce Momjian
On Fri, Jun 19, 2020 at 05:03:20PM +0800, [hidden email] wrote:

>
> >> would like to know if the patch related to CSN based snapshot [2] is a
> >> precursor for this, if not, then is it any way related to this patch
> >> because I see the latest reply on that thread [2] which says it is an
> >> infrastructure of sharding feature but I don't understand completely
> >> whether these patches are related?
> >I need some time to study this patch.. At first sight it is different.
>
> This patch[2] is almost base on [3], because I think [1] is talking about 2PC
> and FDW, so this patch focus on CSN only and I detach the global snapshot
> part and FDW part from the [1] patch.
>
> I notice CSN will not survival after a restart in [1] patch, I think it may not
> the
> right way, may be it is what in last mail "Needs guarantees of monotonically
> increasing of the CSN in the case of an instance restart/crash etc" so I try to
> add wal support for CSN on this patch.
>
> That's why this thread exist.

I was certainly missing how these items fit together.  Sharding needs
parallel FDWs, atomic commits, and atomic snapshots.  To get atomic
snapshots, we need CSN.  This new sharding wiki page has more details:

        https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding

After all that is done, we will need optimizer improvements and shard
management tooling.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Global snapshots

akapila
In reply to this post by Andrey V. Lepikhov
On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov
<[hidden email]> wrote:

>
> On 6/19/20 11:48 AM, Amit Kapila wrote:
> > On Wed, Jun 10, 2020 at 8:36 AM Andrey V. Lepikhov
> > <[hidden email]> wrote:
> >> On 09.06.2020 11:41, Fujii Masao wrote:
> >>> The patches seem not to be registered in CommitFest yet.
> >>> Are you planning to do that?
> >> Not now. It is a sharding-related feature. I'm not sure that this
> >> approach is fully consistent with the sharding way now.
> > Can you please explain in detail, why you think so?  There is no
> > commit message explaining what each patch does so it is difficult to
> > understand why you said so?
> For now I used this patch set for providing correct visibility in the
> case of access to the table with foreign partitions from many nodes in
> parallel. So I saw at this patch set as a sharding-related feature, but
> [1] shows another useful application.
> CSN-based approach has weak points such as:
> 1. Dependency on clocks synchronization
> 2. Needs guarantees of monotonically increasing of the CSN in the case
> of an instance restart/crash etc.
> 3. We need to delay increasing of OldestXmin because it can be needed
> for a transaction snapshot at another node.
>

So, is anyone working on improving these parts of the patch?  AFAICS
from what Bruce has shared [1], some people from HighGo are working on
it, but I don't see any discussion of that yet.

> So I do not have full conviction that it will be better than a single
> distributed transaction manager.
>

When you say "single distributed transaction manager", do you mean
something like pg_dtm, which is inspired by Postgres-XL?

> >    Also, can you let us know if this
> > supports 2PC in some way and if so how is it different from what the
> > other thread on the same topic [1] is trying to achieve?
> Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
> 2PC machinery. Now I'd not judge which approach is better.
>

Yeah, I have studied both approaches a little, and I feel the main
difference seems to be that in this patch atomicity is tightly coupled
with how we achieve global visibility; basically, in this patch "all
running transactions are marked as InDoubt on all nodes in prepare
phase, and after that, each node commit it and stamps each xid with a
given GlobalCSN.".  There are no separate APIs for
prepare/commit/rollback exposed by postgres_fdw as in the approach
followed by Sawada-San's patch.  It seems to me that in the patch in
this email one of the postgres_fdw nodes can be a sort of coordinator
which prepares and commits the transaction on all other nodes, whereas
that is not true in Sawada-San's patch (where the coordinator is a
local Postgres node, am I right Sawada-San?).  OTOH, Sawada-San's
patch has advanced concepts like a resolver process that can
commit/abort the transactions later.  I still couldn't get a complete
grip on both patches, so it is difficult to say which approach is
better, but I think at the least we should have some discussion.

I feel that if Sawada-San or someone involved in the other patch also
studies this approach and tries to come up with some form of
comparison, then we might be able to make a better decision.  It is
possible that there are a few good things in each approach which we can use.

> >   Also, I
> > would like to know if the patch related to CSN based snapshot [2] is a
> > precursor for this, if not, then is it any way related to this patch
> > because I see the latest reply on that thread [2] which says it is an
> > infrastructure of sharding feature but I don't understand completely
> > whether these patches are related?
> I need some time to study this patch. At first sight it is different.
>

I feel the opposite.  I think it has extracted some stuff from this
patch series and extended the same.

Thanks for the inputs.  I feel inputs from you and others who were
involved in this project will be really helpful to move this project
forward.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Global snapshots

akapila
On Sat, Jun 20, 2020 at 5:51 PM Amit Kapila <[hidden email]> wrote:
>
>
> So, is anyone working on improving these parts of the patch.  AFAICS
> from what Bruce has shared [1],
>

oops, forgot to share the link [1] -
https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Global snapshots

akapila
In reply to this post by Bruce Momjian
On Fri, Jun 19, 2020 at 6:33 PM Bruce Momjian <[hidden email]> wrote:

>
> On Fri, Jun 19, 2020 at 05:03:20PM +0800, [hidden email] wrote:
> >
> > >> would like to know if the patch related to CSN based snapshot [2] is a
> > >> precursor for this, if not, then is it any way related to this patch
> > >> because I see the latest reply on that thread [2] which says it is an
> > >> infrastructure of sharding feature but I don't understand completely
> > >> whether these patches are related?
> > >I need some time to study this patch.. At first sight it is different.
> >
> > This patch[2] is almost base on [3], because I think [1] is talking about 2PC
> > and FDW, so this patch focus on CSN only and I detach the global snapshot
> > part and FDW part from the [1] patch.
> >
> > I notice CSN will not survival after a restart in [1] patch, I think it may not
> > the
> > right way, may be it is what in last mail "Needs guarantees of monotonically
> > increasing of the CSN in the case of an instance restart/crash etc" so I try to
> > add wal support for CSN on this patch.
> >
> > That's why this thread exist.
>
> I was certainly missing how these items fit together.  Sharding needs
> parallel FDWs, atomic commits, and atomic snapshots.  To get atomic
> snapshots, we need CSN.  This new sharding wiki pages has more details:
>
>         https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding
>

Thanks for maintaining this page.  It is quite helpful!

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Global snapshots

Bruce Momjian
On Sat, Jun 20, 2020 at 05:54:18PM +0530, Amit Kapila wrote:

> On Fri, Jun 19, 2020 at 6:33 PM Bruce Momjian <[hidden email]> wrote:
> >
> > On Fri, Jun 19, 2020 at 05:03:20PM +0800, [hidden email] wrote:
> > >
> > > >> would like to know if the patch related to CSN based snapshot [2] is a
> > > >> precursor for this, if not, then is it any way related to this patch
> > > >> because I see the latest reply on that thread [2] which says it is an
> > > >> infrastructure of sharding feature but I don't understand completely
> > > >> whether these patches are related?
> > > >I need some time to study this patch.. At first sight it is different.
> > >
> > > This patch[2] is almost base on [3], because I think [1] is talking about 2PC
> > > and FDW, so this patch focus on CSN only and I detach the global snapshot
> > > part and FDW part from the [1] patch.
> > >
> > > I notice CSN will not survival after a restart in [1] patch, I think it may not
> > > the
> > > right way, may be it is what in last mail "Needs guarantees of monotonically
> > > increasing of the CSN in the case of an instance restart/crash etc" so I try to
> > > add wal support for CSN on this patch.
> > >
> > > That's why this thread exist.
> >
> > I was certainly missing how these items fit together.  Sharding needs
> > parallel FDWs, atomic commits, and atomic snapshots.  To get atomic
> > snapshots, we need CSN.  This new sharding wiki pages has more details:
> >
> >         https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding
> >
>
> Thanks for maintaining this page.  It is quite helpful!

Ahsan Hadi <[hidden email]> created that page, and I just made a
few wording edits.  Ahsan is copying information from this older
sharding wiki page:

        https://wiki.postgresql.org/wiki/Built-in_Sharding

to the new one you listed above.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Global snapshots

Bruce Momjian
In reply to this post by akapila
On Sat, Jun 20, 2020 at 05:51:21PM +0530, Amit Kapila wrote:
> I feel if Sawada-San or someone involved in another patch also once
> studies this approach and try to come up with some form of comparison
> then we might be able to make better decision.  It is possible that
> there are few good things in each approach which we can use.

Agreed. Postgres-XL code is under the Postgres license:

        Postgres-XL is released under the PostgreSQL License, a liberal Open
        Source license, similar to the BSD or MIT licenses.

and even says they want it moved into Postgres core:

        https://www.postgres-xl.org/2017/08/postgres-xl-9-5-r1-6-announced/

        Postgres-XL is a massively parallel database built on top of,
        and very closely compatible with PostgreSQL 9.5 and its set of advanced
        features. Postgres-XL is fully open source and many parts of it will
        feed back directly or indirectly into later releases of PostgreSQL, as
        we begin to move towards a fully parallel sharded version of core PostgreSQL.

so we should understand what can be used from it.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Global snapshots

akapila
On Mon, Jun 22, 2020 at 8:36 PM Bruce Momjian <[hidden email]> wrote:

>
> On Sat, Jun 20, 2020 at 05:51:21PM +0530, Amit Kapila wrote:
> > I feel if Sawada-San or someone involved in another patch also once
> > studies this approach and try to come up with some form of comparison
> > then we might be able to make better decision.  It is possible that
> > there are few good things in each approach which we can use.
>
> Agreed. Postgres-XL code is under the Postgres license:
>
>         Postgres-XL is released under the PostgreSQL License, a liberal Open
>         Source license, similar to the BSD or MIT licenses.
>
> and even says they want it moved into Postgres core:
>
>         https://www.postgres-xl.org/2017/08/postgres-xl-9-5-r1-6-announced/
>
>         Postgres-XL is a massively parallel database built on top of,
>         and very closely compatible with PostgreSQL 9.5 and its set of advanced
>         features. Postgres-XL is fully open source and many parts of it will
>         feed back directly or indirectly into later releases of PostgreSQL, as
>         we begin to move towards a fully parallel sharded version of core PostgreSQL.
>
> so we should understand what can be used from it.
>

+1.  I think that will be quite useful.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Global snapshots

Masahiko Sawada-2
In reply to this post by akapila
On Sat, 20 Jun 2020 at 21:21, Amit Kapila <[hidden email]> wrote:

>
> On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov
> <[hidden email]> wrote:
> >
> > On 6/19/20 11:48 AM, Amit Kapila wrote:
> > > On Wed, Jun 10, 2020 at 8:36 AM Andrey V. Lepikhov
> > > <[hidden email]> wrote:
> > >> On 09.06.2020 11:41, Fujii Masao wrote:
> > >>> The patches seem not to be registered in CommitFest yet.
> > >>> Are you planning to do that?
> > >> Not now. It is a sharding-related feature. I'm not sure that this
> > >> approach is fully consistent with the sharding way now.
> > > Can you please explain in detail, why you think so?  There is no
> > > commit message explaining what each patch does so it is difficult to
> > > understand why you said so?
> > For now I used this patch set for providing correct visibility in the
> > case of access to the table with foreign partitions from many nodes in
> > parallel. So I saw at this patch set as a sharding-related feature, but
> > [1] shows another useful application.
> > CSN-based approach has weak points such as:
> > 1. Dependency on clocks synchronization
> > 2. Needs guarantees of monotonically increasing of the CSN in the case
> > of an instance restart/crash etc.
> > 3. We need to delay increasing of OldestXmin because it can be needed
> > for a transaction snapshot at another node.
> >
>
> So, is anyone working on improving these parts of the patch.  AFAICS
> from what Bruce has shared [1], some people from HighGo are working on
> it but I don't see any discussion of that yet.
>
> > So I do not have full conviction that it will be better than a single
> > distributed transaction manager.
> >
>
> When you say "single distributed transaction manager"  do you mean
> something like pg_dtm which is inspired by Postgres-XL?
>
> > >    Also, can you let us know if this
> > > supports 2PC in some way and if so how is it different from what the
> > > other thread on the same topic [1] is trying to achieve?
> > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
> > 2PC machinery. Now I'd not judge which approach is better.
> >
>

Sorry for being late.

> Yeah, I have studied both the approaches a little and I feel the main
> difference seems to be that in this patch atomicity is tightly coupled
> with how we achieve global visibility, basically in this patch "all
> running transactions are marked as InDoubt on all nodes in prepare
> phase, and after that, each node commit it and stamps each xid with a
> given GlobalCSN.".  There are no separate APIs for
> prepare/commit/rollback exposed by postgres_fdw as we do it in the
> approach followed by Sawada-San's patch.  It seems to me in the patch
> in this email one of postgres_fdw node can be a sort of coordinator
> which prepares and commit the transaction on all other nodes whereas
> that is not true in Sawada-San's patch (where the coordinator is a
> local Postgres node, am I right Sawada-San?).

Yeah, where to manage foreign transactions is different: postgres_fdw
manages foreign transactions in this patch, whereas the PostgreSQL core
does that in the 2PC patch.

>
> I feel if Sawada-San or someone involved in another patch also once
> studies this approach and try to come up with some form of comparison
> then we might be able to make better decision.  It is possible that
> there are few good things in each approach which we can use.
>

I studied this patch and did a simple comparison between this patch
(0002 patch) and my 2PC patch.

In terms of atomic commit, the features that are not implemented in
this patch but in the 2PC patch are:

* Crash safety.
* PREPARE TRANSACTION command support.
* Query cancellation while waiting for the commit.
* Automatic in-doubt transaction resolution.

On the other hand, the feature that is implemented in this patch but
not in the 2PC patch is:

* Executing PREPARE TRANSACTION (and other commands) in parallel

When the 2PC patch was proposed, IIRC it was like this patch (the 0002
patch); I mean, it changed only postgres_fdw to support 2PC. But after
discussion, we changed the approach to have the core manage foreign
transactions for crash safety. From my perspective, this patch has a
minimal implementation of 2PC to make the global snapshot feature work
and is missing some features important for supporting crash-safe atomic
commit. So I personally think we should consider how to integrate this
global snapshot feature with the 2PC patch, rather than improving this
patch, if we want crash-safe atomic commit.

Looking at the commit procedure with this patch:

When starting a new transaction on a foreign server, postgres_fdw
executes pg_global_snapshot_import() to import the global snapshot.
After some work, in the pre-commit phase we do:

1. Generate a global transaction id, say 'gid'.
2. Execute PREPARE TRANSACTION 'gid' on all participants.
3. Prepare the global snapshot locally, if the local node is also
involved in the transaction.
4. Execute pg_global_snapshot_prepare('gid') on all participants.

During steps 2 to 4, we calculate the maximum CSN from the CSNs
returned by each pg_global_snapshot_prepare() execution.

5. Assign the global snapshot locally, if the local node is also
involved in the transaction.
6. Execute pg_global_snapshot_assign('gid', max-csn) on all participants.

Then, we commit locally (i.e. mark the current transaction as
committed in clog).

After that, in the post-commit phase, execute COMMIT PREPARED 'gid' on
all participants.
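The max-CSN agreement in steps 2 to 6 can be modeled as a small sketch. This is an illustration, not patch code: the remote SQL calls are elided, and only the arithmetic that produces the commonly agreed CSN is shown.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t CSN;

/*
 * Model of steps 2-6 above.  prepared_csns[i] stands for the CSN returned
 * by pg_global_snapshot_prepare('gid') on participant i (steps 2-4), and
 * assigned_csns[i] for the value sent back with
 * pg_global_snapshot_assign('gid', max-csn) in steps 5-6.
 */
static CSN
agree_on_global_csn(const CSN *prepared_csns, CSN *assigned_csns, int nnodes)
{
    CSN max_csn = 0;

    /* Steps 2-4: collect each participant's prepare-time CSN. */
    for (int i = 0; i < nnodes; i++)
        if (prepared_csns[i] > max_csn)
            max_csn = prepared_csns[i];

    /* Steps 5-6: stamp every participant with the agreed maximum. */
    for (int i = 0; i < nnodes; i++)
        assigned_csns[i] = max_csn;

    return max_csn;
}
```

The point of taking the maximum is that every participant ends up stamping the transaction with the same CSN, which is what makes the commit atomically visible across nodes.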

Considering how to integrate this global snapshot feature with the 2PC
patch, what the 2PC patch needs to change, at the least, is to allow
the FDW to store FDW-private data that is passed to subsequent FDW
transaction API calls. Currently, in the 2PC patch, we call the Prepare
API for each participant server one by one, and the core passes only
metadata such as ForeignServer, UserMapping, and the global transaction
identifier, so it's not easy to calculate the maximum CSN across
multiple transaction API calls. I think we can change the 2PC patch to
add a void pointer to FdwXactRslvState, the struct passed from the
core, in order to store FDW-private data; in this case it would be the
maximum CSN. That way, at the first Prepare API call postgres_fdw
allocates the space and stores the CSN there, and at subsequent Prepare
API calls it can calculate the maximum CSN and then do steps 3 to 6
when preparing the transaction on the last participant. Another idea
would be to change the 2PC patch so that the core passes a bunch of
participants grouped by FDW.
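A rough sketch of that void-pointer idea follows. FdwXactRslvState and its fdw_private field here are stand-ins modeled on the discussion, not the actual definitions from the 2PC patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t CSN;

/* Stand-in for the struct the core would pass to each FDW API call. */
typedef struct FdwXactRslvState
{
    const char *gid;         /* global transaction identifier */
    void       *fdw_private; /* opaque pointer, owned by the FDW */
} FdwXactRslvState;

/*
 * Hypothetical per-participant Prepare callback: the first call
 * allocates the accumulator, and every call folds the participant's
 * CSN into the running maximum, which survives across calls via
 * state->fdw_private.
 */
static void
fdw_prepare_cb(FdwXactRslvState *state, CSN node_csn)
{
    CSN *max_csn = (CSN *) state->fdw_private;

    if (max_csn == NULL)
    {
        max_csn = malloc(sizeof(CSN));
        *max_csn = 0;
        state->fdw_private = max_csn;
    }

    if (node_csn > *max_csn)
        *max_csn = node_csn;
}
```

After the last participant's Prepare call, the accumulated maximum would be handed to the snapshot-assign step described above.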

I've not read this patch deeply yet and have considered it without any
coding, but my first feeling is that it would not be hard to integrate
this feature with the 2PC patch.

Regards,


--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Global snapshots

akapila
On Fri, Jul 3, 2020 at 12:18 PM Masahiko Sawada
<[hidden email]> wrote:

>
> On Sat, 20 Jun 2020 at 21:21, Amit Kapila <[hidden email]> wrote:
> >
> > On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov
> > <[hidden email]> wrote:
> >
> > > >    Also, can you let us know if this
> > > > supports 2PC in some way and if so how is it different from what the
> > > > other thread on the same topic [1] is trying to achieve?
> > > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
> > > 2PC machinery. Now I'd not judge which approach is better.
> > >
> >
>
> Sorry for being late.
>

No problem; your summarization and comparison of both approaches are
quite helpful.

>
> I studied this patch and did a simple comparison between this patch
> (0002 patch) and my 2PC patch.
>
> In terms of atomic commit, the features that are not implemented in
> this patch but in the 2PC patch are:
>
> * Crash safe.
> * PREPARE TRANSACTION command support.
> * Query cancel during waiting for the commit.
> * Automatically in-doubt transaction resolution.
>
> On the other hand, the feature that is implemented in this patch but
> not in the 2PC patch is:
>
> * Executing PREPARE TRANSACTION (and other commands) in parallel
>
> When the 2PC patch was proposed, IIRC it was like this patch (0002
> patch). I mean, it changed only postgres_fdw to support 2PC. But after
> discussion, we changed the approach to have the core manage foreign
> transaction for crash-safe. From my perspective, this patch has a
> minimum implementation of 2PC to work the global snapshot feature and
> has some missing features important for supporting crash-safe atomic
> commit. So I personally think we should consider how to integrate this
> global snapshot feature with the 2PC patch, rather than improving this
> patch if we want crash-safe atomic commit.
>

Okay, but isn't there also some advantage to this approach (managing
2PC at the postgres_fdw level), namely that any node will be capable
of handling global transactions rather than routing them via a central
coordinator?  I mean, any node can do writes or reads rather than
probably routing them (at least writes) via a coordinator node.  Now,
I agree that even if this advantage exists in the current approach, we
can't lose the crash-safety aspect of the other approach.  Will you be
able to summarize what the problem was w.r.t. crash safety and how
your patch has dealt with it?

> Looking at the commit procedure with this patch:
>
> When starting a new transaction on a foreign server, postgres_fdw
> executes pg_global_snapshot_import() to import the global snapshot.
> After some work, in pre-commit phase we do:
>
> 1. generate global transaction id, say 'gid'
> 2. execute PREPARE TRANSACTION 'gid' on all participants.
> 3. prepare global snapshot locally, if the local node also involves
> the transaction
> 4. execute pg_global_snapshot_prepare('gid') for all participants
>
> During step 2 to 4, we calculate the maximum CSN from the CSNs
> returned from each pg_global_snapshot_prepare() executions.
>
> 5. assign global snapshot locally, if the local node also involves the
> transaction
> 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants.
>
> Then, we commit locally (i.g. mark the current transaction as
> committed in clog).
>
> After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all
> participants.
>

As per my current understanding, the overall idea is as follows.  For
global transactions, pg_global_snapshot_prepare('gid') will set the
transaction status as InDoubt and generate CSN (let's call it NodeCSN)
at the node where that function is executed, it also returns the
NodeCSN to the coordinator.  Then the coordinator (the current
postgres_fdw node on which write transaction is being executed)
computes MaxCSN based on the return value (NodeCSN) of prepare
(pg_global_snapshot_prepare) from all nodes.  It then assigns MaxCSN
to each node.  Finally, when Commit Prepared is issued for each node
that MaxCSN will be written to each node including the current node.
So, with this idea, each node will have the same view of CSN value
corresponding to any particular transaction.
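A minimal, self-contained sketch of that coordinator-side flow may help. Everything below is illustrative: the class, the method names, and the integer CSNs merely stand in for the quoted steps (PREPARE TRANSACTION, pg_global_snapshot_prepare(), pg_global_snapshot_assign()) and are not the patch's actual API.

```python
class Node:
    """A participant node; the integer 'clock' stands in for its CSN source."""
    def __init__(self, name, clock):
        self.name = name
        self.clock = clock
        self.status = {}                  # gid -> 'prepared' | 'in-doubt' | csn

    def prepare_transaction(self, gid):   # models: PREPARE TRANSACTION 'gid'
        self.status[gid] = 'prepared'

    def global_snapshot_prepare(self, gid):
        # models pg_global_snapshot_prepare(): mark InDoubt, return NodeCSN
        self.status[gid] = 'in-doubt'
        return self.clock

    def global_snapshot_assign(self, gid, csn):
        # models pg_global_snapshot_assign(): record the agreed MaxCSN
        self.status[gid] = csn


def precommit(gid, participants):
    """Steps 2-6 of the quoted procedure, run by the coordinator."""
    for node in participants:             # step 2: prepare everywhere
        node.prepare_transaction(gid)
    # steps 3-4: collect NodeCSNs and take the maximum
    max_csn = max(node.global_snapshot_prepare(gid) for node in participants)
    for node in participants:             # steps 5-6: assign MaxCSN everywhere
        node.global_snapshot_assign(gid, max_csn)
    return max_csn


nodes = [Node('a', 100), Node('b', 105), Node('c', 97)]
print(precommit('gid-1', nodes))          # -> 105, now shared by all nodes
```

The invariant the sketch shows is the one stated above: after step 6, every participant stores the same MaxCSN for the transaction, which is what gives all nodes the same view of its commit point.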

For Snapshot management, the node which receives the query generates a
CSN (CurrentCSN) and follows the simple rule that a tuple whose
xid has a CSN less than CurrentCSN will be visible.  Now, it is
possible that when we are examining a tuple, the CSN corresponding to
xid that has written the tuple has a value as INDOUBT which will
indicate that the transaction is not yet committed on all nodes.  And
we wait till we get the valid CSN value corresponding to xid and then
use it to check if the tuple is visible.
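That visibility rule can be written down as a short sketch. The names here (the INDOUBT sentinel, the csn_log mapping, the resolution callback) are assumptions made for illustration, not the patch's data structures:

```python
# A tuple is visible iff the CSN of its writing xid is below the snapshot's
# CurrentCSN; an in-doubt entry forces the reader to wait for resolution.
INDOUBT = object()

def tuple_is_visible(xid, snapshot_csn, csn_log, wait_for_resolution):
    csn = csn_log.get(xid)
    while csn is INDOUBT:
        # The writing transaction is between prepare and assign on some
        # node; block until a real CSN is published, then re-check.
        csn = wait_for_resolution(xid)
    if csn is None:
        return False                 # aborted / never assigned a commit CSN
    return csn < snapshot_csn

csn_log = {10: 90, 11: INDOUBT, 12: 130}
resolve = lambda xid: 95             # pretend xid 11 resolves to CSN 95

print(tuple_is_visible(10, 120, csn_log, resolve))  # True  (90 < 120)
print(tuple_is_visible(11, 120, csn_log, resolve))  # True  (waits, then 95 < 120)
print(tuple_is_visible(12, 120, csn_log, resolve))  # False (130 >= 120)
```

In the real patch the wait would block on the in-doubt transaction's resolution rather than call a function, but the decision logic is the same: InDoubt forces a wait, and only a CSN strictly below the snapshot's CurrentCSN makes the tuple visible.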

Now, one thing to note here is that for global transactions we
primarily rely on CSN value corresponding to a transaction for its
visibility even though we still maintain CLOG for local transaction
status.

Leaving aside the incomplete parts and/or flaws of the current patch,
does the above match the top-level idea of this patch?  I am not sure
if my understanding of this patch at this stage is completely correct
or whether we want to follow the approach of this patch but I think at
least let's first be sure if such a top-level idea can achieve what we
want to do here.

> Considering how to integrate this global snapshot feature with the 2PC
> patch, what the 2PC patch needs to at least change is to allow FDW to
> store an FDW-private data that is passed to subsequent FDW transaction
> API calls. Currently, in the current 2PC patch, we call Prepare API
> for each participant servers one by one, and the core pass only
> metadata such as ForeignServer, UserMapping, and global transaction
> identifier. So it's not easy to calculate the maximum CSN across
> multiple transaction API calls. I think we can change the 2PC patch to
> add a void pointer into FdwXactRslvState, struct passed from the core,
> in order to store FDW-private data. It's going to be the maximum CSN
> in this case. That way, at the first Prepare API calls postgres_fdw
> allocates the space and stores CSN to that space. And at subsequent
> Prepare API calls it can calculate the maximum of csn, and then is
> able to the step 3 to 6 when preparing the transaction on the last
> participant. Another idea would be to change 2PC patch so that the
> core passes a bunch of participants grouped by FDW.
>

IIUC, with this the coordinator needs to communicate with the nodes
twice at the prepare stage: once to prepare the transaction in each
node and get a CSN from each node, and then to communicate MaxCSN to
each node?  Also, we probably need the InDoubt CSN status at the
prepare phase to make snapshots and global visibility work.

> I’ve not read this patch deeply yet and have considered it without any
> coding but my first feeling is not hard to integrate this feature with
> the 2PC patch.
>

Okay.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Global snapshots

Masahiko Sawada-2
On Tue, 7 Jul 2020 at 15:40, Amit Kapila <[hidden email]> wrote:

>
> On Fri, Jul 3, 2020 at 12:18 PM Masahiko Sawada
> <[hidden email]> wrote:
> >
> > On Sat, 20 Jun 2020 at 21:21, Amit Kapila <[hidden email]> wrote:
> > >
> > > On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov
> > > <[hidden email]> wrote:
> > >
> > > >    Also, can you let us know if this
> > > > > supports 2PC in some way and if so how is it different from what the
> > > > > other thread on the same topic [1] is trying to achieve?
> > > > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains
> > > > 2PC machinery. Now I'd not judge which approach is better.
> > > >
> > >
> >
> > Sorry for being late.
> >
>
> No problem, your summarization, and comparisons of both approaches are
> quite helpful.
>
> >
> > I studied this patch and did a simple comparison between this patch
> > (0002 patch) and my 2PC patch.
> >
> > In terms of atomic commit, the features that are not implemented in
> > this patch but in the 2PC patch are:
> >
> > * Crash safe.
> > * PREPARE TRANSACTION command support.
> > * Query cancel during waiting for the commit.
> > * Automatically in-doubt transaction resolution.
> >
> > On the other hand, the feature that is implemented in this patch but
> > not in the 2PC patch is:
> >
> > * Executing PREPARE TRANSACTION (and other commands) in parallel
> >
> > When the 2PC patch was proposed, IIRC it was like this patch (0002
> > patch). I mean, it changed only postgres_fdw to support 2PC. But after
> > discussion, we changed the approach to have the core manage foreign
> > transaction for crash-safe. From my perspective, this patch has a
> > minimum implementation of 2PC to work the global snapshot feature and
> > has some missing features important for supporting crash-safe atomic
> > commit. So I personally think we should consider how to integrate this
> > global snapshot feature with the 2PC patch, rather than improving this
> > patch if we want crash-safe atomic commit.
> >
>
> Okay, but isn't there some advantage with this approach (manage 2PC at
> postgres_fdw level) as well which is that any node will be capable of
> handling global transactions rather than doing them via central
> coordinator?  I mean any node can do writes or reads rather than
> probably routing them (at least writes) via coordinator node.

The postgres server where the client started the transaction works as
the coordinator node. I think this is true for both this patch and
that 2PC patch. From the perspective of atomic commit, any node will
be capable of handling global transactions in both approaches.

>  Now, I
> agree that even if this advantage is there in the current approach, we
> can't lose the crash-safety aspect of other approach.  Will you be
> able to summarize what was the problem w.r.t crash-safety and how your
> patch has dealt it?

Since this patch performs 2PC without any logging, foreign
transactions prepared on foreign servers are left over without any
clues if the coordinator crashes during commit. Therefore, after
restart, the user will need to find and resolve in-doubt foreign
transactions manually.

In that 2PC patch, the information of foreign transactions is WAL
logged before PREPARE TRANSACTION. So even if the coordinator crashes
after preparing some foreign transactions, the prepared foreign
transactions are recovered during crash recovery, and then the
transaction resolver resolves them automatically or the user also can
resolve them. The user doesn't need to check other participant nodes
to resolve in-doubt foreign transactions. Also, since the foreign
transaction information is replicated to physical standbys, the new
master can take over resolving in-doubt transactions.
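The crash-safety difference described here can be modeled in a toy sketch. The structures and names below are invented for illustration; only the ordering (durably record the participants *before* sending PREPARE TRANSACTION) reflects the 2PC patch's idea:

```python
durable_log = []                      # stands in for WAL

def commit_with_logging(gid, servers, crash_after_prepare=False):
    # Record the participant list durably before any PREPARE is sent.
    durable_log.append(('fdwxact', gid, tuple(servers)))
    for s in servers:
        pass                          # ... send PREPARE TRANSACTION 'gid' to s
    if crash_after_prepare:
        raise RuntimeError('coordinator crash')
    # ... COMMIT PREPARED on all servers, then log completion

def recover():
    """After restart: every logged, unfinished gid is known and resolvable."""
    return [(gid, servers) for kind, gid, servers in durable_log
            if kind == 'fdwxact']

try:
    commit_with_logging('gid-42', ['node1', 'node2'], crash_after_prepare=True)
except RuntimeError:
    pass

print(recover())   # -> [('gid-42', ('node1', 'node2'))]: the resolver can act
```

Without the log record, the crash would leave prepared transactions on node1 and node2 with nothing on the coordinator pointing at them, which is exactly the manual-cleanup situation described above.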

>
> > Looking at the commit procedure with this patch:
> >
> > When starting a new transaction on a foreign server, postgres_fdw
> > executes pg_global_snapshot_import() to import the global snapshot.
> > After some work, in pre-commit phase we do:
> >
> > 1. generate global transaction id, say 'gid'
> > 2. execute PREPARE TRANSACTION 'gid' on all participants.
> > 3. prepare global snapshot locally, if the local node also involves
> > the transaction
> > 4. execute pg_global_snapshot_prepare('gid') for all participants
> >
> > During step 2 to 4, we calculate the maximum CSN from the CSNs
> > returned from each pg_global_snapshot_prepare() executions.
> >
> > 5. assign global snapshot locally, if the local node also involves the
> > transaction
> > 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants.
> >
> > Then, we commit locally (i.g. mark the current transaction as
> > committed in clog).
> >
> > After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all
> > participants.
> >
>
> As per my current understanding, the overall idea is as follows.  For
> global transactions, pg_global_snapshot_prepare('gid') will set the
> transaction status as InDoubt and generate CSN (let's call it NodeCSN)
> at the node where that function is executed, it also returns the
> NodeCSN to the coordinator.  Then the coordinator (the current
> postgres_fdw node on which write transaction is being executed)
> computes MaxCSN based on the return value (NodeCSN) of prepare
> (pg_global_snapshot_prepare) from all nodes.  It then assigns MaxCSN
> to each node.  Finally, when Commit Prepared is issued for each node
> that MaxCSN will be written to each node including the current node.
> So, with this idea, each node will have the same view of CSN value
> corresponding to any particular transaction.
>
> For Snapshot management, the node which receives the query generates a
> CSN (CurrentCSN) and follows the simple rule that the tuple having a
> xid with CSN lesser than CurrentCSN will be visible.  Now, it is
> possible that when we are examining a tuple, the CSN corresponding to
> xid that has written the tuple has a value as INDOUBT which will
> indicate that the transaction is yet not committed on all nodes.  And
> we wait till we get the valid CSN value corresponding to xid and then
> use it to check if the tuple is visible.
>
> Now, one thing to note here is that for global transactions we
> primarily rely on CSN value corresponding to a transaction for its
> visibility even though we still maintain CLOG for local transaction
> status.
>
> Leaving aside the incomplete parts and or flaws of the current patch,
> does the above match the top-level idea of this patch?

I'm still studying this patch but your understanding seems right to me.

> I am not sure
> if my understanding of this patch at this stage is completely correct
> or whether we want to follow the approach of this patch but I think at
> least lets first be sure if such a top-level idea can achieve what we
> want to do here.
>
> > Considering how to integrate this global snapshot feature with the 2PC
> > patch, what the 2PC patch needs to at least change is to allow FDW to
> > store an FDW-private data that is passed to subsequent FDW transaction
> > API calls. Currently, in the current 2PC patch, we call Prepare API
> > for each participant servers one by one, and the core pass only
> > metadata such as ForeignServer, UserMapping, and global transaction
> > identifier. So it's not easy to calculate the maximum CSN across
> > multiple transaction API calls. I think we can change the 2PC patch to
> > add a void pointer into FdwXactRslvState, struct passed from the core,
> > in order to store FDW-private data. It's going to be the maximum CSN
> > in this case. That way, at the first Prepare API calls postgres_fdw
> > allocates the space and stores CSN to that space. And at subsequent
> > Prepare API calls it can calculate the maximum of csn, and then is
> > able to the step 3 to 6 when preparing the transaction on the last
> > participant. Another idea would be to change 2PC patch so that the
> > core passes a bunch of participants grouped by FDW.
> >
>
> IIUC with this the coordinator needs the communication with the nodes
> twice at the prepare stage, once to prepare the transaction in each
> node and get CSN from each node and then to communicate MaxCSN to each
> node?

Yes, I think so too.

> Also, we probably need InDoubt CSN status at prepare phase to
> make snapshots and global visibility work.

I think it depends on how the global CSN feature works.

For instance, in that 2PC patch, if the coordinator crashes during
preparing a foreign transaction, the global transaction manager
recovers and regards it as "prepared" regardless of the foreign
transaction actually having been prepared. And it sends ROLLBACK
PREPARED after recovery completed. With global CSN patch, as you
mentioned, at prepare phase the coordinator needs to communicate
participants twice other than sending PREPARE TRANSACTION:
pg_global_snapshot_prepare() and pg_global_snapshot_assign().

If global CSN patch needs different cleanup work depending on the CSN
status, we will need InDoubt CSN status so that the global transaction
manager can distinguish between a foreign transaction that has
executed pg_global_snapshot_prepare() and the one that has executed
pg_global_snapshot_assign().

On the other hand, if it's enough to just send ROLLBACK or ROLLBACK
PREPARED in that case, I think we don't need InDoubt CSN status. There
is no difference between those foreign transactions from the global
transaction manager perspective.

As far as I read the patch, on failure postgres_fdw simply sends
ROLLBACK PREPARED to the participants, and there seems to be no
additional work other than that. I might be missing something.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Global snapshots

akapila
On Wed, Jul 8, 2020 at 11:16 AM Masahiko Sawada
<[hidden email]> wrote:

>
> On Tue, 7 Jul 2020 at 15:40, Amit Kapila <[hidden email]> wrote:
> >
> >
> > Okay, but isn't there some advantage with this approach (manage 2PC at
> > postgres_fdw level) as well which is that any node will be capable of
> > handling global transactions rather than doing them via central
> > coordinator?  I mean any node can do writes or reads rather than
> > probably routing them (at least writes) via coordinator node.
>
> The postgres server where the client started the transaction works as
> the coordinator node. I think this is true for both this patch and
> that 2PC patch. From the perspective of atomic commit, any node will
> be capable of handling global transactions in both approaches.
>

Okay, but then probably we need to ensure that the GID is unique
even if it gets generated on different nodes?  I don't know if that
is ensured.

> >  Now, I
> > agree that even if this advantage is there in the current approach, we
> > can't lose the crash-safety aspect of other approach.  Will you be
> > able to summarize what was the problem w.r.t crash-safety and how your
> > patch has dealt it?
>
> Since this patch proceeds 2PC without any logging, foreign
> transactions prepared on foreign servers are left over without any
> clues if the coordinator crashes during commit. Therefore, after
> restart, the user will need to find and resolve in-doubt foreign
> transactions manually.
>

Okay, but is it because we can't directly WAL-log in postgres_fdw, or
is there some other reason for not doing so?

>
> >
> > > Looking at the commit procedure with this patch:
> > >
> > > When starting a new transaction on a foreign server, postgres_fdw
> > > executes pg_global_snapshot_import() to import the global snapshot.
> > > After some work, in pre-commit phase we do:
> > >
> > > 1. generate global transaction id, say 'gid'
> > > 2. execute PREPARE TRANSACTION 'gid' on all participants.
> > > 3. prepare global snapshot locally, if the local node also involves
> > > the transaction
> > > 4. execute pg_global_snapshot_prepare('gid') for all participants
> > >
> > > During step 2 to 4, we calculate the maximum CSN from the CSNs
> > > returned from each pg_global_snapshot_prepare() executions.
> > >
> > > 5. assign global snapshot locally, if the local node also involves the
> > > transaction
> > > 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants.
> > >
> > > Then, we commit locally (i.g. mark the current transaction as
> > > committed in clog).
> > >
> > > After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all
> > > participants.
> > >
> >
> > As per my current understanding, the overall idea is as follows.  For
> > global transactions, pg_global_snapshot_prepare('gid') will set the
> > transaction status as InDoubt and generate CSN (let's call it NodeCSN)
> > at the node where that function is executed, it also returns the
> > NodeCSN to the coordinator.  Then the coordinator (the current
> > postgres_fdw node on which write transaction is being executed)
> > computes MaxCSN based on the return value (NodeCSN) of prepare
> > (pg_global_snapshot_prepare) from all nodes.  It then assigns MaxCSN
> > to each node.  Finally, when Commit Prepared is issued for each node
> > that MaxCSN will be written to each node including the current node.
> > So, with this idea, each node will have the same view of CSN value
> > corresponding to any particular transaction.
> >
> > For Snapshot management, the node which receives the query generates a
> > CSN (CurrentCSN) and follows the simple rule that the tuple having a
> > xid with CSN lesser than CurrentCSN will be visible.  Now, it is
> > possible that when we are examining a tuple, the CSN corresponding to
> > xid that has written the tuple has a value as INDOUBT which will
> > indicate that the transaction is yet not committed on all nodes.  And
> > we wait till we get the valid CSN value corresponding to xid and then
> > use it to check if the tuple is visible.
> >
> > Now, one thing to note here is that for global transactions we
> > primarily rely on CSN value corresponding to a transaction for its
> > visibility even though we still maintain CLOG for local transaction
> > status.
> >
> > Leaving aside the incomplete parts and or flaws of the current patch,
> > does the above match the top-level idea of this patch?
>
> I'm still studying this patch but your understanding seems right to me.
>

Cool. While studying, if you can try to think about whether this approach
is different from the global coordinator based approach, then it would be
great.  Here is my initial thought: apart from other reasons, the global
coordinator based design can help us do global transaction
management and snapshots.  It can allocate xids for each transaction
and then collect the list of running xacts (or CSN) from each node and
then prepare a global snapshot that can be used to perform any
transaction.

OTOH, in the design proposed in this patch, we don't need any
coordinator to manage transactions and snapshots because each node's
current CSN will be sufficient for snapshot and visibility as
explained above.  Now, sure, this assumes that there is no clock skew
on different nodes, or that we somehow take care of it (note that in
the proposed patch the CSN is a timestamp).

> > I am not sure
> > if my understanding of this patch at this stage is completely correct
> > or whether we want to follow the approach of this patch but I think at
> > least lets first be sure if such a top-level idea can achieve what we
> > want to do here.
> >
> > > Considering how to integrate this global snapshot feature with the 2PC
> > > patch, what the 2PC patch needs to at least change is to allow FDW to
> > > store an FDW-private data that is passed to subsequent FDW transaction
> > > API calls. Currently, in the current 2PC patch, we call Prepare API
> > > for each participant servers one by one, and the core pass only
> > > metadata such as ForeignServer, UserMapping, and global transaction
> > > identifier. So it's not easy to calculate the maximum CSN across
> > > multiple transaction API calls. I think we can change the 2PC patch to
> > > add a void pointer into FdwXactRslvState, struct passed from the core,
> > > in order to store FDW-private data. It's going to be the maximum CSN
> > > in this case. That way, at the first Prepare API calls postgres_fdw
> > > allocates the space and stores CSN to that space. And at subsequent
> > > Prepare API calls it can calculate the maximum of csn, and then is
> > > able to the step 3 to 6 when preparing the transaction on the last
> > > participant. Another idea would be to change 2PC patch so that the
> > > core passes a bunch of participants grouped by FDW.
> > >
> >
> > IIUC with this the coordinator needs the communication with the nodes
> > twice at the prepare stage, once to prepare the transaction in each
> > node and get CSN from each node and then to communicate MaxCSN to each
> > node?
>
> Yes, I think so too.
>
> > Also, we probably need InDoubt CSN status at prepare phase to
> > make snapshots and global visibility work.
>
> I think it depends on how global CSN feature works.
>
> For instance, in that 2PC patch, if the coordinator crashes during
> preparing a foreign transaction, the global transaction manager
> recovers and regards it as "prepared" regardless of the foreign
> transaction actually having been prepared. And it sends ROLLBACK
> PREPARED after recovery completed. With global CSN patch, as you
> mentioned, at prepare phase the coordinator needs to communicate
> participants twice other than sending PREPARE TRANSACTION:
> pg_global_snapshot_prepare() and pg_global_snapshot_assign().
>
> If global CSN patch needs different cleanup work depending on the CSN
> status, we will need InDoubt CSN status so that the global transaction
> manager can distinguish between a foreign transaction that has
> executed pg_global_snapshot_prepare() and the one that has executed
> pg_global_snapshot_assign().
>
> On the other hand, if it's enough to just send ROLLBACK or ROLLBACK
> PREPARED in that case, I think we don't need InDoubt CSN status. There
> is no difference between those foreign transactions from the global
> transaction manager perspective.
>

I think the InDoubt status helps in checking visibility in the proposed
patch: if we find the status of the transaction as InDoubt, we
wait till we get some valid CSN for it, as explained in my previous
email.  So whether or not we use it for Rollback/Rollback Prepared, it is
required for this design.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Global snapshots

Masahiko Sawada-2
On Wed, 8 Jul 2020 at 21:35, Amit Kapila <[hidden email]> wrote:

>
> On Wed, Jul 8, 2020 at 11:16 AM Masahiko Sawada
> <[hidden email]> wrote:
> >
> > On Tue, 7 Jul 2020 at 15:40, Amit Kapila <[hidden email]> wrote:
> > >
> > >
> > > Okay, but isn't there some advantage with this approach (manage 2PC at
> > > postgres_fdw level) as well which is that any node will be capable of
> > > handling global transactions rather than doing them via central
> > > coordinator?  I mean any node can do writes or reads rather than
> > > probably routing them (at least writes) via coordinator node.
> >
> > The postgres server where the client started the transaction works as
> > the coordinator node. I think this is true for both this patch and
> > that 2PC patch. From the perspective of atomic commit, any node will
> > be capable of handling global transactions in both approaches.
> >
>
> Okay, but then probably we need to ensure that GID has to be unique
> even if that gets generated on different nodes?  I don't know if that
> is ensured.

Yes, if by GID you mean the global transaction id specified to PREPARE
TRANSACTION, it has to be unique. In that 2PC patch, the GID is generated
in the form 'fx_<random string>_<server oid>_<user oid>'. I believe it
can ensure uniqueness in most cases. In addition, there is an FDW API to
generate an arbitrary identifier.
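A sketch of such a GID generator, under the assumption that the random component is a hex token; the exact random-string format used by the patch may differ:

```python
import secrets

def generate_gid(server_oid: int, user_oid: int) -> str:
    # The random component makes collisions between GIDs generated on
    # independently acting coordinator nodes overwhelmingly unlikely;
    # the server and user oids add identifying context.
    return f"fx_{secrets.token_hex(8)}_{server_oid}_{user_oid}"

gid = generate_gid(16394, 10)
print(gid)       # e.g. fx_9f1c2ab34d5e6f70_16394_10
```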

>
> > >  Now, I
> > > agree that even if this advantage is there in the current approach, we
> > > can't lose the crash-safety aspect of other approach.  Will you be
> > > able to summarize what was the problem w.r.t crash-safety and how your
> > > patch has dealt it?
> >
> > Since this patch proceeds 2PC without any logging, foreign
> > transactions prepared on foreign servers are left over without any
> > clues if the coordinator crashes during commit. Therefore, after
> > restart, the user will need to find and resolve in-doubt foreign
> > transactions manually.
> >
>
> Okay, but is it because we can't directly WAL log in postgres_fdw or
> there is some other reason for not doing so?

Yes, I think it is because we cannot WAL log in postgres_fdw. Maybe I
missed the point in your question. Please correct me if I missed
something.

>
> >
> > >
> > > > Looking at the commit procedure with this patch:
> > > >
> > > > When starting a new transaction on a foreign server, postgres_fdw
> > > > executes pg_global_snapshot_import() to import the global snapshot.
> > > > After some work, in pre-commit phase we do:
> > > >
> > > > 1. generate global transaction id, say 'gid'
> > > > 2. execute PREPARE TRANSACTION 'gid' on all participants.
> > > > 3. prepare global snapshot locally, if the local node also involves
> > > > the transaction
> > > > 4. execute pg_global_snapshot_prepare('gid') for all participants
> > > >
> > > > During step 2 to 4, we calculate the maximum CSN from the CSNs
> > > > returned from each pg_global_snapshot_prepare() executions.
> > > >
> > > > 5. assign global snapshot locally, if the local node also involves the
> > > > transaction
> > > > 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants.
> > > >
> > > > Then, we commit locally (i.g. mark the current transaction as
> > > > committed in clog).
> > > >
> > > > After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all
> > > > participants.
> > > >
> > >
> > > As per my current understanding, the overall idea is as follows.  For
> > > global transactions, pg_global_snapshot_prepare('gid') will set the
> > > transaction status as InDoubt and generate CSN (let's call it NodeCSN)
> > > at the node where that function is executed, it also returns the
> > > NodeCSN to the coordinator.  Then the coordinator (the current
> > > postgres_fdw node on which write transaction is being executed)
> > > computes MaxCSN based on the return value (NodeCSN) of prepare
> > > (pg_global_snapshot_prepare) from all nodes.  It then assigns MaxCSN
> > > to each node.  Finally, when Commit Prepared is issued for each node
> > > that MaxCSN will be written to each node including the current node.
> > > So, with this idea, each node will have the same view of CSN value
> > > corresponding to any particular transaction.
> > >
> > > For Snapshot management, the node which receives the query generates a
> > > CSN (CurrentCSN) and follows the simple rule that the tuple having a
> > > xid with CSN lesser than CurrentCSN will be visible.  Now, it is
> > > possible that when we are examining a tuple, the CSN corresponding to
> > > xid that has written the tuple has a value as INDOUBT which will
> > > indicate that the transaction is yet not committed on all nodes.  And
> > > we wait till we get the valid CSN value corresponding to xid and then
> > > use it to check if the tuple is visible.
> > >
> > > Now, one thing to note here is that for global transactions we
> > > primarily rely on CSN value corresponding to a transaction for its
> > > visibility even though we still maintain CLOG for local transaction
> > > status.
> > >
> > > Leaving aside the incomplete parts and or flaws of the current patch,
> > > does the above match the top-level idea of this patch?
> >
> > I'm still studying this patch but your understanding seems right to me.
> >
>
> Cool. While studying, if you can try to think whether this approach is
> different from the global coordinator based approach then it would be
> great.  Here is my initial thought apart from other reasons the global
> coordinator based design can help us to do the global transaction
> management and snapshots.  It can allocate xids for each transaction
> and then collect the list of running xacts (or CSN) from each node and
> then prepare a global snapshot that can be used to perform any
> transaction. OTOH, in the design proposed in this patch, we don't need any
> coordinator to manage transactions and snapshots because each node's
> current CSN will be sufficient for snapshot and visibility as
> explained above.

Yeah, my thought is the same as yours. Since both approaches have strong
points and weak points, I cannot say which is the better approach,
but that 2PC patch would go well together with the design proposed in
this patch.

> Now, sure this assumes that there is no clock skew
> on different nodes or somehow we take care of the same (Note that in
> the proposed patch the CSN is a timestamp.).

As far as I read the Clock-SI paper, clock skew is taken care of by
putting some waits on transaction start and on reading tuples on the
remote node.
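As a rough illustration of that Clock-SI rule (the function and the clock model below are assumptions for this sketch, not the patch's code): a node that imports a snapshot timestamp ahead of its own clock simply waits until its local clock catches up, which bounds the effect of clock skew on visibility decisions:

```python
import time

def wait_for_local_clock(snapshot_ts: float, now=time.time, sleep=time.sleep):
    """Block until the local clock has passed the imported snapshot CSN."""
    while now() < snapshot_ts:
        sleep(snapshot_ts - now())

# Simulated clock: starts 0.5 "seconds" behind the imported snapshot.
clock = [100.0]
wait_for_local_clock(100.5,
                     now=lambda: clock[0],
                     sleep=lambda d: clock.__setitem__(0, clock[0] + d))
print(clock[0] >= 100.5)   # True: the operation proceeds only after catch-up
```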

>
> > > I am not sure
> > > if my understanding of this patch at this stage is completely correct
> > > or whether we want to follow the approach of this patch but I think at
> > > least lets first be sure if such a top-level idea can achieve what we
> > > want to do here.
> > >
> > > > Considering how to integrate this global snapshot feature with the 2PC
> > > > patch, what the 2PC patch needs to at least change is to allow FDW to
> > > > store an FDW-private data that is passed to subsequent FDW transaction
> > > > API calls. Currently, in the current 2PC patch, we call Prepare API
> > > > for each participant servers one by one, and the core pass only
> > > > metadata such as ForeignServer, UserMapping, and global transaction
> > > > identifier. So it's not easy to calculate the maximum CSN across
> > > > multiple transaction API calls. I think we can change the 2PC patch to
> > > > add a void pointer into FdwXactRslvState, struct passed from the core,
> > > > in order to store FDW-private data. It's going to be the maximum CSN
> > > > in this case. That way, at the first Prepare API calls postgres_fdw
> > > > allocates the space and stores CSN to that space. And at subsequent
> > > > Prepare API calls it can calculate the maximum of csn, and then is
> > > > able to the step 3 to 6 when preparing the transaction on the last
> > > > participant. Another idea would be to change 2PC patch so that the
> > > > core passes a bunch of participants grouped by FDW.
> > > >
> > >
> > > IIUC with this the coordinator needs the communication with the nodes
> > > twice at the prepare stage, once to prepare the transaction in each
> > > node and get CSN from each node and then to communicate MaxCSN to each
> > > node?
> >
> > Yes, I think so too.
> >
> > > Also, we probably need InDoubt CSN status at prepare phase to
> > > make snapshots and global visibility work.
> >
> > I think it depends on how global CSN feature works.
> >
> > For instance, in that 2PC patch, if the coordinator crashes during
> > preparing a foreign transaction, the global transaction manager
> > recovers and regards it as "prepared" regardless of the foreign
> > transaction actually having been prepared. And it sends ROLLBACK
> > PREPARED after recovery completed. With global CSN patch, as you
> > mentioned, at prepare phase the coordinator needs to communicate
> > participants twice other than sending PREPARE TRANSACTION:
> > pg_global_snapshot_prepare() and pg_global_snapshot_assign().
> >
> > If global CSN patch needs different cleanup work depending on the CSN
> > status, we will need InDoubt CSN status so that the global transaction
> > manager can distinguish between a foreign transaction that has
> > executed pg_global_snapshot_prepare() and the one that has executed
> > pg_global_snapshot_assign().
> >
> > On the other hand, if it's enough to just send ROLLBACK or ROLLBACK
> > PREPARED in that case, I think we don't need InDoubt CSN status. There
> > is no difference between those foreign transactions from the global
> > transaction manager perspective.
> >
>
> I think InDoubt status helps in checking visibility in the proposed
> patch wherein if we find the status of the transaction as InDoubt, we
> wait till we get some valid CSN for it as explained in my previous
> email.  So whether we use it for Rollback/Rollback Prepared, it is
> required for this design.

Yes, the InDoubt status is required for checking visibility. My comment
was that it's not necessary from the perspective of atomic commit.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Global snapshots

akapila
On Fri, Jul 10, 2020 at 8:46 AM Masahiko Sawada
<[hidden email]> wrote:

>
> On Wed, 8 Jul 2020 at 21:35, Amit Kapila <[hidden email]> wrote:
> >
> >
> > Cool. While studying, if you can try to think whether this approach is
> > different from the global coordinator based approach then it would be
> > great.  Here is my initial thought apart from other reasons the global
> > coordinator based design can help us to do the global transaction
> > management and snapshots.  It can allocate xids for each transaction
> > and then collect the list of running xacts (or CSN) from each node and
> > then prepare a global snapshot that can be used to perform any
> > transaction. OTOH, in the design proposed in this patch, we don't need any
> > coordinator to manage transactions and snapshots because each node's
> > current CSN will be sufficient for snapshot and visibility as
> > explained above.
>
> Yeah, my thought is the same as you. Since both approaches have strong
> points and weak points I cannot mention which is a better approach,
> but that 2PC patch would go well together with the design proposed in
> this patch.
>
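To make the quoted contrast concrete, here is a toy model of the coordinator-free scheme, where taking a snapshot is just reading the node's current CSN (an illustrative sketch assuming perfectly synchronized clocks, modeled as one shared counter; the patch uses a timestamp as the CSN):

```python
import itertools

class Node:
    """A toy node. The shared counter stands in for perfectly synchronized
    physical clocks; with skew, Clock-SI-style waits would be needed."""
    _clock = itertools.count(1)  # shared "clock" across all nodes

    def __init__(self):
        self.committed = {}  # xid -> commit CSN

    def current_csn(self):
        return next(Node._clock)

    def commit(self, xid):
        self.committed[xid] = self.current_csn()

    def take_snapshot(self):
        # No coordinator round-trip: the node's current CSN *is* the snapshot.
        return self.current_csn()

    def visible(self, xid, snapshot_csn):
        csn = self.committed.get(xid)
        return csn is not None and csn <= snapshot_csn
```

A coordinator-based design would instead collect running-xact lists (or CSNs) from every node before handing out a snapshot; here each node answers locally.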

I also think with some modifications we might be able to integrate
your 2PC patch with the patches proposed here.  However, if we decide
not to pursue this approach then it is uncertain whether your proposed
patch can be further enhanced for global visibility.  Does it make
sense to dig the design of this approach a bit further so that we can
be somewhat more sure that pursuing your 2PC patch would be a good
idea and we can, in fact, enhance it later for global visibility?
AFAICS, Andrey has mentioned a couple of problems with this approach
[1]. I am not sure of the details at this stage either, but it would
be really great if we can dig into those.

> > Now, sure this assumes that there is no clock skew
> > on different nodes or somehow we take care of the same (Note that in
> > the proposed patch the CSN is a timestamp.).
>
> As far as I read Clock-SI paper, we take care of the clock skew by
> putting some waits on the transaction start and reading tuples on the
> remote node.
>

Oh, but I am not sure whether this patch is able to solve that, and if so, how?
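For reference, the snapshot-side delay described in the Clock-SI paper can be sketched as follows (my reading of the paper, not necessarily what the patch implements; the loop stands in for physically waiting while the clock advances):

```python
class ClockSINode:
    """Toy model of one Clock-SI delay: a remote node whose clock lags."""
    def __init__(self, clock):
        self.clock = clock  # this node's (possibly skewed) clock value

    def read(self, snapshot_ts):
        # Clock-SI rule: a read with snapshot timestamp t on a node whose
        # clock is behind t must wait until the clock reaches t; otherwise
        # the snapshot could miss transactions this node commits in between.
        waited = 0
        while self.clock < snapshot_ts:
            self.clock += 1   # stand-in for waiting as the clock ticks
            waited += 1
        return waited
```

The paper's other delay, waiting on a tuple whose writer is prepared but not yet committed, corresponds to the InDoubt wait discussed earlier in this thread.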

> >
> > I think InDoubt status helps in checking visibility in the proposed
> > patch wherein if we find the status of the transaction as InDoubt, we
> > wait till we get some valid CSN for it as explained in my previous
> > email.  So whether we use it for Rollback/Rollback Prepared, it is
> > required for this design.
>
> Yes, InDoubt status is required for checking visibility. My comment
> was it's not necessary from the perspective of atomic commit.
>

True, and we can probably enhance your patch with InDoubt status if required.

Thanks for moving this work forward.  I know the progress is a bit
slow due to various reasons but I think it is important to keep making
some progress.

[1] - https://www.postgresql.org/message-id/f23083b9-38d0-6126-eb6e-091816a78585%40postgrespro.ru

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



RE: Global snapshots

tsunakawa.takay@fujitsu.com
Hello,

While thinking about the following issues of the current approach that Andrey raised, I'm getting puzzled and can't help asking a few questions.  Please forgive me if I'm missing some past discussions.

> 1. Dependency on clocks synchronization
> 2. Needs guarantees of monotonically increasing of the CSN in the case
> of an instance restart/crash etc.
> 3. We need to delay increasing of OldestXmin because it can be needed
> for a transaction snapshot at another node.

While Clock-SI seems to be considered the most promising approach for global serializability here,

* Why does Clock-SI get so much attention?  How did Clock-SI become the only choice?

* Clock-SI was devised in Microsoft Research.  Does Microsoft or some other organization use Clock-SI?


Has anyone examined the following Multiversion Commitment Ordering (MVCO)?  Although I haven't understood it yet, it insists that no concurrency control information, including timestamps, needs to be exchanged among the cluster nodes.  I'd appreciate it if someone could give an opinion.

Commitment Ordering Based Distributed Concurrency Control for Bridging Single and Multi Version Resources.
 Proceedings of the Third IEEE International Workshop on Research Issues on Data Engineering: Interoperability in Multidatabase Systems (RIDE-IMS), Vienna, Austria, pp. 189-198, April 1993. (also DEC-TR 853, July 1992)
https://ieeexplore.ieee.org/document/281924?arnumber=281924


The author of the above paper, Yoav Raz, seems to have had a strong passion, at least until 2011, for convincing people of the power of Commitment Ordering (CO) for global serializability.  However, he complains (sadly) that almost all researchers ignore his theory, as written on his site and the Wikipedia page for Commitment Ordering below.  Does anyone know why CO is ignored?

Commitment ordering (CO) - yoavraz2
https://sites.google.com/site/yoavraz2/the_principle_of_co


FWIW, some researchers including Michael Stonebraker evaluated the performance of various distributed concurrency control methods in 2017.  Has anyone looked at this?  (I don't mean there was some promising method that we might want to adopt.)

An Evaluation of Distributed Concurrency Control
Rachael Harding, Dana Van Aken, Andrew Pavlo, and Michael Stonebraker. 2017.
Proc. VLDB Endow. 10, 5 (January 2017), 553-564.
https://doi.org/10.14778/3055540.3055548


Regards
Takayuki Tsunakawa
