Minimal logical decoding on standbys


Minimal logical decoding on standbys

Andres Freund
Hi,

Craig previously worked on $subject, see thread [1].  A bunch of the
prerequisite features from that and other related threads have been
integrated into PG.  What's missing is actually allowing logical
decoding on a standby.  The latest patch from that thread does that [2],
but unfortunately hasn't been updated after slipping v10.

The biggest remaining issue to allow it is that the catalog xmin on the
primary has to be above the catalog xmin horizon of all slots on the
standby. The patch in [2] does so by periodically logging a new record
that announces the current catalog xmin horizon.   Additionally it
checks that hot_standby_feedback is enabled when doing logical decoding
from a standby.

I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements.  I think we instead
should build on top of the records we already have and use to compute
snapshot conflicts.  As of HEAD those records don't tell us whether the
affected tables are catalog tables, but that's just a bool that we need
to include in the records, a basically immeasurable overhead given the
size of those records.
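
For concreteness, carrying that flag could look roughly like the sketch
below, using xl_heap_clean as an example (the field name and which
records end up getting it are illustrative here, not necessarily what
the attached draft does):

    /*
     * Sketch only: extend a cleanup record that is already used to compute
     * snapshot conflicts with a flag saying whether the affected relation
     * is a (user) catalog table, so the standby can tell whether a conflict
     * matters for logical decoding slots.
     */
    typedef struct xl_heap_clean
    {
        TransactionId latestRemovedXid;
        uint16      nredirected;
        uint16      ndead;
        bool        onCatalogTable; /* new: affects catalog contents? */
        /* OFFSET NUMBERS are in the block reference 0 */
    } xl_heap_clean;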

I also don't think we should actually enforce hot_standby_feedback
being enabled - there are use cases where that's not convenient, and
it's not bulletproof anyway (it can be enabled/disabled without using
logical decoding in between).  I think when there's a conflict we
should have the HINT mention that hs_feedback can be used to prevent
such conflicts; that ought to be enough.
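
For illustration, the conflict error could carry a hint along these
lines (sketch only; the errcode and wording here are placeholders, not
what the attached draft emits):

    ereport(ERROR,
            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
             errmsg("canceling logical decoding due to conflict with recovery"),
             errdetail("Rows needed by the logical decoding slot have been removed on the primary."),
             errhint("Enable hot_standby_feedback on the standby (and use a physical replication slot on the primary) to prevent such conflicts.")));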

Attached is a rough draft patch. If we were to go for this approach,
we'd obviously need to improve the actual conflict handling against
slots - right now it just logs a WARNING and retries shortly after.

I think there's currently one hole in this approach. Nbtree (and other
index types, which are pretty unlikely to matter here) has this logic
to handle snapshot conflicts for single-page deletions:


        /*
         * If we have any conflict processing to do, it must happen before we
         * update the page.
         *
         * Btree delete records can conflict with standby queries.  You might
         * think that vacuum records would conflict as well, but we've handled
         * that already.  XLOG_HEAP2_CLEANUP_INFO records provide the highest xid
         * cleaned by the vacuum of the heap and so we can resolve any conflicts
         * just once when that arrives.  After that we know that no conflicts
         * exist from individual btree vacuum records on that index.
         */
        if (InHotStandby)
        {
                TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
                RelFileNode rnode;

                XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);

                ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
                                                                                        xlrec->onCatalogTable, rnode);
        }

I.e. we get the latest removed xid from the heap, which has the
following logic:
        /*
         * If there's nothing running on the standby we don't need to derive a
         * full latestRemovedXid value, so use a fast path out of here.  This
         * returns InvalidTransactionId, and so will conflict with all HS
         * transactions; but since we just worked out that that's zero people,
         * it's OK.
         *
         * XXX There is a race condition here, which is that a new backend might
         * start just after we look.  If so, it cannot need to conflict, but this
         * coding will result in throwing a conflict anyway.
         */
        if (CountDBBackends(InvalidOid) == 0)
                return latestRemovedXid;

        /*
         * In what follows, we have to examine the previous state of the index
         * page, as well as the heap page(s) it points to.  This is only valid if
         * WAL replay has reached a consistent database state; which means that
         * the preceding check is not just an optimization, but is *necessary*. We
         * won't have let in any user sessions before we reach consistency.
         */
        if (!reachedConsistency)
                elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");

so we wouldn't get a correct xid if nobody is connected to a database
(and, by implication, when the standby is not yet consistent).


I'm wondering if it's time to move the latestRemovedXid computation for
this type of record to the primary - it's likely to be cheaper there and
avoids this kind of complication. Secondarily, it'd have the advantage
of making pluggable storage integration easier - there we have the
problem that we don't know which type of relation we're dealing with
during recovery, so such lookups make pluggability harder (zheap just
adds extra flags to signal that, but that's not extensible).
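
To sketch what that could look like - the struct layout and the helper
below are purely illustrative, the real shape would be a separate
patch:

    /* Sketch: xid horizon computed on the primary, carried in the record. */
    typedef struct xl_btree_delete
    {
        TransactionId latestRemovedXid; /* filled in by the primary */
        int         nitems;
        bool        onCatalogTable;
        /* TARGET OFFSET NUMBERS FOLLOW AT THE END */
    } xl_btree_delete;

    /* Redo then no longer needs to look at any heap pages: */
    static void
    btree_xlog_delete_resolve_conflict(XLogReaderState *record,
                                       xl_btree_delete *xlrec)
    {
        RelFileNode rnode;

        XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);

        if (InHotStandby)
            ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
                                                xlrec->onCatalogTable, rnode);
    }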

Another alternative would be to just prevent such index deletions for
catalog tables when wal_level = logical.
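
That could boil down to a single early-exit check in the single-page
deletion path, something like the fragment below (sketch; exactly where
it would live, and whether the heap relation is conveniently available
there, is glossed over):

    /*
     * Sketch: skip opportunistic index deletions that could conflict with
     * logical decoding slots on a standby; the macro already includes the
     * wal_level >= logical check.
     */
    if (RelationIsAccessibleInLogicalDecoding(heapRel))
        return;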


If we were to go with this approach, there'd be at least the following
tasks:
- adapt tests from [2]
- enforce hot-standby to be enabled on the standby when logical slots
  are created, and at startup if a logical slot exists
- fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
  above.
- Have a nicer conflict handling than what I implemented here.  Craig's
  approach deleted the slots, but I'm not sure I like that.  Blocking
  seems more appropriate here, after all it's likely that the
  replication topology would be broken afterwards.
- get_rel_logical_catalog() shouldn't be in lsyscache.[ch], and can be
  optimized (e.g. check wal_level before opening rel etc).
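
For that last point, the optimization meant is roughly the following
(sketch; the real helper in the draft may end up looking different):

    /*
     * Sketch: does this relation need to be treated as a catalog table for
     * logical decoding?  Bail out before opening the relation when the
     * wal_level doesn't allow logical decoding anyway.
     */
    bool
    get_rel_logical_catalog(Oid relid)
    {
        Relation    rel;
        bool        res;

        /* fast path: no logical decoding possible at this wal_level */
        if (wal_level < WAL_LEVEL_LOGICAL)
            return false;

        rel = RelationIdGetRelation(relid);
        if (!RelationIsValid(rel))
            return false;

        res = RelationIsAccessibleInLogicalDecoding(rel);
        RelationClose(rel);

        return res;
    }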


Once we have this logic, it can be used to implement something like
failover slots on top, by having a mechanism that occasionally
forwards slots on standbys using pg_replication_slot_advance().

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/CAMsr+YEVmBJ=dyLw=+kTihmUnGy5_EW4Mig5T0maieg_Zu=XCg@...
[2] https://archives.postgresql.org/message-id/CAMsr%2BYEbS8ZZ%2Bw18j7OPM2MZEeDtGN9wDVF68%3DMzpeW%3DKRZZ9Q%40mail.gmail.com

Attachment: logical-decoding-on-standby.diff (19K)

Re: Minimal logical decoding on standbys

Robert Haas
On Wed, Dec 12, 2018 at 3:41 PM Andres Freund <[hidden email]> wrote:
> I don't like the approach of managing the catalog horizon via those
> periodically logged catalog xmin announcements.  I think we instead
> should build ontop of the records we already have and use to compute
> snapshot conflicts.  As of HEAD we don't know whether such tables are
> catalog tables, but that's just a bool that we need to include in the
> records, a basically immeasurable overhead given the size of those
> records.

To me, this paragraph appears to say that you don't like Craig's
approach without quite explaining why you don't like it.  Could you be
a bit more explicit about that?

> I also don't think we should actually enforce hot_standby_feedback being
> enabled - there's use-cases where that's not convenient, and it's not
> bullet proof anyway (can be enabled/disabled without using logical
> decoding inbetween).  I think when there's a conflict we should have the
> HINT mention that hs_feedback can be used to prevent such conflicts,
> that ought to be enough.

If we can make that work, +1 from me.

> I'm wondering if it's time to move the latestRemovedXid computation for
> this type of record to the primary - it's likely to be cheaper there and
> avoids this kind of complication. Secondarily, it'd have the advantage
> of making pluggable storage integration easier - there we have the
> problem that we don't know which type of relation we're dealing with
> during recovery, so such lookups make pluggability harder (zheap just
> adds extra flags to signal that, but that's not extensible).

That doesn't look trivial.  It seems like _bt_delitems_delete() would
need to get an array of XIDs, but that gets called from
_bt_vacuum_one_page(), which doesn't have that information available.
It doesn't look like there is a particularly cheap way of getting it,
either.  What do you have in mind?

> Another alternative would be to just prevent such index deletions for
> catalog tables when wal_level = logical.

That doesn't sound like a very nice idea.

> If we were to go with this approach, there'd be at least the following
> tasks:
> - adapt tests from [2]

OK.

> - enforce hot-standby to be enabled on the standby when logical slots
>   are created, and at startup if a logical slot exists

Why do we need this?

> - fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
>   above.

OK.

> - Have a nicer conflict handling than what I implemented here.  Craig's
>   approach deleted the slots, but I'm not sure I like that.  Blocking
>   seems more appropriately here, after all it's likely that the
>   replication topology would be broken afterwards.

I guess the viable options are approximately -- (1) drop the slot, (2)
advance the slot, (3) mark the slot as "failed" but leave it in
existence as a tombstone, (4) wait until something changes.  I like
(3) better than (1).  (4) seems pretty unfortunate unless there's some
other system for having the slot advance automatically.  Seems like a
way for replication to hang indefinitely without anybody understanding
why it's happened (or, maybe, noticing).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Minimal logical decoding on standbys

Andres Freund
Hi,

On 2018-12-13 19:32:19 -0500, Robert Haas wrote:

> On Wed, Dec 12, 2018 at 3:41 PM Andres Freund <[hidden email]> wrote:
> > I don't like the approach of managing the catalog horizon via those
> > periodically logged catalog xmin announcements.  I think we instead
> > should build ontop of the records we already have and use to compute
> > snapshot conflicts.  As of HEAD we don't know whether such tables are
> > catalog tables, but that's just a bool that we need to include in the
> > records, a basically immeasurable overhead given the size of those
> > records.
>
> To me, this paragraph appears to say that you don't like Craig's
> approach without quite explaining why you don't like it.  Could you be
> a bit more explicit about that?

I think the conflict system introduced in Craig's patch is quite
complicated: it relies on logging new WAL records on a regular basis
and needs to be more conservative about the xmin horizon, which is
obviously not great for performance.

If you look at Craig's patch, it currently relies on blocking out
concurrent checkpoints:
               /*
                * We must prevent a concurrent checkpoint, otherwise the catalog xmin
                * advance xlog record with the new value might be written before the
                * checkpoint but the checkpoint may still see the old
                * oldestCatalogXmin value.
                */
               if (!LWLockConditionalAcquire(CheckpointLock, LW_SHARED))
                       /* Couldn't get checkpointer lock; will retry later */
                       return;
which on its own seems unacceptable, given that CheckpointLock can be
held by checkpointer for a very long time. While that's ongoing the
catalog xmin horizon doesn't advance.

Looking at the code it seems hard, to me, to make that approach work
nicely. But I might just be tired.


> > I'm wondering if it's time to move the latestRemovedXid computation for
> > this type of record to the primary - it's likely to be cheaper there and
> > avoids this kind of complication. Secondarily, it'd have the advantage
> > of making pluggable storage integration easier - there we have the
> > problem that we don't know which type of relation we're dealing with
> > during recovery, so such lookups make pluggability harder (zheap just
> > adds extra flags to signal that, but that's not extensible).
>
> That doesn't look trivial.  It seems like _bt_delitems_delete() would
> need to get an array of XIDs, but that gets called from
> _bt_vacuum_one_page(), which doesn't have that information available.
> It doesn't look like there is a particularly cheap way of getting it,
> either.  What do you have in mind?

I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't
determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).

Writing that message right now.


> > - enforce hot-standby to be enabled on the standby when logical slots
> >   are created, and at startup if a logical slot exists
>
> Why do we need this?

Currently the conflict routines are only called when hot standby is
on. There's also no way to use logical decoding (including just
advancing the slot) without hot standby being enabled, so I think
that'd be a pretty harmless restriction.
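
Concretely, something as simple as the following check, e.g. in
CheckLogicalDecodingRequirements() (sketch; exact placement and wording
to be decided):

    if (RecoveryInProgress() && !EnableHotStandby)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("logical decoding on a standby requires hot_standby to be enabled")));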


> > - Have a nicer conflict handling than what I implemented here.  Craig's
> >   approach deleted the slots, but I'm not sure I like that.  Blocking
> >   seems more appropriately here, after all it's likely that the
> >   replication topology would be broken afterwards.
>
> I guess the viable options are approximately --

> (1) drop the slot

Doable.


> (2)  advance the slot

That's not realistically possible, I think. We'd need to be able to use
most of the logical decoding infrastructure in that context, and we
don't have that available.  It's also possible to deadlock, where
advancing the slot's xmin horizon would need further WAL, but WAL replay
is blocked on advancing the slot.


> (3) mark the slot as "failed" but leave it in existence as a tombstone

We currently don't have that, but it'd be doable, I think.

> (4) wait until something changes.
> (4) seems pretty unfortunate unless there's some other system for
> having the slot advance automatically.  Seems like a way for
> replication to hang indefinitely without anybody understanding why
> it's happened (or, maybe, noticing).

On the other hand, it would often allow whatever is consuming the slot
to continue using it, until the conflict is "resolved". To me it seems
about as easy to debug physical replication being blocked as somehow
the slot being magically deleted or marked as invalid.


Thanks for looking,

Andres Freund

Attachment: index-page-vacuum-xid-horizon-primary.diff (22K)

Re: Minimal logical decoding on standbys

Petr Jelinek-4
In reply to this post by Andres Freund
Hi,

On 12/12/2018 21:41, Andres Freund wrote:
>
> I don't like the approach of managing the catalog horizon via those
> periodically logged catalog xmin announcements.  I think we instead
> should build ontop of the records we already have and use to compute
> snapshot conflicts.  As of HEAD we don't know whether such tables are
> catalog tables, but that's just a bool that we need to include in the
> records, a basically immeasurable overhead given the size of those
> records.

IIRC I was originally advocating adding that xmin announcement to the
standby snapshot message, but this seems better.

>
> If we were to go with this approach, there'd be at least the following
> tasks:
> - adapt tests from [2]
> - enforce hot-standby to be enabled on the standby when logical slots
>   are created, and at startup if a logical slot exists
> - fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
>   above.
> - Have a nicer conflict handling than what I implemented here.  Craig's
>   approach deleted the slots, but I'm not sure I like that.  Blocking
>   seems more appropriately here, after all it's likely that the
>   replication topology would be broken afterwards.
> - get_rel_logical_catalog() shouldn't be in lsyscache.[ch], and can be
>   optimized (e.g. check wal_level before opening rel etc).
>
>
> Once we have this logic, it can be used to implement something like
> failover slots on-top, by having having a mechanism that occasionally
> forwards slots on standbys using pg_replication_slot_advance().
>

Looking at this from the failover slots perspective: wouldn't blocking
on conflict mean that we stop physical replication on catalog xmin
advance when there is lagging logical replication on the primary? It
might not be too big a deal, as in that use case it should only happen
if hs_feedback was off at some point, but I just wanted to point out
this potential problem.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: Minimal logical decoding on standbys

tushar1
Hi,

While testing this feature, I found that if lots of inserts happen on
the master cluster, pg_recvlogical does not show the DATA information
for a logical replication slot created on the SLAVE.

Please refer to this scenario -

1)
Create a Master cluster with wal_level=logical and create a logical
replication slot -
  SELECT * FROM pg_create_logical_replication_slot('master_slot',
'test_decoding');

2)
Create a Standby  cluster using pg_basebackup ( ./pg_basebackup -D
slave/ -v -R)  and create logical replication slot -
SELECT * FROM pg_create_logical_replication_slot('standby_slot',
'test_decoding');

3)
X terminal - start  pg_recvlogical  , provide port=5555 ( slave
cluster)  and specify slot=standby_slot
./pg_recvlogical -d postgres  -p 5555 -s 1 -F 1  -v --slot=standby_slot 
--start -f -

Y terminal - start  pg_recvlogical  , provide port=5432 ( master
cluster)  and specify slot=master_slot
./pg_recvlogical -d postgres  -p 5432 -s 1 -F 1  -v --slot=master_slot 
--start -f -

Z terminal - run pg_bench  against Master cluster ( ./pg_bench -i -s 10
postgres)

The DATA information is visible on the Y terminal but not on X.

But the same data can be seen by firing the below query on the SLAVE
cluster -

SELECT * FROM pg_logical_slot_get_changes('standby_slot', NULL, NULL);

Is this expected?

regards,
tushar

On 12/17/2018 10:46 PM, Petr Jelinek wrote:

> Hi,
>
> On 12/12/2018 21:41, Andres Freund wrote:
>> I don't like the approach of managing the catalog horizon via those
>> periodically logged catalog xmin announcements.  I think we instead
>> should build ontop of the records we already have and use to compute
>> snapshot conflicts.  As of HEAD we don't know whether such tables are
>> catalog tables, but that's just a bool that we need to include in the
>> records, a basically immeasurable overhead given the size of those
>> records.
> IIRC I was originally advocating adding that xmin announcement to the
> standby snapshot message, but this seems better.
>
>> If we were to go with this approach, there'd be at least the following
>> tasks:
>> - adapt tests from [2]
>> - enforce hot-standby to be enabled on the standby when logical slots
>>    are created, and at startup if a logical slot exists
>> - fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
>>    above.
>> - Have a nicer conflict handling than what I implemented here.  Craig's
>>    approach deleted the slots, but I'm not sure I like that.  Blocking
>>    seems more appropriately here, after all it's likely that the
>>    replication topology would be broken afterwards.
>> - get_rel_logical_catalog() shouldn't be in lsyscache.[ch], and can be
>>    optimized (e.g. check wal_level before opening rel etc).
>>
>>
>> Once we have this logic, it can be used to implement something like
>> failover slots on-top, by having having a mechanism that occasionally
>> forwards slots on standbys using pg_replication_slot_advance().
>>
> Looking at this from the failover slots perspective. Wouldn't blocking
> on conflict mean that we stop physical replication on catalog xmin
> advance when there is lagging logical replication on primary? It might
> not be too big deal as in that use-case it should only happen if
> hs_feedback was off at some point, but just wanted to point out this
> potential problem.
>

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: Minimal logical decoding on standbys

Andres Freund
Hi,

On 2019-03-01 13:33:23 +0530, tushar wrote:

> While testing  this feature  found that - if lots of insert happened on the
> master cluster then pg_recvlogical is not showing the DATA information  on
> logical replication slot which created on SLAVE.
>
> Please refer this scenario -
>
> 1)
> Create a Master cluster with wal_level=logcal and create logical replication
> slot -
>  SELECT * FROM pg_create_logical_replication_slot('master_slot',
> 'test_decoding');
>
> 2)
> Create a Standby  cluster using pg_basebackup ( ./pg_basebackup -D slave/ -v
> -R)  and create logical replication slot -
> SELECT * FROM pg_create_logical_replication_slot('standby_slot',
> 'test_decoding');

So, if I understand correctly you do *not* have a physical replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?

Thanks,

Andres


Re: Minimal logical decoding on standbys

Amit Khandekar-2
In reply to this post by Andres Freund
On Fri, 14 Dec 2018 at 06:25, Andres Freund <[hidden email]> wrote:
> I've a prototype attached, but let's discuss the details in a separate
> thread. This also needs to be changed for pluggable storage, as we don't
> know about table access methods in the startup process, so we can't call
> can't determine which AM the heap is from during
> btree_xlog_delete_get_latestRemovedXid() (and sibling routines).

Attached is a WIP test patch
0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
modified version of Craig Ringer's test cases
(012_logical_decoding_on_replica.pl) that he had attached in [1].
Here, I have also attached his original file
(Craigs_012_logical_decoding_on_replica.pl).

Also attached are rebased versions of a couple of Andres's
implementation patches.

I have added a new test scenario :
DROP TABLE from master *before* the logical records of the table
insertions are retrieved from standby. The logical records should be
successfully retrieved.


Regarding the test result failures, I could see that when we drop a
logical replication slot on the standby server, the catalog_xmin of the
physical replication slot becomes NULL, whereas the test expects it to
be equal to xmin; and that's the reason a couple of test scenarios are
failing:

ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
#   Failed test 'xmin and catalog_xmin equal after slot drop'
#   at t/016_logical_decoding_on_replica.pl line 272.
#          got:
#     expected: 2584



Other than the above, there is this test scenario which I had to remove :

#########################################################
# Conflict with recovery: xmin cancels decoding session
#########################################################
#
# Start a transaction on the replica then perform work that should cause a
# recovery conflict with it. We'll check to make sure the client gets
# terminated with recovery conflict.
#
# Temporarily disable hs feedback so we can test recovery conflicts.
# It's fine to continue using a physical slot, the xmin should be
# cleared. We only check hot_standby_feedback when establishing
# a new decoding session so this approach circumvents the safeguards
# in place and forces a conflict.

This test starts pg_recvlogical, and expects it to be terminated due
to recovery conflict because hs feedback is disabled.
But that does not happen; instead, pg_recvlogical does not return.

But I am not sure why it does not terminate with Andres's patch; it
was expected to terminate with Craig Ringer's patch.

Further, there are subsequent test scenarios that test pg_recvlogical
with hs_feedback disabled, which I have removed because pg_recvlogical
does not return. I am yet to clearly understand why that happens. I
suspect that is only because hs_feedback is disabled.

Also, the testcases verify pg_controldata's oldestCatalogXmin values,
which are now not present with Andres's patch; so I removed tracking
of oldestCatalogXmin.

[1] https://www.postgresql.org/message-id/CAMsr+YEVmBJ=dyLw=+kTihmUnGy5_EW4Mig5T0maieg_Zu=XCg@...

Thanks
-Amit Khandekar

Attachments:
0001-Logical-decoding-on-standby_rebased.patch (29K)
Craigs_012_logical_decoding_on_replica.pl (29K)
0002-Move-latestRemovedXid-computation-for-nbtree-xlog-rebased.patch (30K)
0003-WIP-TAP-test-for-logical-decoding-on-standby.patch (22K)

Re: Minimal logical decoding on standbys

tushar1
In reply to this post by Andres Freund
On 03/01/2019 11:16 PM, Andres Freund wrote:
> So, if I understand correctly you do *not* have a phyiscal replication
> slot for this standby? For the feature to work reliably that needs to
> exist, and you need to have hot_standby_feedback enabled. Does having
> that fix the issue?

OK, this time around I performed it like this -

.)Master cluster (set wal_level=logical and  hot_standby_feedback=on in postgresql.conf) , start the server  and create a physical replication slot

postgres=# SELECT * FROM pg_create_physical_replication_slot('decoding_standby');
    slot_name     | lsn
------------------+-----
 decoding_standby |
(1 row)

.)Perform pg_basebackup using --slot=decoding_standby  with option -R . modify port=5555 , start the server

.)Connect to slave and create a logical replication slot

postgres=# create table t(n int);
ERROR:  cannot execute CREATE TABLE in a read-only transaction
postgres=#

postgres=# SELECT * FROM pg_create_logical_replication_slot('standby_slot', 'test_decoding');
  slot_name   |    lsn   
--------------+-----------
 standby_slot | 0/2000060
(1 row)

run pgbench (./pgbench -i -s 10 postgres) against  master and  simultaneously- start  pg_recvlogical  , provide port=5555 ( slave cluster)  and specify slot=standby_slot
./pg_recvlogical -d postgres  -p 5555 -s 1 -F 1  -v --slot=standby_slot  --start -f -


[centos@centos-cpula bin]$ ./pg_recvlogical -d postgres  -p 5555 -s 1 -F 1  -v --slot=standby_slot  --start -f -
pg_recvlogical: starting log streaming at 0/0 (slot standby_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot standby_slot)
pg_recvlogical: confirming write up to 0/30194E8, flush to 0/30194E8 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590 (slot standby_slot)
pg_recvlogical: confirming write up to 0/301D558, flush to 0/301D558 (slot standby_slot)
BEGIN 476
COMMIT 476
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40 (slot standby_slot)
BEGIN 477
COMMIT 477

If we do the same for the logical replication slot which was created
on the Master cluster -

[centos@centos-cpula bin]$ ./pg_recvlogical -d postgres  -s 1 -F 1  -v --slot=master_slot  --start -f -
pg_recvlogical: starting log streaming at 0/0 (slot master_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot master_slot)
table public.pgbench_accounts: INSERT: aid[integer]:65057 bid[integer]:1 abalance[integer]:0 filler[character]:'                                                                                    '
table public.pgbench_accounts: INSERT: aid[integer]:65058 bid[integer]:1 abalance[integer]:0 filler[character]:'                                                                                    '
table public.pgbench_accounts: INSERT: aid[integer]:65059 bid[integer]:1 abalance[integer]:0 filler[character]:'                                                                                    '
table public.pgbench_accounts: INSERT: aid[integer]:65060 bid[integer]:1 abalance[integer]:0 filler[character]:'                                                                                    '
table public.pgbench_accounts: INSERT: aid[integer]:65061 bid[integer]:1 abalance[integer]:0 filler[character]:'                                                                                    '
table public.pgbench_accounts: INSERT: aid[integer]:65062 bid[integer]:1 abalance[integer]:0 filler[character]:'                                                                                    '
table public.pgbench_accounts: INSERT: aid[integer]:65063 bid[integer]:1 abalance[integer]:0 filler[character]:'                                                                                    '
table public.pgbench_accounts: INSERT: aid[integer]:65064 bid[integer]:1 abalance[integer]:0 filler[character]:'                                                                                    '

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: Minimal logical decoding on standbys

tushar1
On 03/04/2019 04:54 PM, tushar wrote:
> .)Perform pg_basebackup using --slot=decoding_standby  with option -R
> . modify port=5555 , start the server

set primary_slot_name = 'decoding_standby'  in the postgresql.conf file
of slave.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: Minimal logical decoding on standbys

Andres Freund
In reply to this post by tushar1
Hi,

On 2019-03-04 16:54:32 +0530, tushar wrote:

> On 03/01/2019 11:16 PM, Andres Freund wrote:
> > So, if I understand correctly you do*not*  have a phyiscal replication
> > slot for this standby? For the feature to work reliably that needs to
> > exist, and you need to have hot_standby_feedback enabled. Does having
> > that fix the issue?
>
> Ok, This time around  - I performed like this -
>
> .)Master cluster (set wal_level=logical and hot_standby_feedback=on in
> postgresql.conf) , start the server and create a physical replication slot

Note that hot_standby_feedback=on needs to be set on a standby, not on
the primary (although it doesn't do any harm there).

Thanks,

Andres


Re: Minimal logical decoding on standbys

tushar1
On 03/04/2019 10:57 PM, Andres Freund wrote:
> Note that hot_standby_feedback=on needs to be set on a standby, not on
> the primary (although it doesn't do any harm there).

Right, this parameter was enabled on both the master and the slave.

Is anyone able to reproduce this issue?

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: Minimal logical decoding on standbys

tushar1
In reply to this post by Andres Freund
There is another issue, where I am getting an error while executing
"pg_logical_slot_get_changes" on the SLAVE.

Master (running on port=5432) - run "make installcheck" after setting
PATH=<installation>/bin:$PATH and export PGDATABASE=postgres from the
regress/ folder.
Slave (running on port=5555) - connect to the regression database and
select pg_logical_slot_get_changes.

[centos@mail-arts bin]$ ./psql postgres -p 5555 -f t.sql
You are now connected to database "regression" as user "centos".
  slot_name |    lsn
-----------+-----------
  m61       | 1/D437AD8
(1 row)

psql:t.sql:3: ERROR:  could not resolve cmin/cmax of catalog tuple

[centos@mail-arts bin]$ cat t.sql
\c regression
SELECT * from   pg_create_logical_replication_slot('m61', 'test_decoding');
select * from pg_logical_slot_get_changes('m61',null,null);

regards,

On 03/04/2019 10:57 PM, Andres Freund wrote:

> Hi,
>
> On 2019-03-04 16:54:32 +0530, tushar wrote:
>> On 03/01/2019 11:16 PM, Andres Freund wrote:
>>> So, if I understand correctly you do*not*  have a phyiscal replication
>>> slot for this standby? For the feature to work reliably that needs to
>>> exist, and you need to have hot_standby_feedback enabled. Does having
>>> that fix the issue?
>> Ok, This time around  - I performed like this -
>>
>> .)Master cluster (set wal_level=logical and hot_standby_feedback=on in
>> postgresql.conf) , start the server and create a physical replication slot
> Note that hot_standby_feedback=on needs to be set on a standby, not on
> the primary (although it doesn't do any harm there).
>
> Thanks,
>
> Andres
>

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: Minimal logical decoding on standbys

Amit Khandekar-2
In reply to this post by Amit Khandekar-2
On Mon, 4 Mar 2019 at 14:09, Amit Khandekar <[hidden email]> wrote:

>
> On Fri, 14 Dec 2018 at 06:25, Andres Freund <[hidden email]> wrote:
> > I've a prototype attached, but let's discuss the details in a separate
> > thread. This also needs to be changed for pluggable storage, as we don't
> > know about table access methods in the startup process, so we can't call
> > can't determine which AM the heap is from during
> > btree_xlog_delete_get_latestRemovedXid() (and sibling routines).
>
> Attached is a WIP test patch
> 0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
> modified version of Craig Ringer's test cases

Hi Andres,

I am trying to come up with new testcases to test the recovery
conflict handling. Before that I have some queries :

With Craig Ringer's approach, the way to reproduce the recovery
conflict was, I believe, easy : Do a checkpoint, which will log the
global-catalog-xmin-advance WAL record, due to which the standby -
while replaying the message - may find out that it's a recovery
conflict. But with your approach, the latestRemovedXid is passed only
during specific vacuum-related WAL records, so to reproduce the
recovery conflict error, we need to make sure some specific WAL
records are logged, such as XLOG_BTREE_DELETE. So we need to create a
testcase such that while creating an index tuple, it erases dead
tuples from a page, so that it eventually calls
_bt_vacuum_one_page()=>_bt_delitems_delete(), thus logging a
XLOG_BTREE_DELETE record.

I tried to come up with this reproducible testcase without success.
This seems difficult. Do you have an easier option? Maybe we can use
some other WAL records that would give an easier, more reliable test
case for provoking a recovery conflict?

Further, with your patch, in ResolveRecoveryConflictWithSlots(), it
just emits a WARNING; so the WAL receiver would not make the backends
throw an error; hence the test case won't catch the error. Is that
right?


--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Minimal logical decoding on standbys

tushar1
In reply to this post by tushar1
Hi ,

I am getting a server crash on the standby while executing the
pg_logical_slot_get_changes function; please refer to this scenario -

Master cluster( ./initdb -D master)
set wal_level='hot_standby' in master/postgresql.conf file
start the server , connect to  psql terminal and create a physical
replication slot ( SELECT * from
pg_create_physical_replication_slot('p1');)

perform pg_basebackup using --slot 'p1'  (./pg_basebackup -D slave/ -R
--slot p1 -v))
set wal_level='logical' , hot_standby_feedback=on,
primary_slot_name='p1' in slave/postgresql.conf file
start the server , connect to psql terminal and create a logical
replication slot (  SELECT * from
pg_create_logical_replication_slot('t','test_decoding');)

run pgbench ( ./pgbench -i -s 10 postgres) on master and select
pg_logical_slot_get_changes on Slave database

postgres=# select * from pg_logical_slot_get_changes('t',null,null);
2019-03-13 20:34:50.274 IST [26817] LOG:  starting logical decoding for
slot "t"
2019-03-13 20:34:50.274 IST [26817] DETAIL:  Streaming transactions
committing after 0/6C000060, reading WAL from 0/6C000028.
2019-03-13 20:34:50.274 IST [26817] STATEMENT:  select * from
pg_logical_slot_get_changes('t',null,null);
2019-03-13 20:34:50.275 IST [26817] LOG:  logical decoding found
consistent point at 0/6C000028
2019-03-13 20:34:50.275 IST [26817] DETAIL:  There are no running
transactions.
2019-03-13 20:34:50.275 IST [26817] STATEMENT:  select * from
pg_logical_slot_get_changes('t',null,null);
TRAP: FailedAssertion("!(data == tupledata + tuplelen)", File:
"decode.c", Line: 977)
server closed the connection unexpectedly
     This probably means the server terminated abnormally
     before or while processing the request.
The connection to the server was lost. Attempting reset: 2019-03-13
20:34:50.276 IST [26809] LOG:  server process (PID 26817) was terminated
by signal 6: Aborted

Stack trace -

(gdb) bt
#0  0x00007f370e673277 in raise () from /lib64/libc.so.6
#1  0x00007f370e674968 in abort () from /lib64/libc.so.6
#2  0x0000000000a30edf in ExceptionalCondition (conditionName=0xc36090
"!(data == tupledata + tuplelen)", errorType=0xc35f5c "FailedAssertion",
fileName=0xc35d70 "decode.c",
     lineNumber=977) at assert.c:54
#3  0x0000000000843c6f in DecodeMultiInsert (ctx=0x2ba1ac8,
buf=0x7ffd7a5136d0) at decode.c:977
#4  0x0000000000842b32 in DecodeHeap2Op (ctx=0x2ba1ac8,
buf=0x7ffd7a5136d0) at decode.c:375
#5  0x00000000008424dd in LogicalDecodingProcessRecord (ctx=0x2ba1ac8,
record=0x2ba1d88) at decode.c:125
#6  0x000000000084830d in pg_logical_slot_get_changes_guts
(fcinfo=0x2b95838, confirm=true, binary=false) at logicalfuncs.c:307
#7  0x000000000084846a in pg_logical_slot_get_changes (fcinfo=0x2b95838)
at logicalfuncs.c:376
#8  0x00000000006e5b9f in ExecMakeTableFunctionResult
(setexpr=0x2b93ee8, econtext=0x2b93d98, argContext=0x2b99940,
expectedDesc=0x2b97970, randomAccess=false) at execSRF.c:233
#9  0x00000000006fb738 in FunctionNext (node=0x2b93c80) at
nodeFunctionscan.c:94
#10 0x00000000006e52b1 in ExecScanFetch (node=0x2b93c80,
accessMtd=0x6fb67b <FunctionNext>, recheckMtd=0x6fba77
<FunctionRecheck>) at execScan.c:93
#11 0x00000000006e5326 in ExecScan (node=0x2b93c80, accessMtd=0x6fb67b
<FunctionNext>, recheckMtd=0x6fba77 <FunctionRecheck>) at execScan.c:143
#12 0x00000000006fbac1 in ExecFunctionScan (pstate=0x2b93c80) at
nodeFunctionscan.c:270
#13 0x00000000006e3293 in ExecProcNodeFirst (node=0x2b93c80) at
execProcnode.c:445
#14 0x00000000006d8253 in ExecProcNode (node=0x2b93c80) at
../../../src/include/executor/executor.h:241
#15 0x00000000006daa4e in ExecutePlan (estate=0x2b93a28,
planstate=0x2b93c80, use_parallel_mode=false, operation=CMD_SELECT,
sendTuples=true, numberTuples=0,
     direction=ForwardScanDirection, dest=0x2b907e0, execute_once=true)
at execMain.c:1643
#16 0x00000000006d8865 in standard_ExecutorRun (queryDesc=0x2afff28,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:362
#17 0x00000000006d869b in ExecutorRun (queryDesc=0x2afff28,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:306
#18 0x00000000008ccef1 in PortalRunSelect (portal=0x2b36168,
forward=true, count=0, dest=0x2b907e0) at pquery.c:929
#19 0x00000000008ccb90 in PortalRun (portal=0x2b36168,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x2b907e0, altdest=0x2b907e0, completionTag=0x7ffd7a513e90 "")
     at pquery.c:770
#20 0x00000000008c6b58 in exec_simple_query (query_string=0x2adc1e8
"select * from pg_logical_slot_get_changes('t',null,null);") at
postgres.c:1215
#21 0x00000000008cae88 in PostgresMain (argc=1, argv=0x2b06590,
dbname=0x2b063d0 "postgres", username=0x2ad8da8 "centos") at postgres.c:4256
#22 0x0000000000828464 in BackendRun (port=0x2afe3b0) at postmaster.c:4399
#23 0x0000000000827c42 in BackendStartup (port=0x2afe3b0) at
postmaster.c:4090
#24 0x0000000000824036 in ServerLoop () at postmaster.c:1703
#25 0x00000000008238ec in PostmasterMain (argc=3, argv=0x2ad6d00) at
postmaster.c:1376
#26 0x0000000000748aab in main (argc=3, argv=0x2ad6d00) at main.c:228
(gdb)

regards,


On 03/07/2019 09:03 PM, tushar wrote:

> There is an another issue , where i am getting error while executing
> "pg_logical_slot_get_changes" on SLAVE
>
> Master (running on port=5432) -  run "make installcheck"  after
> setting  PATH=<installation/bin:$PATH )  and export
> PGDATABASE=postgres from regress/ folder
> Slave (running on port=5555)  -  Connect to regression database and
> select pg_logical_slot_get_changes
>
> [centos@mail-arts bin]$ ./psql postgres -p 5555 -f t.sql
> You are now connected to database "regression" as user "centos".
>  slot_name |    lsn
> -----------+-----------
>  m61       | 1/D437AD8
> (1 row)
>
> psql:t.sql:3: ERROR:  could not resolve cmin/cmax of catalog tuple
>
> [centos@mail-arts bin]$ cat t.sql
> \c regression
> SELECT * from   pg_create_logical_replication_slot('m61',
> 'test_decoding');
> select * from pg_logical_slot_get_changes('m61',null,null);
>
> regards,
>
> On 03/04/2019 10:57 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2019-03-04 16:54:32 +0530, tushar wrote:
>>> On 03/01/2019 11:16 PM, Andres Freund wrote:
>>>> So, if I understand correctly you do*not*  have a phyiscal replication
>>>> slot for this standby? For the feature to work reliably that needs to
>>>> exist, and you need to have hot_standby_feedback enabled. Does having
>>>> that fix the issue?
>>> Ok, This time around  - I performed like this -
>>>
>>> .)Master cluster (set wal_level=logical and hot_standby_feedback=on in
>>> postgresql.conf) , start the server and create a physical
>>> replication slot
>> Note that hot_standby_feedback=on needs to be set on a standby, not on
>> the primary (although it doesn't do any harm there).
>>
>> Thanks,
>>
>> Andres
>>
>

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: Minimal logical decoding on standbys

Amit Khandekar-2
In reply to this post by Amit Khandekar-2
On Fri, 8 Mar 2019 at 20:59, Amit Khandekar <[hidden email]> wrote:

>
> On Mon, 4 Mar 2019 at 14:09, Amit Khandekar <[hidden email]> wrote:
> >
> > On Fri, 14 Dec 2018 at 06:25, Andres Freund <[hidden email]> wrote:
> > > I've a prototype attached, but let's discuss the details in a separate
> > > thread. This also needs to be changed for pluggable storage, as we don't
> > > know about table access methods in the startup process, so we can't call
> > > can't determine which AM the heap is from during
> > > btree_xlog_delete_get_latestRemovedXid() (and sibling routines).
> >
> > Attached is a WIP test patch
> > 0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
> > modified version of Craig Ringer's test cases
>
> Hi Andres,
>
> I am trying to come up with new testcases to test the recovery
> conflict handling. Before that I have some queries :
>
> With Craig Ringer's approach, the way to reproduce the recovery
> conflict was, I believe, easy : Do a checkpoint, which will log the
> global-catalog-xmin-advance WAL record, due to which the standby -
> while replaying the message - may find out that it's a recovery
> conflict. But with your approach, the latestRemovedXid is passed only
> during specific vacuum-related WAL records, so to reproduce the
> recovery conflict error, we need to make sure some specific WAL
> records are logged, such as XLOG_BTREE_DELETE. So we need to create a
> testcase such that while creating an index tuple, it erases dead
> tuples from a page, so that it eventually calls
> _bt_vacuum_one_page()=>_bt_delitems_delete(), thus logging a
> XLOG_BTREE_DELETE record.
>
> I tried to come up with this reproducible testcase without success.
> This seems difficult. Do you have an easier option ? May be we can use
> some other WAL records that may have easier more reliable test case
> for showing up recovery conflict ?
>

I managed to get a recovery conflict by :
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on standby
3. Creating a table on master, and inserting some data.
4. Running: VACUUM FULL;

This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING:  slot decoding_standby w/
catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT:  WAL redo at 0/3069E98
for Heap2/CLEAN: remxid 477

But I did not add such a testcase into the test file, because with the
current patch, it does not do anything with the slot; it just keeps
emitting a WARNING in the log file; so we can't test this scenario as
of now using the TAP test.


> Further, with your patch, in ResolveRecoveryConflictWithSlots(), it
> just throws a WARNING error level; so the wal receiver would not make
> the backends throw an error; hence the test case won't catch the
> error. Is that right ?

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Minimal logical decoding on standbys

Amit Khandekar-2
On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <[hidden email]> wrote:

> I managed to get a recovery conflict by :
> 1. Setting hot_standby_feedback to off
> 2. Creating a logical replication slot on standby
> 3. Creating a table on master, and insert some data.
> 2. Running : VACUUM FULL;
>
> This gives WARNING messages in the standby log file.
> 2019-03-14 14:57:56.833 IST [40076] WARNING:  slot decoding_standby w/
> catalog xmin 474 conflicts with removed xid 477
> 2019-03-14 14:57:56.833 IST [40076] CONTEXT:  WAL redo at 0/3069E98
> for Heap2/CLEAN: remxid 477
>
> But I did not add such a testcase into the test file, because with the
> current patch, it does not do anything with the slot; it just keeps on
> emitting WARNING in the log file; so we can't test this scenario as of
> now using the tap test.

I am going ahead with the drop-the-slot way of handling the recovery
conflict. I am trying out using ReplicationSlotDropPtr() to drop the
slot. It seems the required locks are already in place inside the for
loop of ResolveRecoveryConflictWithSlots(), so we can directly call
ReplicationSlotDropPtr() when the slot xmin conflict is found.

As explained above, the only way I could reproduce the conflict is by
turning hot_standby_feedback off on the slave, creating and inserting
into a table on the master and then running VACUUM FULL. But after
doing this, I am not able to verify whether the slot is dropped,
because on the slave, any simple psql command thereafter waits on a
lock acquired on a system catalog, e.g. pg_authid. Working on it.


--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: Minimal logical decoding on standbys

Andres Freund
Hi,

On 2019-04-02 15:26:52 +0530, Amit Khandekar wrote:

> On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <[hidden email]> wrote:
> > I managed to get a recovery conflict by :
> > 1. Setting hot_standby_feedback to off
> > 2. Creating a logical replication slot on standby
> > 3. Creating a table on master, and insert some data.
> > 2. Running : VACUUM FULL;
> >
> > This gives WARNING messages in the standby log file.
> > 2019-03-14 14:57:56.833 IST [40076] WARNING:  slot decoding_standby w/
> > catalog xmin 474 conflicts with removed xid 477
> > 2019-03-14 14:57:56.833 IST [40076] CONTEXT:  WAL redo at 0/3069E98
> > for Heap2/CLEAN: remxid 477
> >
> > But I did not add such a testcase into the test file, because with the
> > current patch, it does not do anything with the slot; it just keeps on
> > emitting WARNING in the log file; so we can't test this scenario as of
> > now using the tap test.
>
> I am going ahead with drop-the-slot way of handling the recovery
> conflict. I am trying out using ReplicationSlotDropPtr() to drop the
> slot. It seems the required locks are already in place inside the for
> loop of ResolveRecoveryConflictWithSlots(), so we can directly call
> ReplicationSlotDropPtr() when the slot xmin conflict is found.

Cool.


> As explained above, the only way I could reproduce the conflict is by
> turning hot_standby_feedback off on slave, creating and inserting into
> a table on master and then running VACUUM FULL. But after doing this,
> I am not able to verify whether the slot is dropped, because on slave,
> any simple psql command thereon, waits on a lock acquired on sys
> catache, e.g. pg_authid. Working on it.

I think that indicates a bug somewhere. If replay progressed, it should
have killed the slot, and continued replaying past the VACUUM
FULL. Those symptoms suggest replay is stuck somewhere. I suggest a)
compiling with WAL_DEBUG enabled, and turning on wal_debug=1, b) looking
at a backtrace of the startup process.

Greetings,

Andres Freund



Re: Minimal logical decoding on standbys

Amit Khandekar-2
On Tue, 2 Apr 2019 at 21:34, Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> On 2019-04-02 15:26:52 +0530, Amit Khandekar wrote:
> > On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <[hidden email]> wrote:
> > > I managed to get a recovery conflict by :
> > > 1. Setting hot_standby_feedback to off
> > > 2. Creating a logical replication slot on standby
> > > 3. Creating a table on master, and insert some data.
> > > 2. Running : VACUUM FULL;
> > >
> > > This gives WARNING messages in the standby log file.
> > > 2019-03-14 14:57:56.833 IST [40076] WARNING:  slot decoding_standby w/
> > > catalog xmin 474 conflicts with removed xid 477
> > > 2019-03-14 14:57:56.833 IST [40076] CONTEXT:  WAL redo at 0/3069E98
> > > for Heap2/CLEAN: remxid 477
> > >
> > > But I did not add such a testcase into the test file, because with the
> > > current patch, it does not do anything with the slot; it just keeps on
> > > emitting WARNING in the log file; so we can't test this scenario as of
> > > now using the tap test.
> >
> > I am going ahead with drop-the-slot way of handling the recovery
> > conflict. I am trying out using ReplicationSlotDropPtr() to drop the
> > slot. It seems the required locks are already in place inside the for
> > loop of ResolveRecoveryConflictWithSlots(), so we can directly call
> > ReplicationSlotDropPtr() when the slot xmin conflict is found.
>
> Cool.
>
>
> > As explained above, the only way I could reproduce the conflict is by
> > turning hot_standby_feedback off on slave, creating and inserting into
> > a table on master and then running VACUUM FULL. But after doing this,
> > I am not able to verify whether the slot is dropped, because on slave,
> > any simple psql command thereon, waits on a lock acquired on sys
> > catache, e.g. pg_authid. Working on it.
>
> I think that indicates a bug somewhere. If replay progressed, it should
> have killed the slot, and continued replaying past the VACUUM
> FULL. Those symptoms suggest replay is stuck somewhere. I suggest a)
> compiling with WAL_DEBUG enabled, and turning on wal_debug=1, b) looking
> at a backtrace of the startup process.

Oops, it was my own change that caused the hang. Sorry for the noise.
After using wal_debug, I found out that after replaying the LOCK record
for the catalog pg_authid, the lock was not being released because
replay had actually got stuck in ReplicationSlotDropPtr() itself. In
ResolveRecoveryConflictWithSlots(), a shared
ReplicationSlotControlLock was already held before iterating through
the slots, and ReplicationSlotDropPtr() then tries to take the same
lock in exclusive mode for setting slot->in_use, leading to a
deadlock. I fixed that by releasing the shared lock before calling
ReplicationSlotDropPtr(), and then restarting the scan of the slots
from the beginning, since we released the lock. We do a similar thing
in ReplicationSlotCleanup().
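
Roughly, the reworked loop now has this shape (a sketch of the change
described above, not the exact patch code; the xmin/catalog_xmin checks
are abbreviated into a hypothetical helper):

    void
    ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
    {
        int         i;

        if (max_replication_slots <= 0)
            return;

        LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

    restart:
        for (i = 0; i < max_replication_slots; i++)
        {
            ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

            if (!s->in_use)
                continue;
            if (s->data.database != InvalidOid && s->data.database != dboid)
                continue;

            /* hypothetical helper for the xmin/catalog_xmin conflict test */
            if (SlotConflictsWithRemovedXid(s, xid))
            {
                /*
                 * ReplicationSlotDropPtr() takes ReplicationSlotControlLock
                 * exclusively, so release our shared lock first, then rescan
                 * from the top, as ReplicationSlotCleanup() does.
                 */
                LWLockRelease(ReplicationSlotControlLock);
                ReplicationSlotDropPtr(s);
                LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
                goto restart;
            }
        }

        LWLockRelease(ReplicationSlotControlLock);
    }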

Attached is a rebased version of your patch
logical-decoding-on-standby.patch. This v2 version also has the above
changes. It also includes the tap test file which is still in WIP
state, mainly because I have yet to add the conflict recovery handling
scenarios.

I see that you have already committed the
move-latestRemovedXid-computation-for-nbtree-xlog related changes.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment: logical-decoding-on-standby_v2.patch (52K)

Re: Minimal logical decoding on standbys

Amit Khandekar-2
On Wed, 3 Apr 2019 at 19:57, Amit Khandekar <[hidden email]> wrote:

> Oops, it was my own change that caused the hang. Sorry for the noise.
> After using wal_debug, found out that after replaying the LOCK records
> for the catalog pg_auth, it was not releasing it because it had
> actually got stuck in ReplicationSlotDropPtr() itself. In
> ResolveRecoveryConflictWithSlots(), a shared
> ReplicationSlotControlLock was already held before iterating through
> the slots, and now ReplicationSlotDropPtr() again tries to take the
> same lock in exclusive mode for setting slot->in_use, leading to a
> deadlock. I fixed that by releasing the shared lock before calling
> ReplicationSlotDropPtr(), and then re-starting the slots' scan over
> again since we released it. We do similar thing for
> ReplicationSlotCleanup().
>
> Attached is a rebased version of your patch
> logical-decoding-on-standby.patch. This v2 version also has the above
> changes. It also includes the tap test file which is still in WIP
> state, mainly because I have yet to add the conflict recovery handling
> scenarios.

The attached v3 patch includes a new scenario to test recovery
conflict handling, by verifying that the conflicting slot gets dropped.

With this, I am done with the test changes, except for the below
question that I had posted earlier, on which I would like some input:

Regarding the test result failures, I could see that when we drop a
logical replication slot on the standby server, the catalog_xmin of the
physical replication slot becomes NULL, whereas the test expects it to
be equal to xmin; and that's the reason a couple of test scenarios are
failing:

ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
#   Failed test 'xmin and catalog_xmin equal after slot drop'
#   at t/016_logical_decoding_on_replica.pl line 272.
#          got:
#     expected: 2584

I am not sure what is expected. What actually happens is: the
physical slot's catalog_xmin remains NULL initially, but becomes
non-NULL after the logical replication slot is created on the standby.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment: logical-decoding-on-standby_v3.patch (54K)

Re: Minimal logical decoding on standbys

Andres Freund
Hi,

Thanks for the new version of the patch.  Btw, could you add Craig as a
co-author in the commit message of the next version of the patch? Don't
want to forget him.

On 2019-04-05 17:08:39 +0530, Amit Khandekar wrote:

> Regarding the test result failures, I could see that when we drop a
> logical replication slot on the standby server, the catalog_xmin of
> the physical replication slot becomes NULL, whereas the test expects
> it to be equal to xmin; that's the reason a couple of test scenarios
> are failing:
>
> ok 33 - slot on standby dropped manually
> Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
> done
> not ok 34 - physical catalog_xmin still non-null
> not ok 35 - xmin and catalog_xmin equal after slot drop
> #   Failed test 'xmin and catalog_xmin equal after slot drop'
> #   at t/016_logical_decoding_on_replica.pl line 272.
> #          got:
> #     expected: 2584
>
> I am not sure what is expected. What actually happens is: the
> physical slot's catalog_xmin remains NULL initially, but becomes
> non-NULL after the logical replication slot is created on the
> standby.

That seems like the correct behaviour to me - why would we still have a
catalog xmin if there's no logical slot?


> diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
> index 006446b..5785d2f 100644
> --- a/src/backend/replication/slot.c
> +++ b/src/backend/replication/slot.c
> @@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
>   }
>  }
>  
> +void
> +ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
> +{
> +    int         i;
> +    bool        found_conflict = false;
> +
> +    if (max_replication_slots <= 0)
> +        return;
> +
> +restart:
> +    if (found_conflict)
> +    {
> +        CHECK_FOR_INTERRUPTS();
> +        /*
> +         * Wait awhile for them to die so that we avoid flooding an
> +         * unresponsive backend when system is heavily loaded.
> +         */
> +        pg_usleep(100000);
> +        found_conflict = false;
> +    }
> +
> +    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> +    for (i = 0; i < max_replication_slots; i++)
> +    {
> +        ReplicationSlot *s;
> +        NameData    slotname;
> +        TransactionId slot_xmin;
> +        TransactionId slot_catalog_xmin;
> +
> +        s = &ReplicationSlotCtl->replication_slots[i];
> +
> +        /* cannot change while ReplicationSlotCtlLock is held */
> +        if (!s->in_use)
> +            continue;
> +
> +        /* not our database, skip */
> +        if (s->data.database != InvalidOid && s->data.database != dboid)
> +            continue;
> +
> +        SpinLockAcquire(&s->mutex);
> +        slotname = s->data.name;
> +        slot_xmin = s->data.xmin;
> +        slot_catalog_xmin = s->data.catalog_xmin;
> +        SpinLockRelease(&s->mutex);
> +
> +        if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
> +        {
> +            found_conflict = true;
> +
> +            ereport(WARNING,
> +                    (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
> +                            NameStr(slotname), slot_xmin, xid)));
> +        }
> +
> +        if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
> +        {
> +            found_conflict = true;
> +
> +            ereport(WARNING,
> +                    (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
> +                            NameStr(slotname), slot_catalog_xmin, xid)));
> +        }
> +
> +
> +        if (found_conflict)
> +        {
> +            elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
> +            LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
> +            ReplicationSlotDropPtr(s);
> +
> +            /* We released the lock above; so re-scan the slots. */
> +            goto restart;
> +        }
> +    }
>
I think this should be refactored so that the two found_conflict cases
set a 'reason' variable (perhaps an enum?) to the particular reason, and
then only one warning should be emitted.  I also think that LOG might be
more appropriate than WARNING - as confusing as that is, LOG is more
severe than WARNING (see docs about log_min_messages).
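
Roughly what I have in mind is something like the following - just a
sketch, assuming it sits inside the loop above, and with a made-up
SlotConflictReason enum and message wording:

    typedef enum SlotConflictReason
    {
        SLOT_CONFLICT_NONE,
        SLOT_CONFLICT_XMIN,
        SLOT_CONFLICT_CATALOG_XMIN
    } SlotConflictReason;

    ...

    SlotConflictReason reason = SLOT_CONFLICT_NONE;

    if (TransactionIdIsValid(slot_xmin) &&
        TransactionIdPrecedesOrEquals(slot_xmin, xid))
        reason = SLOT_CONFLICT_XMIN;
    else if (TransactionIdIsValid(slot_catalog_xmin) &&
             TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
        reason = SLOT_CONFLICT_CATALOG_XMIN;

    if (reason != SLOT_CONFLICT_NONE)
    {
        /* one message instead of two, and at LOG rather than WARNING */
        ereport(LOG,
                (errmsg("dropping replication slot \"%s\" because its %s %u conflicts with removed xid %u",
                        NameStr(slotname),
                        reason == SLOT_CONFLICT_XMIN ? "xmin" : "catalog_xmin",
                        reason == SLOT_CONFLICT_XMIN ? slot_xmin : slot_catalog_xmin,
                        xid)));
        LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
        ReplicationSlotDropPtr(s);
        goto restart;
    }

That also keeps the drop logic in one place if we later add more
conflict causes.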


> @@ -0,0 +1,386 @@
> +# Demonstrate that logical decoding can follow timeline switches.
> +#
> +# Test logical decoding on a standby.
> +#
> +use strict;
> +use warnings;
> +use 5.8.0;
> +
> +use PostgresNode;
> +use TestLib;
> +use Test::More tests => 55;
> +use RecursiveCopy;
> +use File::Copy;
> +
> +my ($stdin, $stdout, $stderr, $ret, $handle, $return);
> +my $backup_name;
> +
> +# Initialize master node
> +my $node_master = get_new_node('master');
> +$node_master->init(allows_streaming => 1, has_archiving => 1);
> +$node_master->append_conf('postgresql.conf', q{
> +wal_level = 'logical'
> +max_replication_slots = 4
> +max_wal_senders = 4
> +log_min_messages = 'debug2'
> +log_error_verbosity = verbose
> +# send status rapidly so we promptly advance xmin on master
> +wal_receiver_status_interval = 1
> +# very promptly terminate conflicting backends
> +max_standby_streaming_delay = '2s'
> +});
> +$node_master->dump_info;
> +$node_master->start;
> +
> +$node_master->psql('postgres', q[CREATE DATABASE testdb]);
> +
> +$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
> +$backup_name = 'b1';
> +my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
> +TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
> +
> +sub print_phys_xmin
> +{
> +    my $slot = $node_master->slot('decoding_standby');
> +    return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
> +}
> +
> +my ($xmin, $catalog_xmin) = print_phys_xmin();
> +# After slot creation, xmins must be null
> +is($xmin, '', "xmin null");
> +is($catalog_xmin, '', "catalog_xmin null");
> +
> +my $node_replica = get_new_node('replica');
> +$node_replica->init_from_backup(
> +    $node_master, $backup_name,
> +    has_streaming => 1,
> +    has_restoring => 1);
> +$node_replica->append_conf('postgresql.conf',
> +    q[primary_slot_name = 'decoding_standby']);
> +
> +$node_replica->start;
> +$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
> +
> +# with hot_standby_feedback off, xmin and catalog_xmin must still be null
> +($xmin, $catalog_xmin) = print_phys_xmin();
> +is($xmin, '', "xmin null after replica join");
> +is($catalog_xmin, '', "catalog_xmin null after replica join");
> +
> +$node_replica->append_conf('postgresql.conf',q[
> +hot_standby_feedback = on
> +]);
> +$node_replica->restart;
> +sleep(2); # ensure walreceiver feedback sent

Can we make this more robust? E.g. by waiting till pg_stat_replication
shows the change on the primary? Because I can guarantee that this'll
fail on slow buildfarm machines (say the valgrind animals).




> +$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
> +sleep(2); # ensure walreceiver feedback sent

Similar.


Greetings,

Andres Freund

