logical replication - negative bitmapset member not allowed

Tim Clarke-3
I'm getting this message every 5 seconds on a single-master,
single-slave replication of PG10.7->PG10.7, both on CentOS. It's over the
'net but otherwise seems to perform excellently. Any ideas what's
causing it and how to fix it?

--
Tim Clarke
IT Director
Direct: +44 (0)1376 504510 | Mobile: +44 (0)7887 563420



Main: +44 (0)1376 503500 | Fax: +44 (0)1376 503550
Web: https://www.manifest.co.uk/



Minerva Analytics Ltd
9 Freebournes Court | Newland Street | Witham | Essex | CM8 2BL | England

----------------------------------------------------------------------------------------------------------------------------

Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee you must not use or disclose such information, instead please report it to [hidden email]<mailto:[hidden email]>
Legal:  Minerva Analytics is the trading name of: Minerva Analytics
Ltd: Registered in England Number 11260966 & The Manifest Voting Agency Ltd: Registered in England Number 2920820 Registered Office at above address. Please Click Here >><https://www.manifest.co.uk/legal/> for further information.

Re: logical replication - negative bitmapset member not allowed

Tom Lane-2
Tim Clarke <[hidden email]> writes:
> I'm getting this message every 5 seconds on a single-master,
> single-slave replication of PG10.7->PG10.7, both on CentOS. It's over the
> 'net but otherwise seems to perform excellently. Any ideas what's
> causing it and how to fix it?

That'd certainly be a bug, but we'd need to reproduce it to fix it.
What are you doing that's different from everybody else?  Can you
provide any other info to narrow down the problem?

                        regards, tom lane



Re: logical replication - negative bitmapset member not allowed

Tim Clarke-3
Dang. I just replicated ~380 tables. One was missing an index so I
paused replication, added a unique key on publisher and subscriber,
re-enabled replication and refreshed the subscription.

The table has only 7 columns; I added a primary key with a default value
from a new sequence.

Tim Clarke

On 01/04/2019 15:02, Tom Lane wrote:

> Tim Clarke <[hidden email]> writes:
>> I'm getting this message every 5 seconds on a single-master,
>> single-slave replication of PG10.7->PG10.7, both on CentOS. It's over the
>> 'net but otherwise seems to perform excellently. Any ideas what's
>> causing it and how to fix it?
> That'd certainly be a bug, but we'd need to reproduce it to fix it.
> What are you doing that's different from everybody else?  Can you
> provide any other info to narrow down the problem?
>
> regards, tom lane



Re: logical replication - negative bitmapset member not allowed

Alvaro Herrera-9
In reply to this post by Tom Lane-2
On 2019-Apr-01, Tom Lane wrote:

> Tim Clarke <[hidden email]> writes:
> > I'm getting this message every 5 seconds on a single-master,
> > single-slave replication of PG10.7->PG10.7, both on CentOS. It's over the
> > 'net but otherwise seems to perform excellently. Any ideas what's
> > causing it and how to fix it?
>
> That'd certainly be a bug, but we'd need to reproduce it to fix it.
> What are you doing that's different from everybody else?  Can you
> provide any other info to narrow down the problem?

Maybe the replica identity of a table got set to a unique index on oid?
Or something else involving system columns?  (If replication is
otherwise working, then I suppose there's a separate publication that's
having the error; the first thing to isolate would be to see what tables
are involved in that publication).

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
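
One way to follow up on the suggestion above is to list every published table together with its replica identity setting. The sketch below is a hypothetical helper (the name `publication_tables_sql` does not come from the thread) that only prints a catalog query. It assumes the `pg_publication_tables` view and the `pg_class.relreplident` column as documented for PostgreSQL 10, and that the query is run on the publisher.

```shell
#!/bin/sh
# Hypothetical helper: print a query that lists each published table and
# its replica identity setting. Nothing is executed against a server here.
publication_tables_sql() {
    cat <<'SQL'
SELECT pt.pubname,
       c.oid::regclass AS table_name,
       c.relreplident  -- d = default, n = nothing, f = full, i = index
FROM pg_publication_tables pt
JOIN pg_namespace n ON n.nspname = pt.schemaname
JOIN pg_class c ON c.relnamespace = n.oid AND c.relname = pt.tablename
ORDER BY 1, 2;
SQL
}

# Example (the port is an assumption):
#   publication_tables_sql | psql -X -p 5432
```

A table reported with `i` uses an explicitly chosen index as its replica identity, which is probably the first place to look for the oid or system-column case raised above.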



Re: logical replication - negative bitmapset member not allowed

Tom Lane-2
In reply to this post by Tim Clarke-3
Tim Clarke <[hidden email]> writes:
> Dang. I just replicated ~380 tables. One was missing an index so I
> paused replication, added a unique key on publisher and subscriber,
> re-enabled replication and refreshed the subscription.

Well, that's not much help :-(.  Can you provide any info to narrow
down where this is happening?  I mean, you haven't even told us whether
it's the primary or the slave that is complaining.  Does it seem to
be associated with any particular command?  (Turning on log_statement
and/or log_replication_commands would likely help with that.)  Does
data seem to be getting transferred despite the complaint?  If not,
what's missing on the slave?

                        regards, tom lane
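
For anyone wanting to try the logging suggestion, a minimal sketch follows. The two setting names are the ones mentioned above; the value `'all'` for `log_statement`, the helper name `enable_replication_logging`, and the example path are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append the two logging settings named above to a
# given postgresql.conf. The caller supplies the path; nothing is reloaded.
enable_replication_logging() {
    conf="$1"
    {
        echo "log_statement = 'all'"
        echo "log_replication_commands = on"
    } >> "$conf"
}

# Example (path is an assumption; reload so the running server picks it up):
#   enable_replication_logging /tmp/sub/postgresql.conf
#   pg_ctl -D /tmp/sub reload
```

Since `log_replication_commands` covers commands received by the sending side, it is probably worth enabling the settings on both servers while narrowing the problem down.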



Re: logical replication - negative bitmapset member not allowed

Tim Clarke-3
On 02/04/2019 14:59, Tom Lane wrote:
> Well, that's not much help :-(.  Can you provide any info to narrow
> down where this is happening?  I mean, you haven't even told us whether
> it's the primary or the slave that is complaining.  Does it seem to
> be associated with any particular command?  (Turning on log_statement
> and/or log_replication_commands would likely help with that.)  Does
> data seem to be getting transferred despite the complaint?  If not,
> what's missing on the slave?
>
> regards, tom lane


I've been working to narrow it down; the error is being reported on the slave.

The only schema changes have been the two primary keys added to two
tables. The problem occurred during this cycle:

1) Replication proceeding fine for ~380 tables, all added individually,
not as "all tables".

2) Add primary key on master.

3) Add primary key on slave.

4) Refresh subscription on slave; error starts being reported.

I've cleared it by dropping the slave database, re-creating it from the
live schema, then fully replicating. It's all running happily now.


Tim Clarke




Re: logical replication - negative bitmapset member not allowed

Tom Lane-2
Tim Clarke <[hidden email]> writes:
> I've cleared it by dropping the slave database, re-creating it from the
> live schema, then fully replicating. It's all running happily now.

I'm glad you're out of the woods, but we still have a bug there
waiting to bite the next person.  I wonder if you'd be willing to
spend some time trying to develop a reproduction sequence for this
(obviously, working on a test setup not your live servers).
Presumably there's something in the subscription-alteration logic
that needs work, but I don't think we have enough detail here for
somebody else to reproduce the error without a lot of guesswork.

                        regards, tom lane



Re: logical replication - negative bitmapset member not allowed

Tim Clarke-3
On 02/04/2019 15:46, Tom Lane wrote:
> I'm glad you're out of the woods, but we still have a bug there
> waiting to bite the next person.  I wonder if you'd be willing to
> spend some time trying to develop a reproduction sequence for this
> (obviously, working on a test setup not your live servers).
> Presumably there's something in the subscription-alteration logic
> that needs work, but I don't think we have enough detail here for
> somebody else to reproduce the error without a lot of guesswork.
>
> regards, tom lane


I'll do what I can :)


Tim Clarke




Re: logical replication - negative bitmapset member not allowed

Peter Eisentraut-6
In reply to this post by Alvaro Herrera-9
On 2019-04-01 23:43, Alvaro Herrera wrote:
> Maybe the replica identity of a table got set to a unique index on oid?
> Or something else involving system columns?  (If replication is
> otherwise working, then I suppose there's a separate publication that's
> having the error; the first thing to isolate would be to see what tables
> are involved in that publication).

Looking through the code, the bms_add_member() call in
logicalrep_read_attrs() does not use the usual
FirstLowInvalidHeapAttributeNumber offset, so that seems like a possible
problem.

However, I can't quite reproduce this.  There are various other checks
that prevent this scenario, but it's plausible that with a bit of
whacking around you could hit this error message.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
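
The offset convention mentioned above can be illustrated with a toy model. This is not PostgreSQL source code: `toy_bms_add_member` and `attnum_to_bms_member` are hypothetical stand-ins, and the value -8 for FirstLowInvalidHeapAttributeNumber is an assumption about PostgreSQL 10. The idea is that system columns have negative attribute numbers, so subtracting that negative constant before storing them keeps every bitmapset member non-negative.

```shell
#!/bin/sh
# Assumed value of FirstLowInvalidHeapAttributeNumber in PostgreSQL 10.
FIRST_LOW_INVALID_HEAP_ATTNUM=-8

# Toy stand-in for a bitmapset insert: it refuses negative members and
# otherwise just echoes the member it would have stored.
toy_bms_add_member() {
    if [ "$1" -lt 0 ]; then
        echo "ERROR: negative bitmapset member not allowed" >&2
        return 1
    fi
    echo "$1"
}

# Apply the usual offset so that negative attribute numbers become >= 0.
attnum_to_bms_member() {
    echo "$(( $1 - FIRST_LOW_INVALID_HEAP_ATTNUM ))"
}

# Example, using -2 as a system-column attribute number:
#   toy_bms_add_member "$(attnum_to_bms_member -2)"   # accepted, stores 6
#   toy_bms_add_member -2                             # rejected with the error
```

Whether this particular path is what produced the reported error is not established in the thread; the sketch only shows why a missing offset would matter.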



Re: logical replication - negative bitmapset member not allowed

Tim Clarke-3
On 04/04/2019 22:37, Peter Eisentraut wrote:

> On 2019-04-01 23:43, Alvaro Herrera wrote:
>> Maybe the replica identity of a table got set to a unique index on oid?
>> Or something else involving system columns?  (If replication is
>> otherwise working, then I suppose there's a separate publication that's
>> having the error; the first thing to isolate would be to see what tables
>> are involved in that publication).
> Looking through the code, the bms_add_member() call in
> logicalrep_read_attrs() does not use the usual
> FirstLowInvalidHeapAttributeNumber offset, so that seems like a possible
> problem.
>
> However, I can't quite reproduce this.  There are various other checks
> that prevent this scenario, but it's plausible that with a bit of
> whacking around you could hit this error message.
>

Promise I've not been whacking around......


Tim Clarke




Re: logical replication - negative bitmapset member not allowed

Jehan-Guillaume de Rorthais
In reply to this post by Peter Eisentraut-6
Hello,

On Thu, 4 Apr 2019 23:37:04 +0200
Peter Eisentraut <[hidden email]> wrote:

> On 2019-04-01 23:43, Alvaro Herrera wrote:
> > Maybe the replica identity of a table got set to a unique index on oid?
> > Or something else involving system columns?  (If replication is
> > otherwise working, then I suppose there's a separate publication that's
> > having the error; the first thing to isolate would be to see what tables
> > are involved in that publication).  
>
> Looking through the code, the bms_add_member() call in
> logicalrep_read_attrs() does not use the usual
> FirstLowInvalidHeapAttributeNumber offset, so that seems like a possible
> problem.
>
> However, I can't quite reproduce this.  There are various other checks
> that prevent this scenario, but it's plausible that with a bit of
> whacking around you could hit this error message.

Here is a script to reproduce it under version 10, 11 and 12:

################################################
# env
PUB=/tmp/pub
SUB=/tmp/sub
unset PGPORT PGHOST PGDATABASE PGDATA
export PGUSER=postgres

# cleanup
kill %1
pg_ctl -w -s -D "$PUB" -m immediate stop; echo $?
pg_ctl -w -s -D "$SUB" -m immediate stop; echo $?
rm -r "$PUB" "$SUB"

# cluster
initdb -U postgres -N "$PUB" &>/dev/null; echo $?
initdb -U postgres -N "$SUB" &>/dev/null; echo $?
echo "wal_level=logical" >> "$PUB"/postgresql.conf
echo "port=5433" >> "$SUB"/postgresql.conf
pg_ctl -w -s -D $PUB -l "$PUB"-"$(date +%FT%T)".log start; echo $?
pg_ctl -w -s -D $SUB -l "$SUB"-"$(date +%FT%T)".log start; echo $?
pgbench -p 5432 -qi
pg_dump -p 5432 -s | psql -qXp 5433

# fake activity
pgbench -p 5432 -T 300 -c 2 &

# replication setup
psql -p 5432 -Xc "CREATE PUBLICATION prov FOR ALL TABLES"
psql -p 5433 -Xc "CREATE SUBSCRIPTION sub
                  CONNECTION 'port=5432'
                  PUBLICATION prov"

# wait for the streaming
unset V;
while [ "$V" != "streaming" ]; do sleep 1
    V=$(psql -AtXc "SELECT 'streaming'
                    FROM pg_stat_replication WHERE state='streaming'")
done

# trigger the error message
psql -p 5433 -Xc "ALTER SUBSCRIPTION sub DISABLE"
psql -p 5433 -Xc "ALTER TABLE pgbench_history ADD id SERIAL PRIMARY KEY"
psql -p 5432 -Xc "ALTER TABLE pgbench_history ADD id SERIAL PRIMARY KEY"
psql -p 5433 -Xc "ALTER SUBSCRIPTION sub ENABLE"
################################################

Regards,
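
To check whether a run of the script above actually hit the message, the server logs can be searched for it. The helper below is a hypothetical sketch (the name `count_bitmapset_errors` is not from the thread). It assumes the log files follow the `"$SUB"-<timestamp>.log` naming used in the script and, going by the earlier report in this thread, that the subscriber's log is the one to look at.

```shell
#!/bin/sh
# Hypothetical helper: count how many lines across the given log files
# contain the error message discussed in this thread.
count_bitmapset_errors() {
    # grep -c exits non-zero when nothing matches but still prints 0,
    # so the exit status is deliberately ignored.
    cat -- "$@" | grep -c 'negative bitmapset member not allowed' || true
}

# Example (the glob matches the subscriber logs written by the script):
#   count_bitmapset_errors /tmp/sub-*.log
```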