Hello,
We have a PG 9.6.19/9.6.20 streaming replication set using PGDG packages over Debian 9 and 10. We started experiencing an interesting issue yesterday. This is the layout; all arrows are stream replication built using `pg_basebackup` over replication slots. Master N0 (9.6.19 / Debian 9) -> Replica A (9.6.19 / Debian 9) | | | +-----------------> Replica B (9.6.20 / Debian 10) | v Replica N1 (9.6.20 / Debian 9) -> Replica C (9.6.20 / Debian 9) | +------------------> Replica D (9.6.20 / Debian 10) We have a query that is a simple `select * from table where pk = 'fixed value'`. It fetches a *single* row vía a PK Index Scan using `=` on a TEXT value, no implicit conversions. Said query used to work fine across the replication set. After updating from 9.6.19 to 9.6.20, we noticed the query was not working on D. It did NOT bring any rows. If the query is attempted using LIKE or ~ (with a left-anchored pattern), the correct row is fetched BUT using a Sequential Scan. We checked B, same behavior. However, the query works normally (single row, PK IDX scan) on N0, A, N1, and C. We've tried: 1. REINDEX table / VACUUM ANALYZE table -- nothing changed. 2. REINDEX DATABASE / REINDEX SYSTEM on N0 -- nothing changed. 3. VACUUM FULL ANALYZE on N0 -- nothing changed. 3. Re-creating replicas B (and D) from scratch (remove $PGDATA, `pg_basebackup`, use replication slot) -- nothing changed. 4. Creating a NEW replica X (9.6.20 / Debian 10) on a DIFFERENT machine. First, using N1 as master, then using N0 as master. Nothing changes. The query NEVER works on replicas B, D, nor the NEW X. Looks like a the 9.6.20 over Debian 10 is the culprit, but I have nothing else to work on. Suggestions? -- Ernesto Hernández-Novich - @iamemhn - Unix: Live free or die! Geek by nature, Linux by choice, Debian of course. If you can't aptitude it, it isn't useful or doesn't exist. GPG Key Fingerprint = 0064 ADF5 EB5C DE16 99C1 6C56 F2A3 86B5 A757 E5A1 |
On Sat, Nov 14, 2020 at 12:17 AM Ernesto Hernández-Novich <[hidden email]> wrote: Hello, These issues are almost certainly because of the glibc locale changes between Debian 9 and Debian 10, and not because of the PostgreSQL upgrade. If you have master and standby on different glibc versions (so debian 9 vs 10), all text based indexes can behave differently. All the nodes must run the same version for this to work. So in this scenario, you need to reinstall B and D with debian 9, or you need to upgrade your other nodes to debian 10 (and if you do that, then you have to reindex the master node once it has been upgraded). |
In reply to this post by Ernesto Hernández-Novich
Hi,
On 2020-11-13 15:14:23 -0800, Ernesto Hernández-Novich wrote: > We have a PG 9.6.19/9.6.20 streaming replication set using PGDG > packages over Debian 9 and 10. We started experiencing an interesting > issue yesterday. This is the layout; all arrows are stream replication > built using `pg_basebackup` over replication slots. > > Master N0 (9.6.19 / Debian 9) -> Replica A (9.6.19 / Debian 9) > | | > | +-----------------> Replica B (9.6.20 / Debian 10) > | > v > Replica N1 (9.6.20 / Debian 9) -> Replica C (9.6.20 / Debian 9) > | > +------------------> Replica D (9.6.20 / Debian 10) > > We have a query that is a simple `select * from table where pk = 'fixed > value'`. It fetches a *single* row vía a PK Index Scan using `=` on a > TEXT value, no implicit conversions. Said query used to work fine > across the replication set. > > After updating from 9.6.19 to 9.6.20, we noticed the query was not > working on D. It did NOT bring any rows. If the query is attempted > using LIKE or ~ (with a left-anchored pattern), the correct row is > fetched BUT using a Sequential Scan. We checked B, same behavior. > However, the query works normally (single row, PK IDX scan) on N0, A, > N1, and C. That likely is related to the collation definitions differing between the systems. Note that the systems where it doesn't work are Debian 10, whereas the rest is Debian 9. See https://wiki.postgresql.org/wiki/Locale_data_changes Greetings, Andres Freund |
In reply to this post by Magnus Hagander-2
On 2020-11-14 00:59:01 +0100, Magnus Hagander wrote:
> So in this scenario, you need to reinstall B and D with debian 9, or > you need to upgrade your other nodes to debian 10 (and if you do that, > then you have to reindex the master node once it has been upgraded). Changing the collation of all columns to "C" would probably also work. |
In reply to this post by Magnus Hagander-2
On Sat, 2020-11-14 at 00:59 +0100, Magnus Hagander wrote:
> On Sat, Nov 14, 2020 at 12:17 AM Ernesto Hernández-Novich < > [hidden email]> wrote: [...] > > After updating from 9.6.19 to 9.6.20, we noticed the query was not > > working on D. [...] > > Looks like a the 9.6.20 over Debian 10 is the culprit, but I have > > nothing else to work on. > > These issues are almost certainly because of the glibc locale changes > between Debian 9 and Debian 10, and not because of the PostgreSQL > upgrade. > > If you have master and standby on different glibc versions (so debian > 9 vs 10), all text based indexes can behave differently. All the > nodes must run the same version for this to work. So in this > scenario, you need to reinstall B and D with debian 9, or you need to > upgrade your other nodes to debian 10 (and if you do that, then you > have to reindex the master node once it has been upgraded). I agree on the locale matching and will try your suggestion. I must say the Debian 10 machines were working just fine with PG 9.6.19 for over two months or so. We noticed the issue *just* this week after upgrading from 9.6.19 onto 9.6.20. Regards, -- Ernesto Hernández-Novich - @iamemhn - Unix: Live free or die! Geek by nature, Linux by choice, Debian of course. If you can't aptitude it, it isn't useful or doesn't exist. GPG Key Fingerprint = 0064 ADF5 EB5C DE16 99C1 6C56 F2A3 86B5 A757 E5A1 |
On Sat, Nov 14, 2020 at 1:10 AM Ernesto Hernández-Novich <[hidden email]> wrote: On Sat, 2020-11-14 at 00:59 +0100, Magnus Hagander wrote: Most likely they weren't, but they looked like they were. It could be that the problem wasn't triggered until now, but it could also be that it "sometimes worked and sometimes didn't", and you just didn't notice -- I've come across several installations that had exactly that problem. |
Free forum by Nabble | Edit this page |