BUG #14635: Query is executed slower on hot standby slave database than on master database

vgomenyuk
The following bug has been logged on the website:

Bug reference:      14635
Logged by:          Vitaliy Gomenyuk
Email address:      [hidden email]
PostgreSQL version: 9.2.14
Operating system:   CentOS 6
Description:        

Hello,

We have two identical servers (CPU, RAM, HDD).
We are currently using PostgreSQL 9.2.14.
We have set up master-slave streaming replication.

Running the following query on the master and slave databases:
explain analyze SELECT "om".* FROM "Table" AS "om" WHERE (om."UserID" =
11111) AND (NOT om."Deleted") AND (om."Status" = 0) ORDER BY "StampToSend"
asc LIMIT 11;
The execution plans are the same on the master and slave databases.

Execution time on the master database: 138.116 ms
Execution time on the hot standby slave database with working replication:
1623.496 ms
Execution time on the slave database with replication turned off (after
creating a trigger file): 132.354 ms (equal to the execution time on the
master database).

Why is there such a huge difference in execution time between PostgreSQL with
and without working streaming replication?
How can we improve the situation, i.e. make the execution time on the hot standby
slave database with working replication equal to that on the master database?

Need help. Thanks.

Vitaliy



Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

Haribabu Kommi-2


On Fri, Apr 28, 2017 at 11:38 PM, <[hidden email]> wrote:
> [ original report quoted in full ]

Is the issue happening always or only sometimes?
Can you please check the explain plan output on both machines with the verbose
and buffers options?

explain (analyze, verbose, buffers) /*query*/

The explain output may help in identifying where the query is taking
more time compared to the master.

Regards,
Hari Babu
Fujitsu Australia

Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

vgomenyuk
Hello, 

I didn't find a way to write a response to your message on the PostgreSQL support web site; I couldn't find any button like "Response" or "New comment". Can you please share this info? Maybe there is an instruction somewhere?

I will write my response by mail.

Yes, it happens always.

Here is the execution plan from the master:
texting=# explain (analyze, verbose, buffers) SELECT "om".* FROM "Table" AS "om" WHERE (om."UserID" = 11111) AND (NOT om."Deleted") AND (om."Status" = 0) ORDER BY "StampToSend" asc LIMIT 11;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3819.29..3819.30 rows=6 width=207) (actual time=139.847..139.847 rows=0 loops=1)
   Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
   Buffers: shared hit=70256
   ->  Sort  (cost=3819.29..3819.30 rows=6 width=207) (actual time=139.845..139.845 rows=0 loops=1)
         Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
         Sort Key: om."StampToSend"
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=70256
         ->  Bitmap Heap Scan on public."Table" om  (cost=3795.14..3819.21 rows=6 width=207) (actual time=139.836..139.836 rows=0 loops=1)
               Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
               Recheck Cond: ((om."UserID" = 11111) AND (om."Status" = 0))
               Filter: (NOT om."Deleted")
               Buffers: shared hit=70256
               ->  BitmapAnd  (cost=3795.14..3795.14 rows=6 width=0) (actual time=139.832..139.832 rows=0 loops=1)
                     Buffers: shared hit=70256
                     ->  Bitmap Index Scan on "OutgoingMessages_UserID"  (cost=0.00..237.99 rows=6872 width=0) (actual time=0.032..0.032 rows=12 loops=1)
                           Index Cond: (om."UserID" = 11111)
                           Buffers: shared hit=4
                     ->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=139.761..139.761 rows=9158 loops=1)
                           Index Cond: ((om."Status" = 0) AND (om."Deleted" = false))
                           Buffers: shared hit=70252
 Total runtime: 139.900 ms
(22 rows)


Here is the execution plan from the slave:
texting=# explain (analyze, verbose, buffers) SELECT "om".* FROM "Table" AS "om" WHERE (om."UserID" = 11111) AND (NOT om."Deleted") AND (om."Status" = 0) ORDER BY "StampToSend" asc LIMIT 11;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3819.29..3819.30 rows=6 width=207) (actual time=1472.199..1472.199 rows=0 loops=1)
   Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
   Buffers: shared hit=70256
   ->  Sort  (cost=3819.29..3819.30 rows=6 width=207) (actual time=1472.197..1472.197 rows=0 loops=1)
         Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
         Sort Key: om."StampToSend"
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=70256
         ->  Bitmap Heap Scan on public."Table" om  (cost=3795.14..3819.21 rows=6 width=207) (actual time=1472.191..1472.191 rows=0 loops=1)
               Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
               Recheck Cond: ((om."UserID" = 11111) AND (om."Status" = 0))
               Filter: (NOT om."Deleted")
               Buffers: shared hit=70256
               ->  BitmapAnd  (cost=3795.14..3795.14 rows=6 width=0) (actual time=1472.186..1472.186 rows=0 loops=1)
                     Buffers: shared hit=70256
                     ->  Bitmap Index Scan on "OutgoingMessages_UserID"  (cost=0.00..237.99 rows=6872 width=0) (actual time=0.025..0.025 rows=12 loops=1)
                           Index Cond: (om."UserID" = 11111)
                           Buffers: shared hit=4
                     ->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=1470.853..1470.853 rows=8671249 loops=1)
                           Index Cond: ((om."Status" = 0) AND (om."Deleted" = false))
                           Buffers: shared hit=70252
 Total runtime: 1472.253 ms
(22 rows)


The difference is:
->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=139.761..139.761 rows=9158 loops=1)

->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=1470.853..1470.853 rows=8671249 loops=1)

On Tue, May 2, 2017 at 4:20 PM, Haribabu Kommi <[hidden email]> wrote:


> [ previous messages quoted in full ]



--

Best regards,

Vitaliy Gomenyuk Senior DBA
[hidden email] | +380 67 220 5903 
[hidden email] | website | Facebook | Twitter

Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

Tom Lane-2
Vitaliy Gomenyuk <[hidden email]> writes:
> [ same query slower on slave ]

Hmm, the discrepancy is evidently in the larger bitmap index scan:

> There is an execution plan from master:
>                      ->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=139.761..139.761 rows=9158 loops=1)
>                            Index Cond: ((om."Status" = 0) AND (om."Deleted" = false))
>                            Buffers: shared hit=70252

> There is an execution plan from slave:
>                      ->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=1470.853..1470.853 rows=8671249 loops=1)
>                            Index Cond: ((om."Status" = 0) AND (om."Deleted" = false))
>                            Buffers: shared hit=70252

The contents of the indexes should be the same, so why is the slave
returning so many more rows?  It has to be because the index entries are
not marked as killed (known-dead-to-everybody), or not being treated as
killed, in the slave.  I vaguely recall that there's a difference in the
rules for index entry visibility on slaves, but it's not clear to me why
that should be.

                        regards, tom lane
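One way to check whether index entries really are being marked killed differently on the two servers is to compare dead-item counts at the page level. A minimal sketch, assuming the pageinspect extension can be installed on the master (the extension's objects then replicate to the standby) and using the index name from the plans above:

-- Run on the master and on the standby, then compare live_items / dead_items.
-- bt_page_stats() requires superuser; block 1 is just an example page (block 0 is the metapage).
SELECT * FROM bt_page_stats('"OutgoingMessages_Status_StampToSend_Deleted"', 1);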



Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

Haribabu Kommi-2


On Thu, May 4, 2017 at 12:05 AM, Tom Lane <[hidden email]> wrote:
> [ previous message quoted in full ]

The index cleanup done by a full vacuum and by single-page vacuum is WAL-logged,
so when those records get replayed on the slave, both indexes should be the same.

Maybe the WAL wasn't replayed on the slave because of a conflicting transaction?
Or is there any other scenario in which they could differ?


Hi Vitaliy,

Is it possible for you to check the status of the replication? And is it also
possible for you to create another fresh slave and check whether the issue
happens there as well?

Regards,
Hari Babu
Fujitsu Australia

Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

vgomenyuk
Hello Haribabu,

Thank you for your answer.

Here is the status of the replication:
select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status", pg_is_xlog_replay_paused() "replication_paused", now() - pg_last_xact_replay_timestamp() "replication_delay";
 
receive_location | replay_location | recovery_status | replication_paused | replication_delay
------------------+-----------------+-----------------+--------------------+-------------------
 66A/8F77A330     | 66A/8F77A330    | t               | f                  | 00:00:00.015721
(1 row)
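(The sending side can also be cross-checked on the master; a minimal sketch, assuming the pg_stat_replication view as it exists in 9.2:)

select application_name, client_addr, state, sent_location, write_location, replay_location
from pg_stat_replication;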

We have several slave servers, and I have also set up fresh slave databases several times - I got the same situation each time.

I still need help with it.

Waiting for your response. Thanks.

On Thu, May 4, 2017 at 3:26 AM, Haribabu Kommi <[hidden email]> wrote:


> [ previous messages quoted in full ]



--

Best regards,

Vitaliy Gomenyuk Senior DBA
[hidden email] | +380 67 220 5903 
[hidden email] | website | Facebook | Twitter

Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

vgomenyuk
Hello, guys,

Any updates on this?

Thank you in advance.

On Mon, May 8, 2017 at 4:22 PM, Vitaliy Gomenyuk <[hidden email]> wrote:
> [ previous messages quoted in full ]



--

Best regards,

Vitaliy Gomenyuk Senior DBA
[hidden email] | +380 67 220 5903 
[hidden email] | website | Facebook | Twitter

Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

Bruce Momjian
In reply to this post by Haribabu Kommi-2
On Thu, May  4, 2017 at 10:26:17AM +1000, Haribabu Kommi wrote:

>     The contents of the indexes should be the same, so why is the slave
>     returning so many more rows?  It has to be because the index entries are
>     not marked as killed (known-dead-to-everybody), or not being treated as
>     killed, in the slave.  I vaguely recall that there's a difference in the
>     rules for index entry visibility on slaves, but it's not clear to me why
>     that should be.
>
>
> The index cleanup by the full vacuum and vacuum one page are WAL logged,
> so when they gets replayed on the slave, both the indexes must be same.
>
> May be the WAL didn't replayed on the slave because of conflict transaction?
> Or Any other scenarios it may be different?

We don't WAL-log hint bits.  Could that affect queries running on the
standbys?

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

vgomenyuk
What does "We don't WAL-log hint bits" mean?

On Fri, May 12, 2017 at 8:30 PM, Bruce Momjian <[hidden email]> wrote:
> [ previous message quoted in full ]



--

Best regards,

Vitaliy Gomenyuk Senior DBA
[hidden email] | +380 67 220 5903 
[hidden email] | website | Facebook | Twitter

Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

Haribabu Kommi-2
In reply to this post by Bruce Momjian


On Sat, May 13, 2017 at 3:30 AM, Bruce Momjian <[hidden email]> wrote:

We don't WAL-log hint bits.  Could that affect queries running on the
standbys?

Yes, this may be a reason for the extra number of records that are present in the
index.

Hi Vitaliy,

Hint bits are used to mark tuples as created and/or deleted by transactions
that are known committed or aborted. Changes to the hint bits are not
WAL-logged. They are WAL-logged when data checksums are enabled
or the wal_log_hints GUC is set, but these two options are not available in
PostgreSQL version 9.2.

Is it possible for you to test the query after executing a REINDEX of the index?

or

Try adding the pgstattuple and pageinspect extensions and verify whether the index
is the same on both the master and the standby.

If there are more records in the standby's index compared to the master's, this may
be because of some cleanup that happened on the master and was not done on the
standby.
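For instance, a minimal sketch of the pgstattuple check (CREATE EXTENSION cannot be run on a standby, but once created on the master the extension replicates, and the function can then be run on both servers; the index name is taken from the plans above):

-- Once, on the master:
CREATE EXTENSION pgstattuple;
-- On the master and on the standby, compare the output (deleted_pages, avg_leaf_density, ...):
SELECT * FROM pgstatindex('"OutgoingMessages_Status_StampToSend_Deleted"');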

Regards,
Hari Babu
Fujitsu Australia

Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

Michael Paquier
In reply to this post by vgomenyuk
On Mon, May 15, 2017 at 5:41 PM, Vitaliy Gomenyuk
<[hidden email]> wrote:

> What does "We don't WAL-log hint bits" mean?

(Please avoid top-posting...)

Haribabu has given a good enough answer upthread. Here are
more details...

Here is a description about what are hint bits:
https://wiki.postgresql.org/wiki/Hint_Bits

This sentence means that no WAL records are generated when a hint bit
is updated on a page. There are circumstances where setting a hint
bit on a page causes WAL activity, though: if wal_log_hints is enabled
and a hint bit is the first update happening on a page after a checkpoint,
a WAL record containing a full-page image of that page is written.
--
Michael
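As a side note, wal_log_hints only exists from PostgreSQL 9.4 onwards, so it is not an option on the 9.2 servers in this thread; a minimal sketch for newer versions:

-- PostgreSQL 9.4 or later:
SHOW wal_log_hints;
-- To enable it, set wal_log_hints = on in postgresql.conf and restart the server.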



Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

Sawada Masahiko
In reply to this post by Haribabu Kommi-2
On Mon, May 15, 2017 at 10:44 PM, Haribabu Kommi
<[hidden email]> wrote:

> [ previous messages quoted in full ]

Because hint bits affect only heap tuples, I think they are not
relevant here. But because marking an index entry as dead is not WAL-logged,
it's possible that obsolete index entries are marked as dead on the
master server while they are not marked on the standby server. This would
make an index scan on the standby server return many more entries than on
the master server.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

vgomenyuk
Thank you for all your answers.

1. We can't reindex the table on a hot-standby database:
texting=# REINDEX TABLE "Table";
ERROR:  cannot execute REINDEX during recovery

2. I'm not sure that the problem is with hint bits not being WAL-logged, based on my further investigation:
  2.0. We have two identical servers for the master and slave databases (CPU, RAM, HDD).
  2.1. On the master database I run the query; it executes in 138.116 ms.
  2.2. On a hot-standby slave database with a working replication process I run the same query; it executes in 1623.496 ms (much slower).
  2.3. Then on that hot-standby slave database I create a trigger file, which stops the replication process immediately, and right after that I run the same query; it executes in 132.354 ms (fast, like on the master).
  2.4. During the time when I'm creating the trigger file (less than 1 second), the indexes in that table can't change. Moreover, that hot-standby slave database was created from a full master copy just before I started the test, so the master and slave databases are equal. The only difference is the replication process running in hot-standby mode, so only the replication process can cause this behaviour. But why, and how can the situation be improved?
A question is still open.
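(For reference, a rebuild would have to be issued on the master instead; the rebuilt index is WAL-logged and streamed to the standby. A minimal sketch, using the index name from the plans above; note that REINDEX blocks writes to the table while it runs:)

-- On the master only; REINDEX cannot be executed during recovery.
REINDEX INDEX "OutgoingMessages_Status_StampToSend_Deleted";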




On Tue, May 16, 2017 at 4:58 PM, Masahiko Sawada <[hidden email]> wrote:
> [ previous messages quoted in full ]



--

Best regards,

Vitaliy Gomenyuk DBA

Re: BUG #14635: Query is executed slower on hot standby slave database than on master database

Jeff Janes
On Thu, May 18, 2017 at 9:26 AM, Vitaliy Gomenyuk <[hidden email]> wrote:

> [ investigation steps 2.0-2.4 quoted from the previous message ]

If the index hint bits were set on the master and then a base backup is taken, then those bits are set on the replica created from that backup. However, the replica does not use the index hint bits while it is in hot standby mode; once it is promoted, it will start using them.
 
 
But why and how to improve the situation?

Vacuum your table (on the master).  

Cheers,

Jeff
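A minimal sketch of that suggestion, using the table name from the thread (VACUUM must be run on the master; the cleanup it performs, including index cleanup, is WAL-logged and replayed on the standby):

VACUUM (VERBOSE, ANALYZE) "Table";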