why select count(*) consumes wal logs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

why select count(*) consumes wal logs

Ravi Krishna
PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY. The table has
no index at this time. Since I am the only user I don't see any other activity.
Now when I run select count(*) on the table where I just loaded data, it runs for ever,
more than 10min and still running. Intrigued, I checked locks and saw nothing.  Then I noticed something
strange.  When select count(*) runs, PG is writing to wal_logs, and that too a large amount. Why?  
I suspect vaccum is getting triggered, but this is a brand new table with no updates. So it should not.

Is there a SQL to peek into what PG is doing to write so much to WAL logs ?


Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Ravi Krishna
Must be something to do with Vaccum as the second time I ran the SQL, it did not consume WAL logs.

Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Michael Nolan
In reply to this post by Ravi Krishna


On Tue, Nov 6, 2018 at 11:08 AM Ravi Krishna <[hidden email]> wrote:
PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY.

It's always a good idea after doing a large scale data load to do a vacuum analyze on the table (or the entire database.)
--
Mike Nolan
Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Ron-2
On 11/06/2018 11:12 AM, Michael Nolan wrote:
On Tue, Nov 6, 2018 at 11:08 AM Ravi Krishna <[hidden email]> wrote:
PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY.

It's always a good idea after doing a large scale data load to do a vacuum analyze on the table (or the entire database.)


I understand the need to ANALYZE (populate the histograms needed by the dynamic optimizer), but why VACUUM (which is recommended after updates and deletes).

Thanks

--
Angular momentum makes the world go 'round.
Reply | Threaded
Open this post in threaded view
|

RE: why select count(*) consumes wal logs

Kumar, Virendra

I concord.

Why VACUUM when there is no update or deletes.

 

Regards,

Virendra

 

From: Ron [mailto:[hidden email]]
Sent: Tuesday, November 06, 2018 12:20 PM
To: [hidden email]
Subject: Re: why select count(*) consumes wal logs

 

On 11/06/2018 11:12 AM, Michael Nolan wrote:

On Tue, Nov 6, 2018 at 11:08 AM Ravi Krishna <[hidden email]> wrote:

PG 10.5

I loaded 133 million rows to a wide table (more than 100 cols) via COPY.

 

It's always a good idea after doing a large scale data load to do a vacuum analyze on the table (or the entire database.)

 


I understand the need to ANALYZE (populate the histograms needed by the dynamic optimizer), but why VACUUM (which is recommended after updates and deletes).

Thanks

--
Angular momentum makes the world go 'round.




This message is intended only for the use of the addressee and may contain
information that is PRIVILEGED AND CONFIDENTIAL.

If you are not the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please erase all copies of the message
and its attachments and notify the sender immediately. Thank you.
Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Tom Lane-2
In reply to this post by Ravi Krishna
Ravi Krishna <[hidden email]> writes:
> I loaded 133 million rows to a wide table (more than 100 cols) via COPY. The table has
> no index at this time. Since I am the only user I don't see any other activity.
> Now when I run select count(*) on the table where I just loaded data, it runs for ever,
> more than 10min and still running. Intrigued, I checked locks and saw nothing.  Then I noticed something
> strange.  When select count(*) runs, PG is writing to wal_logs, and that too a large amount. Why?  

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows.  The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Ravi Krishna
In reply to this post by Ravi Krishna
>That represents setting the yes-this-row-is-committed hint bits on the
>newly loaded rows.  The first access to any such row will set that bit,
>whether it's a select or a VACUUM or whatever.

yes now I recollect reading this in a blog.  Thanks Tom.

Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Michael Nolan
In reply to this post by Tom Lane-2



On Tue, Nov 6, 2018 at 11:40 AM Tom Lane <[hidden email]> wrote:

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows.  The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

Tom, does that include ANALYZE?
--
Mike Nolan
Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Tom Lane-2
Michael Nolan <[hidden email]> writes:
> On Tue, Nov 6, 2018 at 11:40 AM Tom Lane <[hidden email]> wrote:
>> That represents setting the yes-this-row-is-committed hint bits on the
>> newly loaded rows.  The first access to any such row will set that bit,
>> whether it's a select or a VACUUM or whatever.

> Tom, does that include ANALYZE?

Yes, but remember that ANALYZE doesn't scan the whole table; it'll only
set the bit on rows it visits.

(I forget at the moment if it's guaranteed to set the bit on all rows
in each page it examines, or only on the rows it selects to sample.
But in any case it will not examine every page in the table.)

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Bruno Lavoie-2
In reply to this post by Tom Lane-2


Le mar. 6 nov. 2018 12:40 PM, Tom Lane <[hidden email]> a écrit :
Ravi Krishna <[hidden email]> writes:
> I loaded 133 million rows to a wide table (more than 100 cols) via COPY. The table has
> no index at this time. Since I am the only user I don't see any other activity.
> Now when I run select count(*) on the table where I just loaded data, it runs for ever,
> more than 10min and still running. Intrigued, I checked locks and saw nothing.  Then I noticed something
> strange.  When select count(*) runs, PG is writing to wal_logs, and that too a large amount. Why? 

That represents setting the yes-this-row-is-committed hint bits on the
newly loaded rows.  The first access to any such row will set that bit,
whether it's a select or a VACUUM or whatever.

                        regards, tom lane


And IIRC, it can generate a high WAL traffic since the first page change after a checkpoint is done with full page write. And you said that it's happening on a big table with wide rows....
Reply | Threaded
Open this post in threaded view
|

RE: why select count(*) consumes wal logs

Steven Winfield

As long as you don’t have page checksums turned on, you can prevent this by turning off wal_log_hints.

 

Steve.





This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.

GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice
Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Ravi Krishna
> As long as you don’t have page checksums turned on,
> you can prevent this by turning off wal_log_hints.
 
  


I did not run initdb. How to find out which parameter were used with initdb. For page checksums
to be on, it must have been run with -k option.

Our wal_log_hints is left at default which means off.

thanks

Reply | Threaded
Open this post in threaded view
|

RE: why select count(*) consumes wal logs

Steven Winfield
> How to find out which parameter were used with initdb

pg_controldata -D <datadir> | grep sum
...should give you something like:
Data page checksum version: 0
...and 0 means off.

Similarly, from SQL:
select data_page_checksum_version from pg_control_init()

Steve.




This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.

GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice
Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Ravi Krishna

> select data_page_checksum_version from pg_control_init()

returned 1.  So we have page_checksum turned on, and wal_log_hints off.

Reply | Threaded
Open this post in threaded view
|

Re: why select count(*) consumes wal logs

Thomas Kellerer
Ravi Krishna schrieb am 07.11.2018 um 15:10:
>
>> select data_page_checksum_version from pg_control_init()
>
> returned 1.  So we have page_checksum turned on, and wal_log_hints off.

If page_checksum is enabled, then wal_log_hints is ignored (or actually always assumed "on")