stale WAL files?

classic Classic list List threaded Threaded
22 messages Options
12
lup
Reply | Threaded
Open this post in threaded view
|

stale WAL files?

lup
PG10.7, Centos7

On Mar15 we filled our default tablespace/WAL partition.  Cleaned up some old dumps and restarted.  pg_wal had apparently exploded but cleaned itself up by the next day. On Mar16 I ran CHECKPOINT in all databases on that server (except template0).  All seems fine except for 271 WALs from MAR16 going nowhere fast.  Of course we see transient WALS come and go every day.  We did a restart of the server last Friday (Mar22) without issue.

-rw-------. 1 postgres postgres 16777216 Mar 16 16:32 0000000100000CE9000000DD
 ...(269 others)
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9

What’s keeping these alive.  Or can they be deleted? Any normal admin routine likely to clean these up?


One of our crew was apparently trying to create a function: he had a run-away json query which caused memory grief but I don’t see that reflected anywhere in the log. The log file for that day goes south as follows.  

[1]2019-03-15 12:09:15.230 MDT [29189] STATEMENT:  select count(*) from sui.probancset_group_member where group_id = '4ce6a94d-bb2d-\
43c2-a1f9-7b68f1618cd4';
[2]2019-03-15 17:07:30.818 MDT [35020] ERROR:  could not write to hash-join temporary file: No space left on device
[3]2019-03-15 17:07:30.818 MDT [35020] STATEMENT:  update segment as s set events_less = s.events_less + b.events_less, events_equal\
 = s.events_equal + b.events_equal, events_greater = s.events_greater + b.events_greater, threshold_events = s.threshold_events +\
 b.threshold_events from bulk."bc_1819_17_fd6dbc1e_57e5_4d49_b896_59d6687c8ee6" as b where s.markerset_id = '8d723d2f-1281-48c5-9\
016-2dab3f4d242b' and s.probandset_id = b.probandset_id and s.markerset_id = b.markerset_id and s.startbase = b.startbase and s.e\
ndbase = b.endbase and  s.probandset_id >= '90000000-0000-0000-0000-000000000000' and s.probandset_id < 'a0000000-0000-0000-0000-\
000000000000'
2019-03-15 17:07:30.818 MDT [317161] PANIC:  could not write to file "pg_wal/xlogtemp.317161": No space left on device
[4]2019-03-15 17:08:19.231 MDT [35166] ERROR:  unterminated dollar-quoted string at or near "$$

        begin
              --new probandset_group record
              insert into probandset_group(id,name) select uuid_generate_v4(),'1808_p3c2n4';

              --create necessary new probandset and probandset_group_member records
              select addprobandset(a.name, people, groupname) from
                     (select powerset((select array_agg(name order by name) lst from base.person
                             where name in (superset))) as name
                             except (select regexp_split_to_array(p.name,',') from  probandset p  )) a
                     where array_length(a.name,1)>1;

              --update superset field for new probandset_group
              update probandset_group set proband_superset_id = (select id from probandset where name = superset);
        end;

[1] Last reported normal operation
[2] death knell
[3] First failed operation from our application
[4] continued effort from create function work
 
Thanks in advance.




Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

Adrian Klaver-4
On 3/25/19 5:10 PM, Rob Sargent wrote:

> PG10.7, Centos7
>
> On Mar15 we filled our default tablespace/WAL partition.  Cleaned up
> some old dumps and restarted.  pg_wal had apparently exploded but
> cleaned itself up by the next day. On Mar16 I ran CHECKPOINT in all
> databases on that server (except template0).  All seems fine except for
> 271 WALs from MAR16 going nowhere fast.  Of course we see transient WALS
> come and go every day.  We did a restart of the server last Friday
> (Mar22) without issue.
>
> -rw-------. 1 postgres postgres 16777216 Mar 16 16:32
> 0000000100000CE9000000DD
>   ...(269 others)
> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
> 0000000100000CEA000000E9
>
> What’s keeping these alive.  Or can they be deleted? Any normal admin
> routine likely to clean these up?

wal_keep_segments?

Do you have replication set up and replication slots in use?

>
>
> One of our crew was apparently trying to create a function: he had a
> run-away json query which caused memory grief but I don’t see that
> reflected anywhere in the log. The log file for that day goes south as
> follows.
>
>     [1]2019-03-15 12:09:15.230 MDT [29189] STATEMENT:  select count(*)
>     from sui.probancset_group_member where group_id = '4ce6a94d-bb2d-\
>     43c2-a1f9-7b68f1618cd4';
>     [2]2019-03-15 17:07:30.818 MDT [35020] ERROR:  could not write to
>     hash-join temporary file: No space left on device
>     [3]2019-03-15 17:07:30.818 MDT [35020] STATEMENT:  update segment as
>     s set events_less = s.events_less + b.events_less, events_equal\
>       = s.events_equal + b.events_equal, events_greater =
>     s.events_greater + b.events_greater, threshold_events =
>     s.threshold_events +\
>       b.threshold_events from
>     bulk."bc_1819_17_fd6dbc1e_57e5_4d49_b896_59d6687c8ee6" as b where
>     s.markerset_id = '8d723d2f-1281-48c5-9\
>     016-2dab3f4d242b' and s.probandset_id = b.probandset_id and
>     s.markerset_id = b.markerset_id and s.startbase = b.startbase and s.e\
>     ndbase = b.endbase and  s.probandset_id >=
>     '90000000-0000-0000-0000-000000000000' and s.probandset_id <
>     'a0000000-0000-0000-0000-\
>     000000000000'
>     2019-03-15 17:07:30.818 MDT [317161] PANIC:  could not write to file
>     "pg_wal/xlogtemp.317161": No space left on device
>     [4]2019-03-15 17:08:19.231 MDT [35166] ERROR:  unterminated
>     dollar-quoted string at or near "$$
>
>              begin
>                    --new probandset_group record
>                    insert into probandset_group(id,name) select
>     uuid_generate_v4(),'1808_p3c2n4';
>
>                    --create necessary new probandset and
>     probandset_group_member records
>                    select addprobandset(a.name, people, groupname) from
>                           (select powerset((select array_agg(name order
>     by name) lst from base.person
>                                   where name in (superset))) as name
>                                   except (select
>     regexp_split_to_array(p.name,',') from  probandset p  )) a
>                           where array_length(a.name,1)>1;
>
>                    --update superset field for new probandset_group
>                    update probandset_group set proband_superset_id =
>     (select id from probandset where name = superset);
>              end;
>
>     [1] Last reported normal operation
>     [2] death knell
>     [3] First failed operation from our application
>     [4] continued effort from create function work
>
> Thanks in advance.
>
>
>
>


--
Adrian Klaver
[hidden email]

lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup


On Mar 26, 2019, at 8:08 AM, Adrian Klaver <[hidden email]> wrote:

On 3/25/19 5:10 PM, Rob Sargent wrote:
PG10.7, Centos7
On Mar15 we filled our default tablespace/WAL partition.  Cleaned up some old dumps and restarted. pg_wal had apparently exploded but cleaned itself up by the next day. On Mar16 I ran CHECKPOINT in all databases on that server (except template0).  All seems fine except for 271 WALs from MAR16 going nowhere fast.  Of course we see transient WALS come and go every day.  We did a restart of the server last Friday (Mar22) without issue.
-rw-------. 1 postgres postgres 16777216 Mar 16 16:32 0000000100000CE9000000DD
 ...(269 others)
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01 0000000100000CEA000000E9
What’s keeping these alive.  Or can they be deleted? Any normal admin routine likely to clean these up?

wal_keep_segments?
Commented out in postgres.conf

Do you have replication set up and replication slots in use?
No, sorry I should have said that up front. We’re simple folk.

Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

Michael Paquier-2
On Tue, Mar 26, 2019 at 09:50:37AM -0600, Rob Sargent wrote:
> No, sorry I should have said that up front. We’re simple folk.

What is the WAL position (LSN) where Postgres is writing to and what
is the set of WAL segments in pg_wal (or pg_xlog if that's a server
older than 10)?  Please double-check the configuration value of
wal_keep_segments, and as mentioned upthread, could you make sure that
you have no replication slots active?  This can be done simply by
querying pg_replication_slots.  Please note as well that checkpoints
are server-wide, so there is no point to run them on all databases.
Only one command will be effective for all databases.
--
Michael

signature.asc (849 bytes) Download Attachment
lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup

On 3/28/19 7:30 AM, Michael Paquier wrote:
> On Tue, Mar 26, 2019 at 09:50:37AM -0600, Rob Sargent wrote:
>> No, sorry I should have said that up front. We’re simple folk.
> What is the WAL position (LSN)
postgres=# select * from pg_current_wal_flush_lsn();
  pg_current_wal_flush_lsn
--------------------------
  CEA/E57EAA8
(1 row)

postgres=# select * from pg_current_wal_insert_lsn();
  pg_current_wal_insert_lsn
---------------------------
  CEA/E57EAA8
(1 row)

postgres=# select * from pg_current_wal_lsn();
  pg_current_wal_lsn
--------------------
  CEA/E57EAA8
(1 row)

> where Postgres is writing to and what
> is the set of WAL segments in pg_wal (or pg_xlog if that's a server
> older than 10)?

  select version();
version
---------------------------------------------------------------------------------------------------------
  PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)

This is pg10 so it's pg_wal.  ls -ltr


-rw-------. 1 postgres postgres 16777216 Mar 16 16:33
0000000100000CEA000000B1
-rw-------. 1 postgres postgres 16777216 Mar 16 16:33
0000000100000CEA000000B2

  ... 217 more on through to ...

-rw-------. 1 postgres postgres 16777216 Mar 16 17:01
0000000100000CEA000000E8
-rw-------. 1 postgres postgres 16777216 Mar 16 17:01
0000000100000CEA000000E9
-rw-------. 1 postgres postgres 16777216 Mar 28 09:46
0000000100000CEA0000000E

> Please double-check the configuration value of
> wal_keep_segments,
#wal_keep_segments = 0          # in logfile segments, 16MB each
> and as mentioned upthread, could you make sure that
> you have no replication slots active?

This could be part of the problem?

#max_replication_slots = 10     # max number of replication slots

but

   select * from pg_replication_slots;
  slot_name | plugin | slot_type | datoid | database | temporary |
active | active_pid | xmin | catalog_xmin | restart_lsn |
confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)


> This can be done simply by
> querying pg_replication_slots.  Please note as well that checkpoints
> are server-wide, so there is no point to run them on all databases.
> Only one command will be effective for all databases.
I suspected as much, but there weren't many dbs so I went all in.
> --
> Michael


Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

Michael Paquier-2
On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:

> This is pg10 so it's pg_wal.  ls -ltr
>
>
> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
> 0000000100000CEA000000B1
> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
> 0000000100000CEA000000B2
>
>  ... 217 more on through to ...
>
> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
> 0000000100000CEA000000E8
> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
> 0000000100000CEA000000E9
> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
> 0000000100000CEA0000000E
In Postgres 10 and older versions, the server keeps WAL segment for
the last completed segment, and the previous completed segment.  So
even if a checkpoint is issued, the current WAL insert point is never
really going to be on the first segment in pg_wal.  Isn't that the
origin of what you think is a problem?  So, say, if you issue a
checkpoint again, don't you see 0000000100000CEA000000B1 going away?

In Postgres 11, WAL segments worth only one checkpoint are kept
around.
--
Michael

signature.asc (849 bytes) Download Attachment
lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup


> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal.  ls -ltr
>>
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B2
>>
>>  ... 217 more on through to ...
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>> 0000000100000CEA0000000E
>
> In Postgres 10 and older versions, the server keeps WAL segment for
> the last completed segment, and the previous completed segment.  So
> even if a checkpoint is issued, the current WAL insert point is never
> really going to be on the first segment in pg_wal.  Isn't that the
> origin of what you think is a problem?  So, say, if you issue a
> checkpoint again, don't you see 0000000100000CEA000000B1 going away?
>
> In Postgres 11, WAL segments worth only one checkpoint are kept
> around.
> --
> Michael

I’m on the road all through the weekend with limited connectivity. But recall that I’m getting new, transient WAL files

lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup
In reply to this post by Michael Paquier-2


> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal.  ls -ltr
>>
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B2
>>
>>  ... 217 more on through to ...
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>> 0000000100000CEA0000000E
>
> In Postgres 10 and older versions, the server keeps WAL segment for
> the last completed segment, and the previous completed segment.  So
> even if a checkpoint is issued, the current WAL insert point is never
> really going to be on the first segment in pg_wal.  Isn't that the
> origin of what you think is a problem?  So, say, if you issue a
> checkpoint again, don't you see 0000000100000CEA000000B1 going away?
I had CEA000000015, generated today, as only entry since Mar 16. Issued checkpoint, soon had CEA0000016 and 15.  Five minutes later I still hav 15 and 16 (and of course all 271 from Mar 16).

>
> In Postgres 11, WAL segments worth only one checkpoint are kept
> around.
> --
> Michael


lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup
In reply to this post by Michael Paquier-2


> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal.  ls -ltr
>>
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B2
>>
>>  ... 217 more on through to ...
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>> 0000000100000CEA0000000E
>
Today there are 210 Mar 16 WAL files versus the originally reported 271.  I cannot at this point confirm the original count, other that to say I used “ls -ltr | grep ‘Mar 16’ | wc -l” to get both numbers.

Is it interesting that the earliest files (by ls time stamp) are not lowest numerically? Those two file names end “0000B[12]” (written at 16:33) in a range across the directory from “00001A” through “0000E9”? “0000B0” was written at16:53 and “0000B3” was written at 16:54
Is there any analysis of the file names I could do which might shed any light on the issue?

lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup
In reply to this post by Michael Paquier-2

> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>
>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>> This is pg10 so it's pg_wal.  ls -ltr
>>
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B1
>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>> 0000000100000CEA000000B2
>>
>>  ... 217 more on through to ...
>>
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E8
>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>> 0000000100000CEA000000E9
>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>> 0000000100000CEA0000000E
>
I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup


> On Mar 30, 2019, at 10:54 AM, Gmail <[hidden email]> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

ichbinrene

On Sat, Mar 30, 2019 at 5:03 PM Gmail <[hidden email]> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <[hidden email]> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

ichbinrene

On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <[hidden email]> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <[hidden email]> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <[hidden email]> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup
I’m under fighting a nasty cold.  It may take a day or two for me to response to recent questions and suggestion.



lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup
In reply to this post by ichbinrene
Sorry folks, I’m still home nursing a nasty chest cold and my only tool today is an iPad.
I have failed to get the postgresql.conf into the copy buffer so that, along with the results of pg_settings, will have to wait for another day.

Today there are “only” 135 Mar 16 WAL files.  I haven’t sorted out which have been cleaned up but can do so if that’s thought to be helpful.

There is still 2.2G in the pg_wal directory but that disc has ~360G left. (I believe the burst of WAL files was the result of a novice using LIMIT with a to-Json function and the target table has >100M rows.)

Given that current WALs come and go regularly, I think the CHECKPOINT is running frequently enough (for normal loads at least).


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <[hidden email]> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <[hidden email]> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <[hidden email]> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <[hidden email]> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup
In reply to this post by ichbinrene


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <[hidden email]> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <[hidden email]> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <[hidden email]> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <[hidden email]> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

I think my attempt to send postgresql.conf and pg_settings report ran afoul of the mailer. Here again by attachment.





pg_settings.log (1M) Download Attachment
postgresql.conf.hide (23K) Download Attachment
lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup
In reply to this post by ichbinrene


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <[hidden email]> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <[hidden email]> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <[hidden email]> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <[hidden email]> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

two attempts (one in-line, one with attachement) at sending postgresql.conf and pg_settings report have been sent to a moderator.


Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

ichbinrene


On Wed, Apr 3, 2019 at 1:05 PM Rob Sargent <[hidden email]> wrote:


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <[hidden email]> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <[hidden email]> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <[hidden email]> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <[hidden email]> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

two attempts (one in-line, one with attachement) at sending postgresql.conf and pg_settings report have been sent to a moderator.



As per your configuration :
max_wal_size = 50GB
this seems to be the cause for the WAL files piling up.

this has been declared twice, the last one is taking effect.
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup


On Apr 3, 2019, at 5:23 PM, Rene Romero Benavides <[hidden email]> wrote:



On Wed, Apr 3, 2019 at 1:05 PM Rob Sargent <[hidden email]> wrote:


On Apr 1, 2019, at 9:20 PM, Rene Romero Benavides <[hidden email]> wrote:


On Mon, Apr 1, 2019 at 6:30 PM Rene Romero Benavides <[hidden email]> wrote:

On Sat, Mar 30, 2019 at 5:03 PM Gmail <[hidden email]> wrote:


> On Mar 30, 2019, at 10:54 AM, Gmail <[hidden email]> wrote:
>
>
>>>> On Mar 29, 2019, at 6:58 AM, Michael Paquier <[hidden email]> wrote:
>>>
>>> On Thu, Mar 28, 2019 at 09:53:16AM -0600, Rob Sargent wrote:
>>> This is pg10 so it's pg_wal.  ls -ltr
>>>
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B1
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 16:33
>>> 0000000100000CEA000000B2
>>>
>>> ... 217 more on through to ...
>>>
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E8
>>> -rw-------. 1 postgres postgres 16777216 Mar 16 17:01
>>> 0000000100000CEA000000E9
>>> -rw-------. 1 postgres postgres 16777216 Mar 28 09:46
>>> 0000000100000CEA0000000E
> I’m now down to 208 Mar 16 WAL files so they are being processed (at least deleted).  I’ve taken a snapshot of the pg_wal dir such that I can see which files get processed. It’s none of the files I’ve listed previously

Two more have been cleaned up.  001C and 001D generated at 16:38 Mar 16



Please share your complete postgresql.conf file and the results from this query: 
SELECT * FROM pg_settings;
has someone in the past configured wal archiving?
You've ran out of disk space as this log message you shared states:
No space left on device
what's the output of df -h

--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


BTW , how spread apart are checkpoints happening? do you have stats on that? maybe they're too spread apart and that's why WAL files cannot be recycled rapidly enough?  
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

two attempts (one in-line, one with attachement) at sending postgresql.conf and pg_settings report have been sent to a moderator.



As per your configuration :
max_wal_size = 50GB
this seems to be the cause for the WAL files piling up.

this has been declared twice, the last one is taking effect.

That’s an interesting catch.  Thank you.  I’ll have that reverted that to default.
Note that the WAL files are all the default 16M however.
Currently we’re down to 88 Mar 16 WAL files.  My inclination is to wait this out, to see if all of Mar 16 goes away quietly then reset our backups.


 
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


lup
Reply | Threaded
Open this post in threaded view
|

Re: stale WAL files?

lup
In reply to this post by ichbinrene



As per your configuration :
max_wal_size = 50GB
this seems to be the cause for the WAL files piling up.

this has been declared twice, the last one is taking effect.
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/

I've manage to generate another 359 WAL files in a 10 minute span yesterday (now only 357 remain and I suspect they will wither away as before).  Are these being held simply because of the high max_wal_size value?

This is a development environment, wherein I'm loading 4M+ records, first into 41 staging tables 100K rows per.  In a loop over each staging table, the data is then placed into application tables via selects. First select * into "matching table" then select id into intersection record (id, fixed groupId).  Each such iteration is in it's own transaction.  I have dropped and recreate this same database numerous times working my way up from 100K to 4M records, dialing in application parameters according to number of primary records.  I have not, however, dropped the last incarnation.

12