011_crash_recovery.pl fails using wal_block_size=16K

011_crash_recovery.pl fails using wal_block_size=16K

walker
Hi, hackers


cd source_dir
./configure --enable-tap-tests --with-wal-blocksize=16
make world
make install-world
cd source_dir/src/test/recovery
make check PROVE_TESTS='t/011_crash_recovery.pl' PROVE_FLAGS='--verbose'

The output of the last command is:
011_crash_recovery.pl ..
1..3
ok 1 - own xid is in-progress
not ok 2 - new xid after restart is greater

#    Failed test 'new xid after restart is greater'
#    at t/011_crash_recovery.pl line 61
#       '485'
#             >
#       '485'
not ok 3 - xid is aborted after crash

#    Failed test 'xid is aborted after crash'
#    at t/011_crash_recovery.pl line 65.
#                got: 'committed'
#        expected: 'aborted'
# Looks like you failed 2 tests of 3.
Dubious test returned 2(stat 512, 0x200)
Failed 2/3 subtests
......


But if I modify t/011_crash_recovery.pl as follows, the script passes:
is($node->safe_psql('postgres', qq[SELECT pg_xact_status('$xid');]),
             'in progress', 'own xid is in-progress');

sleep(1);  # newly added: give the WAL for CREATE TABLE time to be flushed to the WAL segment file on disk

# Crash and restart the postmaster
$node->stop('immediate');
$node->start;


I think the problem is that before the crash (simulated by stopping in immediate mode), the WAL for "create table mine" had not been flushed to the WAL file on disk. If we instead wait a while after issuing the CREATE TABLE, e.g. 200 ms or more, the data in the WAL buffers should in theory be written to disk by the WAL writer.
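To make the idea of "wait until the WAL has reached disk" concrete without a fixed sleep, here is a minimal sketch of a polling helper. The name wait_until and its parameters are hypothetical and not part of the test framework. The commented-out usage assumes that comparing pg_current_wal_flush_lsn() with pg_current_wal_insert_lsn() from a second session is an adequate signal that the earlier records were flushed; that is an assumption, not something confirmed in this thread, and whether the flush position catches up at all under a given WAL block size is exactly the open question here, hence the timeout.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper (not part of PostgresNode): call $cond repeatedly
# until it returns true or $timeout seconds have passed.
# Returns 1 on success, 0 on timeout.
sub wait_until
{
	my ($cond, $timeout, $interval) = @_;
	$timeout  //= 10;     # seconds
	$interval //= 0.1;    # seconds between attempts

	my $deadline = time() + $timeout;
	while (1)
	{
		return 1 if $cond->();
		return 0 if time() >= $deadline;
		sleep($interval);
	}
}

# Stand-alone demonstration: the condition becomes true on the third call.
my $calls = 0;
my $ok = wait_until(sub { return ++$calls >= 3 }, 5, 0.01);
print "ok=$ok calls=$calls\n";    # prints "ok=1 calls=3"

# Possible use in the test in place of sleep(1) (assumption, untested here):
#
#   wait_until(sub {
#       $node->safe_psql('postgres',
#           q[SELECT pg_current_wal_flush_lsn() >= pg_current_wal_insert_lsn()])
#         eq 't';
#   }) or die "WAL was not flushed within the timeout";
```

Like the sleep, this only works around the symptom; it does not explain why the records stay unflushed with the larger block size.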

However, I'm not sure of the root cause. What is the difference between wal_blocksize=8k and wal_blocksize=16k when flushing WAL buffer data to disk?

thanks
walker







Re: 011_crash_recovery.pl fails using wal_block_size=16K

Kyotaro Horiguchi-4
Oops! I forgot that the issue starts from this mail.

At Thu, 4 Mar 2021 22:34:38 +0800, "walker" <[hidden email]> wrote in
> 011_crash_recovery.pl ..
> 1..3
> ok 1 - own xid is in-progress
> not ok 2 - new xid after restart is greater

> But if I modify t/011_crash_recovery.pl as follows, the script passes:
> is($node->safe_psql('postgres', qq[SELECT pg_xact_status('$xid');]),
>              'in progress', 'own xid is in-progress');
>
> sleep(1);  # newly added: give the WAL for CREATE TABLE time to be flushed to the WAL segment file on disk

The sleep lets the unwritten WAL records go out to disk (buffer).

> I think the problem is that before the crash (simulated by stopping in immediate mode), the WAL for "create table mine" had not been flushed to the WAL file on disk. If we instead wait a while after issuing the CREATE TABLE, e.g. 200 ms or more, the data in the WAL buffers should in theory be written to disk by the WAL writer.

Right.

> However, I'm not sure of the root cause. What is the difference between wal_blocksize=8k and wal_blocksize=16k when flushing WAL buffer data to disk?

I'm sorry that I didn't follow this message.  However, the explanation
is in the following mail.

https://www.postgresql.org/message-id/20210305.135342.384699732619433016.horikyota.ntt%40gmail.com

In short, the doubled block size prevents WAL writes from happening.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center