Debian mips: Failed test 'Check expected t_009_tbl data on standby'

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Debian mips: Failed test 'Check expected t_009_tbl data on standby'

apt.postgresql.org repository
The 11rc1 build was failing on Debian mips:

cd /<<PKGBUILDDIR>>/build/../src/test/recovery && TESTDIR='/<<PKGBUILDDIR>>/build/src/test/recovery' PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/postgresql/11/bin:$PATH" LD_LIBRARY_PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/mips-linux-gnu" PGPORT='65432' PG_REGRESS='/<<PKGBUILDDIR>>/build/src/test/recovery/../../../src/test/regress/pg_regress' /usr/bin/prove -I /<<PKGBUILDDIR>>/build/../src/test/perl/ -I /<<PKGBUILDDIR>>/build/../src/test/recovery  t/*.pl
t/001_stream_rep.pl .................. ok
t/002_archiving.pl ................... ok
t/003_recovery_targets.pl ............ ok
t/004_timeline_switch.pl ............. ok
t/005_replay_delay.pl ................ ok
t/006_logical_decoding.pl ............ ok
t/007_sync_rep.pl .................... ok
t/008_fsm_truncation.pl .............. ok

#   Failed test 'Check expected t_009_tbl data on standby'
#   at t/009_twophase.pl line 405.
#          got: '1|issued to london
# 2|issued to london
# 5|issued to london
# 6|issued to london
# 9|issued to london
# 10|issued to london
# 11|issued to london
# 12|issued to london
# 13|issued to london
# 15|issued to london
# 16|issued to london
# 17|issued to london
# 18|issued to london
# 19|issued to london
# 20|issued to london
# 21|issued to london
# 22|issued to london
# 23|issued to paris
# 24|issued to paris
# 25|issued to london
# 26|issued to london
# |'
#     expected: '1|issued to london
# 2|issued to london
# 5|issued to london
# 6|issued to london
# 9|issued to london
# 10|issued to london
# 11|issued to london
# 12|issued to london
# 13|issued to london
# 14|issued to london
# 15|issued to london
# 16|issued to london
# 17|issued to london
# 18|issued to london
# 19|issued to london
# 20|issued to london
# 21|issued to london
# 22|issued to london
# 23|issued to paris
# 24|issued to paris
# 25|issued to london
# 26|issued to london'
# Looks like you failed 1 test of 20.
t/009_twophase.pl ....................
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/20 subtests
t/010_logical_decoding_timelines.pl .. ok
t/011_crash_recovery.pl .............. ok
t/012_subtransactions.pl ............. ok
t/013_crash_restart.pl ............... ok
t/014_unlogged_reinit.pl ............. ok
t/015_promotion_pages.pl ............. ok

Test Summary Report
-------------------
t/009_twophase.pl                  (Wstat: 256 Tests: 20 Failed: 1)
  Failed test:  19
  Non-zero exit status: 1


The diff is that "14|issued to london" was expected, but "|" was
received.

https://buildd.debian.org/status/fetch.php?pkg=postgresql-11&arch=mips&ver=11~rc1-1&stamp=1539192131&raw=0

Unfortunately the build log does not contain the relevant
postmaster.log for this test.

The build succeeded when I retried it manually on a different box.

Is that something to be concerned about? JIT is active on this
platform, albeit with an older version (1:7~svn331965-1 vs 1:7-5).

Christoph

Reply | Threaded
Open this post in threaded view
|

Re: Debian mips: Failed test 'Check expected t_009_tbl data on standby'

Tom Lane-2
Christoph Berg <[hidden email]> writes:
> The 11rc1 build was failing on Debian mips:
> ...
> The diff is that "14|issued to london" was expected, but "|" was
> received.
> The build succeeded when I retried it manually on a different box.

Was it repeatable on the original box?

My gut feeling is that this is probably a test-case instability
(eg timing dependency) rather than an actual bug, but of course
it's hard to be sure.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Debian mips: Failed test 'Check expected t_009_tbl data on standby'

apt.postgresql.org repository
Re: Tom Lane 2018-10-11 <[hidden email]>
> Christoph Berg <[hidden email]> writes:
> > The 11rc1 build was failing on Debian mips:
> > ...
> > The diff is that "14|issued to london" was expected, but "|" was
> > received.
> > The build succeeded when I retried it manually on a different box.
>
> Was it repeatable on the original box?

I asked for a rebuild on the buildd network, which succeeded. It wasn't
the very same machine, though. The processor information says
"EdgeRouter Pro" both for the failing build and the manual build, the
3rd (good) build now is "Rhino Labs UTM8".

> My gut feeling is that this is probably a test-case instability
> (eg timing dependency) rather than an actual bug, but of course
> it's hard to be sure.

Looking at src/test/recovery/t/009_twophase.pl, this is testing 2PC,
and the missing row is from:

$cur_master->psql(
    'postgres', "
    BEGIN;
    INSERT INTO t_009_tbl VALUES (13, 'issued to ${cur_master_name}');
    SAVEPOINT s1;
    INSERT INTO t_009_tbl VALUES (14, 'issued to ${cur_master_name}');
    PREPARE TRANSACTION 'xact_009_6';
    COMMIT PREPARED 'xact_009_6';");
$cur_master->teardown_node;

So it is very strange that the "13" came through, but "14" did not.
Doesn't look like a test that could be unstable, at least not in that
way.

Christoph

Reply | Threaded
Open this post in threaded view
|

Re: Debian mips: Failed test 'Check expected t_009_tbl data on standby'

Michael Paquier-2
On Thu, Oct 11, 2018 at 09:49:27PM +0200, Christoph Berg wrote:

> Looking at src/test/recovery/t/009_twophase.pl, this is testing 2PC,
> and the missing row is from:
>
> $cur_master->psql(
>     'postgres', "
>     BEGIN;
>     INSERT INTO t_009_tbl VALUES (13, 'issued to ${cur_master_name}');
>     SAVEPOINT s1;
>     INSERT INTO t_009_tbl VALUES (14, 'issued to ${cur_master_name}');
>     PREPARE TRANSACTION 'xact_009_6';
>     COMMIT PREPARED 'xact_009_6';");
> $cur_master->teardown_node;
>
> So it is very strange that the "13" came through, but "14" did not.
> Doesn't look like a test that could be unstable, at least not in that
> way.
Do you still have the logs of the previous run for the standby?  I am
wondering if you could have seen this issue caused by duplicated XIDs
found in XLOG_RUNNING_XACTS:
https://www.postgresql.org/message-id/0c96b653-4696-d4b4-6b5d-78143175d113@...
That's hard to reproduce, still that could be possible if a standby
snapshot was taken in parallel of this transaction running on the
primary.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Debian mips: Failed test 'Check expected t_009_tbl data on standby'

apt.postgresql.org repository
Re: Michael Paquier 2018-10-12 <[hidden email]>
> Do you still have the logs of the previous run for the standby?

Sorry, all I have is the (link to) the build log in the original
posting. I can run some tests on the mips porter box if you have any
ideas for things to try.

What's missing is a way to determine which .log files to show in case
of a test failure. At the moment I'm showing the latest 3 by mtime:

        unset MAKELEVEL; if ! make -C build check-world EXTRA_REGRESS_OPTS='--port=$(shell perl -le 'print 1024 + int(rand(64000))')'; then \
            for l in `find build \( -name regression.diffs -o -name initdb.log -o -name postmaster.log \) | perl -we 'print map { "$$_\n"; } sort { (stat $$a)[9] <=> (stat $$b)[9] } map { chomp; $$_; } <>' | tail -3`; do \
                echo "******** $$l ********"; \
                cat $$l; \
            done; \
            case $(DEB_HOST_ARCH) in \
                hurd-*|kfreebsd-*) exit 0 ;; \
                *) exit 1 ;; \
            esac; \
        fi

Maybe I should change that to 10. Or just show all given it happens
only for failing builds. (In case anyone is wondering: hurd doesn't
have semaphores, and kfreebsd always fails the plperl tests.)

Christoph