Three animals fail test-decoding-check on REL_10_STABLE

Thomas Munro-3
Hi,

Only gaur shows useful logs:

  SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot',
'test_decoding');
! ERROR:  could not access file "test_decoding": No such file or directory

Does this mean it didn't build the test_decoding module?

Of the failing animals, damselfly builds with the highest frequency,
and it reports the following 4 commits between the first failure[1]
and the preceding success (and has been failing ever since):

962da60591 Tue Jan 1 01:39:34 2019 UTC  Fix generation of padding
message before encrypting Elgamal in pgcrypto
bedda9fbb7 Mon Dec 31 21:57:57 2018 UTC  Process EXTRA_INSTALL
serially, during the first temp-install.
e7ebc8c285 Mon Dec 31 21:55:04 2018 UTC  Send EXTRA_INSTALL errors to
install.log, not stderr.
7c97b0f55e Mon Dec 31 21:51:18 2018 UTC  pg_regress: Promptly detect
failed postmaster startup.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2019-01-01%2010%3A39%3A41

--
Thomas Munro
http://www.enterprisedb.com


Re: Three animals fail test-decoding-check on REL_10_STABLE

Tom Lane-2
Thomas Munro <[hidden email]> writes:
> Only gaur shows useful logs:

>   SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
> ! ERROR:  could not access file "test_decoding": No such file or directory

> Does this mean it didn't build the test_decoding module?

I'm wondering if it built it but didn't install it, as a result of
some problem with

> bedda9fbb7 Mon Dec 31 21:57:57 2018 UTC  Process EXTRA_INSTALL
> serially, during the first temp-install.

Will take a look later, but since gaur is so slow, it may be a while
before I have any answers.

                        regards, tom lane


Re: Three animals fail test-decoding-check on REL_10_STABLE

Tom Lane-2
I wrote:
> Thomas Munro <[hidden email]> writes:
>> Does this mean it didn't build the test_decoding module?

> I'm wondering if it built it but didn't install it, as a result of
> some problem with
>> bedda9fbb7 Mon Dec 31 21:57:57 2018 UTC  Process EXTRA_INSTALL
>> serially, during the first temp-install.

So it appears that in v10,

        ./configure ... --enable-tap-tests ...
        make
        make install
        cd contrib/test_decoding
        make check

fails due to failure to install test_decoding into the tmp_install
tree, while it works in v11.  Moreover, that's not specific to
gaur: it happens on my Linux box too.  I'm not very sure why only
three buildfarm animals are unhappy --- maybe in the buildfarm
context it requires a specific combination of options to show the
problem.

There's no obvious difference between bedda9fbb and 6dd690be3,
so I surmise that that patch depended somehow on some previous
work that only went into v11 not v10.  Haven't found what, yet.

                        regards, tom lane


Re: Three animals fail test-decoding-check on REL_10_STABLE

Tom Lane-2
I wrote:
> There's no obvious difference between bedda9fbb and 6dd690be3,
> so I surmise that that patch depended somehow on some previous
> work that only went into v11 not v10.  Haven't found what, yet.

Ah, looks like it was 42e61c774.  I'll push a fix shortly.

                        regards, tom lane


Re: Three animals fail test-decoding-check on REL_10_STABLE

Tom Lane-2
In reply to this post by Tom Lane-2
I wrote:

> So it appears that in v10,
> ./configure ... --enable-tap-tests ...
> make
> make install
> cd contrib/test_decoding
> make check
> fails due to failure to install test_decoding into the tmp_install
> tree, while it works in v11.  Moreover, that's not specific to
> gaur: it happens on my Linux box too.  I'm not very sure why only
> three buildfarm animals are unhappy --- maybe in the buildfarm
> context it requires a specific combination of options to show the
> problem.

While I think I've fixed this bug, I'm still quite confused about why
only some buildfarm animals showed the problem.  Comparing log files,
it seems that the ones that were working were relying on having
done a complete temp-install at a higher level, while the ones that
were failing were trying to make a temp install from scratch in
contrib/test_decoding and hence seeing the bug.  For example,
longfin's test-decoding-check log starts out

Snapshot: 2019-01-11 21:12:17

/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../src/test/regress all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../../src/port all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../backend submake-errcodes
make[3]: Nothing to be done for `submake-errcodes'.

while gaur's starts out

Snapshot: 2019-01-11 07:30:45

rm -rf '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install
/bin/sh ../../config/install-sh -c -d '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log
make -C '../..' DESTDIR='/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install install >'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
make -j1  checkprep >>'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
make -C ../../src/test/regress all
make[1]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/test/regress'
make -C ../../../src/port all
make[2]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/port'
make -C ../backend submake-errcodes
make[3]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/backend'
make[3]: Nothing to be done for `submake-errcodes'.
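
The difference between the two excerpts can be checked mechanically. What follows is a minimal sketch in Python, not buildfarm client code. It assumes that a from-scratch temp install always shows up in the log as an "rm -rf" of a path ending in tmp_install, or as a "make ... DESTDIR=.../tmp_install install" line, which is only what the two excerpts above suggest. The function name made_local_temp_install is made up for illustration.

```python
import re

# Heuristic patterns, assumed from the two log excerpts in this thread:
# a from-scratch temp install wipes tmp_install, then runs "make ... install"
# with DESTDIR pointing at it.
_WIPE = re.compile(r"^rm -rf \S*tmp_install$")
_INSTALL = re.compile(r"^make\b.*\bDESTDIR=\S*tmp_install\b.*\binstall\b")


def made_local_temp_install(log_text: str) -> bool:
    """Return True if the log shows tmp_install being rebuilt locally."""
    for raw in log_text.splitlines():
        line = raw.strip()
        if _WIPE.match(line) or _INSTALL.match(line):
            return True
    return False


# Shortened versions of the excerpts quoted above.
longfin_log = """\
Snapshot: 2019-01-11 21:12:17

/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../src/test/regress all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../../src/port all
make[3]: Nothing to be done for `submake-errcodes'.
"""

gaur_log = """\
Snapshot: 2019-01-11 07:30:45

rm -rf '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install
make -C '../..' DESTDIR='/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install install >'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
make -C ../../src/test/regress all
"""

longfin_result = made_local_temp_install(longfin_log)
gaur_result = made_local_temp_install(gaur_log)
```

Under those assumptions the longfin excerpt is classified as reusing an existing temp install and the gaur excerpt as building one from scratch.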

These two animals are running the same buildfarm client version,
and I don't see any relevant difference in their configurations,
so why are they behaving differently?  Andrew, any ideas?

                        regards, tom lane


Re: Three animals fail test-decoding-check on REL_10_STABLE

Andrew Dunstan

On 1/11/19 6:33 PM, Tom Lane wrote:

> I wrote:
>> So it appears that in v10,
>> ./configure ... --enable-tap-tests ...
>> make
>> make install
>> cd contrib/test_decoding
>> make check
>> fails due to failure to install test_decoding into the tmp_install
>> tree, while it works in v11.  Moreover, that's not specific to
>> gaur: it happens on my Linux box too.  I'm not very sure why only
>> three buildfarm animals are unhappy --- maybe in the buildfarm
>> context it requires a specific combination of options to show the
>> problem.
> While I think I've fixed this bug, I'm still quite confused about why
> only some buildfarm animals showed the problem.  Comparing log files,
> it seems that the ones that were working were relying on having
> done a complete temp-install at a higher level, while the ones that
> were failing were trying to make a temp install from scratch in
> contrib/test_decoding and hence seeing the bug.  For example,
> longfin's test-decoding-check log starts out
>
> Snapshot: 2019-01-11 21:12:17
>
> /Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../src/test/regress all
> /Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../../src/port all
> /Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../backend submake-errcodes
> make[3]: Nothing to be done for `submake-errcodes'.
>
> while gaur's starts out
>
> Snapshot: 2019-01-11 07:30:45
>
> rm -rf '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install
> /bin/sh ../../config/install-sh -c -d '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log
> make -C '../..' DESTDIR='/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install install >'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
> make -j1  checkprep >>'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
> make -C ../../src/test/regress all
> make[1]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/test/regress'
> make -C ../../../src/port all
> make[2]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/port'
> make -C ../backend submake-errcodes
> make[3]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/backend'
> make[3]: Nothing to be done for `submake-errcodes'.
>
> These two animals are running the same buildfarm client version,
> and I don't see any relevant difference in their configurations,
> so why are they behaving differently?  Andrew, any ideas?
>
>



Possibly an error in 
https://github.com/PGBuildFarm/client-code/commit/3026438dcefebcc6fe2d44eb7b60812e257a0614


It looks like longfin detects that it has all it needs to proceed, and
so calls make with "NO_TEMP_INSTALL=yes", but gaur doesn't.  Not sure why
that would be - if anything I'd expect the test to fail on OSX rather
than HP-UX. Is there something weird about naming of library files on HP-UX?


cheers


andrew




Re: Three animals fail test-decoding-check on REL_10_STABLE

Tom Lane-2
Andrew Dunstan <[hidden email]> writes:
> On 1/11/19 6:33 PM, Tom Lane wrote:
>> While I think I've fixed this bug, I'm still quite confused about why
>> only some buildfarm animals showed the problem.

> ... Is there something weird about naming of library files on HP-UX?

Doh!  I looked right at this code last night, but it failed to click:

    # these files should be present if we've temp_installed everything,
    # and not if we haven't. They represent core, contrib and test_modules.
    return ( (-d $tmp_loc)
          && (-f "$bindir/postgres"       || -f "$bindir/postgres.exe")
          && (-f "$libdir/hstore.so"      || -f "$libdir/hstore.dll")
          && (-f "$libdir/test_parser.so" || -f "$libdir/test_parser.dll"));

On HPUX (at least the version gaur is running), the extension for
shared libraries is ".sl" not ".so".
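
To make the failure mode concrete, here is a rough Python rendering of the same test with the list of suffixes pulled out as a parameter. It is a sketch for illustration only: the buildfarm client is Perl, and the function name, the dl_suffixes parameter and the fake directory layout below are all assumptions, not anything in the client code.

```python
import tempfile
from pathlib import Path


def temp_install_looks_complete(tmp_loc, bindir, libdir,
                                dl_suffixes=(".so", ".dll")):
    """Mirror of the quoted check; dl_suffixes is a hypothetical parameter."""
    tmp_loc, bindir, libdir = Path(tmp_loc), Path(bindir), Path(libdir)

    def has_lib(name):
        # True if the module exists under any of the accepted suffixes.
        return any((libdir / (name + suffix)).is_file()
                   for suffix in dl_suffixes)

    return (
        tmp_loc.is_dir()
        and ((bindir / "postgres").is_file()
             or (bindir / "postgres.exe").is_file())
        and has_lib("hstore")
        and has_lib("test_parser")
    )


# Fake a complete temp install whose shared libraries end in ".sl".
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    bindir, libdir = root / "bin", root / "lib"
    bindir.mkdir()
    libdir.mkdir()
    (bindir / "postgres").touch()
    (libdir / "hstore.sl").touch()
    (libdir / "test_parser.sl").touch()

    # With only ".so"/".dll" the install is wrongly judged incomplete.
    found_with_default = temp_install_looks_complete(root, bindir, libdir)
    # Accepting ".sl" as well makes the same tree pass.
    found_with_sl = temp_install_looks_complete(
        root, bindir, libdir, dl_suffixes=(".so", ".dll", ".sl"))
```

With only ".so" and ".dll" accepted, a complete install whose libraries end in ".sl" is reported as incomplete, so the caller would fall back to a local temp install.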

That doesn't explain the failures on damselfly and koreaceratops,
but they're both running very old buildfarm clients, which most
likely just don't have the optimization to share a temp-install.

I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
instead of listing all the possibilities here.  But I'm not sure how
you'd deal with this bit in Makefile.hpux:

ifeq ($(host_cpu), ia64)
   DLSUFFIX = .so
else
   DLSUFFIX = .sl
endif
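
A small Python sketch of why plain text scraping is ambiguous for this fragment. It assumes a naive approach that collects every DLSUFFIX assignment and ignores make conditionals entirely; the regular expression and the function name are illustrative only.

```python
import re

# Matches "DLSUFFIX = value" (also :=, ?=, +=) at the start of a line,
# ignoring leading indentation. Conditionals are not interpreted.
_ASSIGN = re.compile(r"^\s*DLSUFFIX\s*[:?+]?=\s*(\S+)", re.MULTILINE)


def scrape_dlsuffix(makefile_text):
    """Return every DLSUFFIX value assigned in the text, in file order."""
    return _ASSIGN.findall(makefile_text)


hpux_fragment = """\
ifeq ($(host_cpu), ia64)
   DLSUFFIX = .so
else
   DLSUFFIX = .sl
endif
"""

candidates = scrape_dlsuffix(hpux_fragment)
```

The scrape yields two candidates for this fragment, and choosing between them needs the value of host_cpu, which only make (or configure's output) knows.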

Anyway, the bigger picture here is that the shared-temp-install
optimization is masking bugs in local "make check" rules.  Not
sure how much we care about that, though.  Any such bug is only
of interest to developers, and it only matters if someone actually
stumbles over it.

                        regards, tom lane


Re: Three animals fail test-decoding-check on REL_10_STABLE

Andrew Dunstan-8

On 1/12/19 2:03 PM, Tom Lane wrote:

> Andrew Dunstan <[hidden email]> writes:
>> On 1/11/19 6:33 PM, Tom Lane wrote:
>>> While I think I've fixed this bug, I'm still quite confused about why
>>> only some buildfarm animals showed the problem.
>> ... Is there something weird about naming of library files on HP-UX?
> Doh!  I looked right at this code last night, but it failed to click:
>
>     # these files should be present if we've temp_installed everything,
>     # and not if we haven't. They represent core, contrib and test_modules.
>     return ( (-d $tmp_loc)
>           && (-f "$bindir/postgres"       || -f "$bindir/postgres.exe")
>           && (-f "$libdir/hstore.so"      || -f "$libdir/hstore.dll")
>           && (-f "$libdir/test_parser.so" || -f "$libdir/test_parser.dll"));
>
> On HPUX (at least the version gaur is running), the extension for
> shared libraries is ".sl" not ".so".
>
> That doesn't explain the failures on damselfly and koreaceratops,
> but they're both running very old buildfarm clients, which most
> likely just don't have the optimization to share a temp-install.


Yes, they are on an older version that doesn't use the NO_TEMP_INSTALL
flag at all.



> I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
> instead of listing all the possibilities here.  But I'm not sure how
> you'd deal with this bit in Makefile.hpux:
>
> ifeq ($(host_cpu), ia64)
>    DLSUFFIX = .so
> else
>    DLSUFFIX = .sl
> endif


I'd rather get make to tell us directly, something like:


    .PHONY: show_dl_suffix
    show_dl_suffix:
        @echo $(DLSUFFIX)


I can arrange something like that in the buildfarm code if we think the
use case is too narrow to justify adding it to the core makefiles.
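
For illustration, a caller could consume such a target roughly as follows. This is a Python sketch under assumptions: the show_dl_suffix target is only the proposal above and may not exist in any branch, the build directory path is a placeholder, and the buildfarm client itself is Perl. The run parameter exists only so the sketch can be exercised without a PostgreSQL build tree.

```python
import subprocess


def query_dl_suffix(build_dir, run=subprocess.run):
    """Ask make for DLSUFFIX via the proposed (hypothetical) target."""
    result = run(
        # -s and --no-print-directory keep GNU make from adding its own
        # "Entering directory" chatter to the output.
        ["make", "-s", "--no-print-directory", "-C", str(build_dir),
         "show_dl_suffix"],
        capture_output=True, text=True, check=True,
    )
    lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    if not lines:
        raise RuntimeError("make printed no DLSUFFIX value")
    return lines[-1]


def _fake_run(cmd, **kwargs):
    # Stand-in for subprocess.run, pretending make printed ".sl".
    return subprocess.CompletedProcess(cmd, 0, stdout=".sl\n", stderr="")


demo_suffix = query_dl_suffix("/hypothetical/pgsql.build", run=_fake_run)
```

Letting make evaluate the variable sidesteps the conditional in Makefile.hpux, since the answer comes from the same logic that names the built libraries.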


> Anyway, the bigger picture here is that the shared-temp-install
> optimization is masking bugs in local "make check" rules.  Not
> sure how much we care about that, though.  Any such bug is only
> of interest to developers, and it only matters if someone actually
> stumbles over it.
>
>


right.


cheers


andrew


--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Three animals fail test-decoding-check on REL_10_STABLE

Tom Lane-2
Andrew Dunstan <[hidden email]> writes:
> On 1/12/19 2:03 PM, Tom Lane wrote:
>> I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
>> instead of listing all the possibilities here.

> I'd rather get make to tell us directly, something like:
>     .PHONY: show_dl_suffix
>     show_dl_suffix:
>         @echo $(DLSUFFIX)

No objection here, but of course you'd have to back-patch that into
all active branches.

(The Darwin case is slightly exciting, but it looks like you'd get
the right answer as long as Makefile.shlib doesn't get involved.)

                        regards, tom lane


Re: Three animals fail test-decoding-check on REL_10_STABLE

Andrew Dunstan-8

On 1/13/19 9:24 AM, Tom Lane wrote:

> Andrew Dunstan <[hidden email]> writes:
>> On 1/12/19 2:03 PM, Tom Lane wrote:
>>> I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
>>> instead of listing all the possibilities here.
>> I'd rather get make to tell us directly, something like:
>>     .PHONY: show_dl_suffix
>>     show_dl_suffix:
>>         @echo $(DLSUFFIX)
> No objection here, but of course you'd have to back-patch that into
> all active branches.
>
> (The Darwin case is slightly exciting, but it looks like you'd get
> the right answer as long as Makefile.shlib doesn't get involved.)
>
>



OK, I'll make that happen.


cheers


andrew


--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services