[Fwd: RE: Build farm on Windows]

Andrew Dunstan

Can one of the Windows buildfarm owners please try building and running
"make check" by hand rather than using the buildfarm script? It looks
like they all stopped reporting around the same time, and this might
give us a better clue about when things fall over.

Also, if you're up for it, please try reversing this patch, which looks
innocuous enough, but is the only thing I can see in the relevant time
period that looks at all suspicious:
http://archives.postgresql.org/pgsql-committers/2006-07/msg00256.php

cheers

andrew


-------- Original Message --------
Subject: RE: Build farm on Windows
Date: Fri, 28 Jul 2006 13:53:18 +1000
From: Phil Cairns <[hidden email]>
To: 'Andrew Dunstan' <[hidden email]>



Hi Andrew, this is yak calling from Australia.

I think I have a problem here with the HEAD build. The last few times I've
run the build, it has sat in "make check" for a long time (well over an
hour). According to the Task Manager, postmaster.exe is taking most of this
time, and it also seems to be leaking memory. After about an hour of running
today, postmaster.exe is using about 100MB of RAM, and is still busily
firing off instances of postgres.exe.

The process is hard to kill as well. It doesn't respond to a Ctrl+C in the
MSYS window, so I kill it by stopping postmaster.exe from within the Task
Manager, and it cleans things up from there.

Does this sound like something wrong with my setup? I'm pretty sure I
haven't changed anything since my last successful run 3 days ago.

All the best,
        Phil.



Re: [Fwd: RE: Build farm on Windows]

Stefan Kaltenbrunner
Andrew Dunstan wrote:
> Can one of the Windows buildfarm owners please try building and running
> "make check" by hand rather than using the buildfarm script? It looks
> like they all stopped reporting around the same time, and this might
> give us a better clue about when things fall over.
>
> Also, if you're up for it, please try reversing this patch, which looks
> innocuous enough, but is the only thing I can see in the relevant time
> period that looks at all suspicious:
> http://archives.postgresql.org/pgsql-committers/2006-07/msg00256.php

Will see what I can do (it definitely hangs in make check here too), but
this issue seems to kill my box to the point where it is impossible to
log in(!) and I have to hard-reboot it.
Looks like it is churning CPU like mad when that happens ...


Stefan

Re: [Fwd: RE: Build farm on Windows]

Stefan Kaltenbrunner
In reply to this post by Andrew Dunstan
Andrew Dunstan wrote:
> Can one of the Windows buildfarm owners please try building and running
> "make check" by hand rather than using the buildfarm script? It looks
> like they all stopped reporting around the same time, and this might
> give us a better clue about when things fall over.
>
> Also, if you're up for it, please try reversing this patch, which looks
> innocuous enough, but is the only thing I can see in the relevant time
> period that looks at all suspicious:
> http://archives.postgresql.org/pgsql-committers/2006-07/msg00256.php


Looks like the postmaster fails to start up:

./pg_regress --temp-install=./tmp_check --top-builddir=../../..
--temp-port=55678 --schedule=./parallel_schedule --multibyte=SQL_ASCII
--load-language=plpgsql
============== removing existing temp installation    ==============
============== creating temporary installation        ==============
============== initializing database system           ==============
============== starting postmaster                    ==============

pg_regress: postmaster did not start within 60 seconds
Examine ./log/postmaster.log for the reason
make[2]: *** [check] Error 2
make[2]: Leaving directory
`/home/pgbuild/pgfarmbuild/HEAD/pgsql.1436/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory
`/home/pgbuild/pgfarmbuild/HEAD/pgsql.1436/src/test'
make: *** [check] Error 2



and the logfile is full of:


FATAL:  failed to initialize timezone_abbreviations to "Default"
FATAL:  failed to initialize timezone_abbreviations to "Default"
LOG:  background writer process (PID 1568) exited with exit code 0
LOG:  terminating any other active server processes
FATAL:  failed to initialize timezone_abbreviations to "Default"
LOG:  all server processes terminated; reinitializing
FATAL:  failed to initialize timezone_abbreviations to "Default"
FATAL:  failed to initialize timezone_abbreviations to "Default"
LOG:  background writer process (PID 244) exited with exit code 0
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
FATAL:  failed to initialize timezone_abbreviations to "Default"
FATAL:  failed to initialize timezone_abbreviations to "Default"
LOG:  background writer process (PID 468) exited with exit code 0
LOG:  terminating any other active server processes

...


Stefan

Re: [Fwd: RE: Build farm on Windows]

Andrew Dunstan
In reply to this post by Stefan Kaltenbrunner
Stefan Kaltenbrunner wrote:

> Andrew Dunstan wrote:
>  
>> Can one of the Windows buildfarm owners please try building and running
>> "make check" by hand rather than using the buildfarm script? It looks
>> like they all stopped reporting around the same time, and this might
>> give us a better clue about when things fall over.
>>
>> Also, if you're up for it, please try reversing this patch, which looks
>> innocuous enough, but is the only thing I can see in the relevant time
>> period that looks at all suspicious:
>> http://archives.postgresql.org/pgsql-committers/2006-07/msg00256.php
>>    
>
> Will see what I can do (it definitely hangs in make check here too), but
> this issue seems to kill my box to the point where it is impossible to
> log in(!) and I have to hard-reboot it.
> Looks like it is churning CPU like mad when that happens ...
>
>
>  

Does it get past the initdb stage? Past db startup? Past creating the
regression db? Run any tests and report results?

cheers

andrew


Re: [Fwd: RE: Build farm on Windows]

Andrew Dunstan
In reply to this post by Stefan Kaltenbrunner

The TimeZone changes are looking mighty suspicious ...

cheers

andrew

Stefan Kaltenbrunner wrote:

> Andrew Dunstan wrote:
>  
>> Can one of the Windows buildfarm owners please try building and running
>> "make check" by hand rather than using the buildfarm script? It looks
>> like they all stopped reporting around the same time, and this might
>> give us a better clue about when things fall over.
>>
>> Also, if you're up for it, please try reversing this patch, which looks
>> innocuous enough, but is the only thing I can see in the relevant time
>> period that looks at all suspicious:
>> http://archives.postgresql.org/pgsql-committers/2006-07/msg00256.php
>>    
>
>
> Looks like the postmaster fails to start up:
>
> ./pg_regress --temp-install=./tmp_check --top-builddir=../../..
> --temp-port=55678 --schedule=./parallel_schedule --multibyte=SQL_ASCII
> --load-language=plpgsql
> ============== removing existing temp installation    ==============
> ============== creating temporary installation        ==============
> ============== initializing database system           ==============
> ============== starting postmaster                    ==============
>
> pg_regress: postmaster did not start within 60 seconds
> Examine ./log/postmaster.log for the reason
> make[2]: *** [check] Error 2
> make[2]: Leaving directory
> `/home/pgbuild/pgfarmbuild/HEAD/pgsql.1436/src/test/regress'
> make[1]: *** [check] Error 2
> make[1]: Leaving directory
> `/home/pgbuild/pgfarmbuild/HEAD/pgsql.1436/src/test'
> make: *** [check] Error 2
>
>
>
> and the logfile is full of:
>
>
> FATAL:  failed to initialize timezone_abbreviations to "Default"
> FATAL:  failed to initialize timezone_abbreviations to "Default"
> LOG:  background writer process (PID 1568) exited with exit code 0
> LOG:  terminating any other active server processes
> FATAL:  failed to initialize timezone_abbreviations to "Default"
> LOG:  all server processes terminated; reinitializing
> FATAL:  failed to initialize timezone_abbreviations to "Default"
> FATAL:  failed to initialize timezone_abbreviations to "Default"
> LOG:  background writer process (PID 244) exited with exit code 0
> LOG:  terminating any other active server processes
> LOG:  all server processes terminated; reinitializing
> FATAL:  failed to initialize timezone_abbreviations to "Default"
> FATAL:  failed to initialize timezone_abbreviations to "Default"
> LOG:  background writer process (PID 468) exited with exit code 0
> LOG:  terminating any other active server processes
>
> ...
>
>
> Stefan
>
>  


Re: [Fwd: RE: Build farm on Windows]

Tom Lane-2
Andrew Dunstan <[hidden email]> writes:
> The TimeZone changes are looking mighty suspicious ...

>> FATAL:  failed to initialize timezone_abbreviations to "Default"

Hm.  It looks like this is working in the postmaster but failing
in subprocesses.  I'll see if I can duplicate it using EXEC_BACKEND.

                        regards, tom lane

Re: [Fwd: RE: Build farm on Windows]

Tom Lane-2
I wrote:
> Andrew Dunstan <[hidden email]> writes:
>> The TimeZone changes are looking mighty suspicious ...

> FATAL:  failed to initialize timezone_abbreviations to "Default"

> Hm.  It looks like this is working in the postmaster but failing
> in subprocesses.  I'll see if I can duplicate it using EXEC_BACKEND.

Nope, works fine with EXEC_BACKEND, so it's something Windows-specific.
I'm not sure why you're not getting any more specific messages ---
they should be coming out at WARNING level AFAICS.  You'll need to trace
through load_tzoffsets() and see why it's failing in the subprocess.

                        regards, tom lane

Re: [Fwd: RE: Build farm on Windows]

Stefan Kaltenbrunner
Tom Lane wrote:

> I wrote:
>> Andrew Dunstan <[hidden email]> writes:
>>> The TimeZone changes are looking mighty suspicious ...
>
>> FATAL:  failed to initialize timezone_abbreviations to "Default"
>
>> Hm.  It looks like this is working in the postmaster but failing
>> in subprocesses.  I'll see if I can duplicate it using EXEC_BACKEND.
>
> Nope, works fine with EXEC_BACKEND, so it's something Windows-specific.
> I'm not sure why you're not getting any more specific messages ---
> they should be coming out at WARNING level AFAICS.  You'll need to trace
> through load_tzoffsets() and see why it's failing in the subprocess.

That was a bit painful, but we failed to see a useful error message
because we have been actively suppressing it. With a quick hack
like:

--- /home/pgbuild/pgfarmbuild/HEAD/pgsql/src/backend/utils/misc/tzparser.c    Tue Jul 25 05:51:21 2006
+++ src/backend/utils/misc/tzparser.c   Fri Jul 28 19:33:24 2006
@@ -326,7 +326,6 @@
        if (!tzFile)
        {
                /* at level 0, if file doesn't exist, guc.c's complaint is enough */
-               if (errno != ENOENT || depth > 0)
                        ereport(tz_elevel,
                                        (errcode_for_file_access(),
                                         errmsg("could not read time zone file \"%s\": %m",


(will probably get mangled by my mailer)


I get a much more useful:

WARNING:  could not read time zone file "Default": No such file or directory
FATAL:  failed to initialize timezone_abbreviations to "Default"
WARNING:  could not read time zone file "Default": No such file or directory
FATAL:  failed to initialize timezone_abbreviations to "Default"
LOG:  background writer process (PID 3776) exited with exit code 0
LOG:  terminating any other active server processes
WARNING:  could not read time zone file "Default": No such file or directory
FATAL:  failed to initialize timezone_abbreviations to "Default"
LOG:  all server processes terminated; reinitializing
WARNING:  could not read time zone file "Default": No such file or directory

which gives a strong further hint at the underlying issue.
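
To make the effect of that one-line hack concrete, here is a minimal standalone sketch of the reporting decision. Only the condition `errno != ENOENT || depth > 0` comes from the hunk above; the function names and the idea of passing errno in as a parameter are hypothetical, chosen so the logic can be exercised outside the server.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check in tzparser.c shown in the hunk above.
 * saved_errno is errno from the failed open; depth is the include nesting
 * level (0 for the top-level file named by timezone_abbreviations).
 */
bool
report_open_failure_original(int saved_errno, int depth)
{
    assert(depth >= 0);
    /* Stay quiet only when the top-level file is simply missing. */
    return saved_errno != ENOENT || depth > 0;
}

/* With the "if" removed, every failed open is reported. */
bool
report_open_failure_hacked(int saved_errno, int depth)
{
    assert(depth >= 0);
    (void) saved_errno;
    return true;
}
```

In this model the original condition returns false for exactly one case, ENOENT at depth 0, which is the case being hit here. That would explain why only the generic FATAL line appeared in the log before the hack.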


Stefan

Re: [Fwd: RE: Build farm on Windows]

Tom Lane-2
Stefan Kaltenbrunner <[hidden email]> writes:
> I get a much more useful:

> WARNING:  could not read time zone file "Default": No such file or directory
> FATAL:  failed to initialize timezone_abbreviations to "Default"

Hm, but why would the file not be there?  Try hacking it to print the
whole path it's trying to open, maybe that will help.

                        regards, tom lane

Re: [Fwd: RE: Build farm on Windows]

Stefan Kaltenbrunner
Tom Lane wrote:
> Stefan Kaltenbrunner <[hidden email]> writes:
>> I get a much more useful:
>
>> WARNING:  could not read time zone file "Default": No such file or directory
>> FATAL:  failed to initialize timezone_abbreviations to "Default"
>
> Hm, but why would the file not be there?  Try hacking it to print the
> whole path it's trying to open, maybe that will help.

WARNING:  could not read time zone file "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No such file or directory
FATAL:  failed to initialize timezone_abbreviations to "Default"
WARNING:  could not read time zone file "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No such file or directory
FATAL:  failed to initialize timezone_abbreviations to "Default"
LOG:  background writer process (PID 1460) exited with exit code 0
LOG:  terminating any other active server processes
WARNING:  could not read time zone file "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No such file or directory

$ ls -l /home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default
-rw-r--r--    1 pgbuild  Administ    28630 Jul 28 20:03 /home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default

So it's there, but as an MSYS virtual path. Is that getting passed to some
win32 function expecting a Windows-style path?



Stefan

Re: [Fwd: RE: Build farm on Windows]

Tom Lane-2
Stefan Kaltenbrunner <[hidden email]> writes:
> WARNING:  could not read time zone file
> "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No
> such file or directory

> $ ls -l /home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default
> -rw-r--r--    1 pgbuild  Administ    28630 Jul 28 20:03
> /home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default

> So it's there, but as an MSYS virtual path. Is that getting passed to some
> win32 function expecting a Windows-style path?

Hm.  We pass it to fopen().  The equivalent code in pgtz.c generates the
path to /timezone files exactly the same way, but uses open() ... is
there a difference in what they'll take?
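
One way to answer that question empirically is a small probe that tries both calls on the same path and records errno for each. This is only a sketch: `ProbeResult` and `probe_path` are hypothetical names, it assumes POSIX-style headers are available (as they are under MinGW), and it makes no claim about what the two calls actually return on MSYS.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* errno observed for each call, or 0 if the call succeeded. */
typedef struct ProbeResult
{
    int fopen_errno;
    int open_errno;
} ProbeResult;

/* Try the same path with fopen() and with open(), read-only in both cases. */
ProbeResult
probe_path(const char *path)
{
    ProbeResult result = {0, 0};
    FILE *fp;
    int fd;

    assert(path != NULL);

    errno = 0;
    fp = fopen(path, "r");
    if (fp == NULL)
        result.fopen_errno = errno;
    else
        fclose(fp);

    errno = 0;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        result.open_errno = errno;
    else
        close(fd);

    return result;
}
```

If the two fields differ for the timezonesets path, the calls disagree about what they accept. If both report ENOENT, the path itself is the more likely culprit.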

                        regards, tom lane

Re: [Fwd: RE: Build farm on Windows]

Andrew Dunstan
In reply to this post by Stefan Kaltenbrunner
Stefan Kaltenbrunner wrote:
> WARNING:  could not read time zone file
> "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No
> such file or directory
>  

This is an MSys virtual path, of which postgres naturally knows
nothing. We should have made the appropriate calls to turn it into a
genuine Windows path. (Darn, not having a Windows box to test on is
annoying).

cheers

andrew

Re: [Fwd: RE: Build farm on Windows]

Tom Lane-2
In reply to this post by Stefan Kaltenbrunner
Stefan Kaltenbrunner <[hidden email]> writes:
> WARNING:  could not read time zone file
> "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No
> such file or directory

> So it's there, but as an MSYS virtual path. Is that getting passed to some
> win32 function expecting a Windows-style path?

Doh, I see what's the problem: we calculate the sharedir path using
my_exec_path, and falling back to the hardwired PGSHAREDIR path if
my_exec_path isn't correct.  The problem is that in a Windows
subprocess, my_exec_path isn't correct until read_backend_variables
has been done, and *that happens after InitializeGUCOptions* in
SubPostmasterMain().  So we're trying to set up the tz name data
before we have the path we need.

The reason I didn't notice this in testing with EXEC_BACKEND is that
I wasn't testing in a relocated installation, and so the fallback
get_share_path calculation got the right answer anyway.

Not sure about a clean fix.  Probably we'll have to do something
similar to the way TimeZone is handled, where we don't try to read
in the data until later on in the initialization sequence.
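
The ordering problem described above can be modelled in a few lines. This is a deliberately simplified sketch, not the server code: every `model_` name and the fallback string are hypothetical, an empty string stands in for "my_exec_path is not correct yet", and the real path arithmetic is reduced to appending `/../share`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stands in for the compiled-in PGSHAREDIR fallback (hypothetical value). */
#define MODEL_FALLBACK_SHAREDIR "/compiled/in/share"

/* Stands in for my_exec_path; empty until the backend variables are read. */
static char model_exec_path[256] = "";

/* Derive the share directory from the exec path, or fall back if unknown. */
void
model_get_share_path(char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    if (model_exec_path[0] != '\0')
        snprintf(out, outlen, "%s/../share", model_exec_path);
    else
        snprintf(out, outlen, "%s", MODEL_FALLBACK_SHAREDIR);
}

/* Stands in for read_backend_variables(): only now is the exec path known. */
void
model_read_backend_variables(const char *real_exec_dir)
{
    assert(real_exec_dir != NULL);
    snprintf(model_exec_path, sizeof(model_exec_path), "%s", real_exec_dir);
}
```

Calling `model_get_share_path` before `model_read_backend_variables` yields the fallback, and calling it afterwards yields the path relative to the executable. In a relocated installation the two differ, so anything loaded during the early call looks in the wrong directory. In a non-relocated installation they name the same place, which matches the explanation of why the EXEC_BACKEND test passed.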

                        regards, tom lane

Re: [Fwd: RE: Build farm on Windows]

Andrew Dunstan
Tom Lane wrote:

> Stefan Kaltenbrunner <[hidden email]> writes:
>  
>> WARNING:  could not read time zone file
>> "/home/pgbuild/devel/pginst/share/postgresql/timezonesets/Default": No
>> such file or directory
>>    
>
>  
>> So it's there, but as an MSYS virtual path. Is that getting passed to some
>> win32 function expecting a Windows-style path?
>>    
>
> Doh, I see what's the problem: we calculate the sharedir path using
> my_exec_path, and falling back to the hardwired PGSHAREDIR path if
> my_exec_path isn't correct.  The problem is that in a Windows
> subprocess, my_exec_path isn't correct until read_backend_variables
> has been done, and *that happens after InitializeGUCOptions* in
> SubPostmasterMain().  So we're trying to set up the tz name data
> before we have the path we need.
>  

Is there a reason we have to do things in this order? Could we just
postpone the call to InitializeGUCOptions() for a couple of lines?

If not, then ...

> The reason I didn't notice this in testing with EXEC_BACKEND is that
> I wasn't testing in a relocated installation, and so the fallback
> get_share_path calculation got the right answer anyway.
>
> Not sure about a clean fix.  Probably we'll have to do something
> similar to the way TimeZone is handled, where we don't try to read
> in the data until later on in the initialization sequence.
>
>
>  

I guess we'd need to set a flag that would postpone reading the data
just during the startup phase, but have it called immediately in all
other cases.
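
A rough sketch of that flag idea, under stated assumptions: all names here are hypothetical, the file read is replaced by a stub that just records the name, and this is not the patch that was eventually committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool startup_done = false;   /* set once the real paths are known */
static char pending_name[64] = "";  /* setting remembered during startup */
static char loaded_name[64] = "";   /* last abbreviation set "loaded" */
static int  load_count = 0;

/* Stub for reading the abbreviation file; only records what was asked for. */
static void
load_abbrevs(const char *name)
{
    snprintf(loaded_name, sizeof(loaded_name), "%s", name);
    load_count++;
}

/* Assign hook: postpone during startup, load immediately afterwards. */
void
assign_abbrevs(const char *name)
{
    assert(name != NULL);
    if (!startup_done)
    {
        snprintf(pending_name, sizeof(pending_name), "%s", name);
        return;
    }
    load_abbrevs(name);
}

/* Called late in initialization, after the exec path has been fixed up. */
void
finish_startup(void)
{
    startup_done = true;
    if (strlen(pending_name) > 0)
    {
        load_abbrevs(pending_name);
        pending_name[0] = '\0';
    }
}

int         abbrev_load_count(void)  { return load_count; }
const char *abbrev_loaded_name(void) { return loaded_name; }
```

Here an `assign_abbrevs("Default")` issued before `finish_startup()` touches no files. The load happens once inside `finish_startup()`, and any later assignment takes effect immediately.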

cheers

andrew


Re: [Fwd: RE: Build farm on Windows]

Tom Lane-2
Andrew Dunstan <[hidden email]> writes:
> Is there a reason we have to do things in this order? Could we just
> postpone the call to InitializeGUCOptions() for a couple of lines?

Maybe, but I'm disinclined to mess with that.  I have a patch that
makes it work like TimeZone, but am having difficulty committing
... looks like that Polish script kiddie is at it again ...

                        regards, tom lane