LDAP check flapping on crake due to race

Thomas Munro
Hi,

There are one or two failures per month on crake.  It looks like when
authentication is rejected, as expected in the tests, the psql process
is exiting, but there is a race where the Perl script still wants to
write a dummy query to its stdin (?), so you get:

psql: FATAL:  LDAP authentication failed for user "test1"
ack Broken pipe: write( 13, 'SELECT 1' ) at
/usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

Example:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2019-11-10%2023%3A36%3A04

tmunro=> select animal, snapshot, branch from run where fail_stage = 'ldapCheck' order by snapshot desc;
 animal |      snapshot       |    branch
--------+---------------------+---------------
 crake  | 2020-08-02 02:32:30 | REL_13_STABLE
 crake  | 2020-07-22 23:36:04 | REL_12_STABLE
 crake  | 2020-07-14 00:52:04 | REL_13_STABLE
 crake  | 2020-05-15 17:35:05 | REL_11_STABLE
 crake  | 2020-04-07 20:51:03 | REL_12_STABLE
 mantid | 2020-03-04 18:17:58 | REL_12_STABLE
 mantid | 2020-03-04 17:59:58 | REL_11_STABLE
 crake  | 2020-01-17 14:33:21 | REL_12_STABLE
 crake  | 2019-11-10 23:36:04 | REL_11_STABLE
 crake  | 2019-09-09 08:48:25 | HEAD
 crake  | 2019-08-05 21:18:23 | REL_12_STABLE
 crake  | 2019-07-19 01:33:31 | HEAD
 crake  | 2019-07-16 01:06:02 | REL_11_STABLE
(13 rows)

(Ignore mantid, it had a temporary setup problem that was resolved.)
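
To make the suspected race concrete, here is a minimal sketch in Python rather than Perl. It is not the test harness itself: a child that exits immediately with status 2 stands in for psql after a rejected login, and the function name write_to_exited_child is made up for illustration. Waiting for the child before writing forces the losing side of the race every time, whereas in the real test the write usually wins. POSIX pipe semantics are assumed.

```python
import subprocess
import sys


def write_to_exited_child(payload: bytes):
    """Write to the stdin of a child that has already exited.

    The child is a stand-in for psql exiting after authentication is
    rejected; it is not psql and performs no authentication.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(2)"],
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so write() hits the pipe directly
    )
    child.wait()  # the reader is gone before the parent writes
    try:
        child.stdin.write(payload)
        outcome = "written"
    except BrokenPipeError:
        # Same EPIPE condition that IPC::Run reports as "Broken pipe"
        outcome = "broken pipe"
    finally:
        child.stdin.close()
    return outcome, child.returncode


if __name__ == "__main__":
    print(write_to_exited_child(b"SELECT 1"))
```

If the parent happens to write before the child exits, the data simply sits in the pipe buffer and no error is raised, which would explain why the failure shows up only once or twice a month.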


Re: LDAP check flapping on crake due to race

Noah Misch
On Sun, Aug 02, 2020 at 05:29:57PM +1200, Thomas Munro wrote:
> There are one or two failures per month on crake.  It looks like when
> authentication is rejected, as expected in the tests, the psql process
> is exiting, but there is a race where the Perl script still wants to
> write a dummy query to its stdin (?), so you get:
>
> psql: FATAL:  LDAP authentication failed for user "test1"
> ack Broken pipe: write( 13, 'SELECT 1' ) at
> /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

Do you suppose a fix like e12a472 would cover this?  ("psql <&-" fails with
status 1 after successful authentication, and authentication failure gives
status 2.)


Re: LDAP check flapping on crake due to race

Tom Lane
Noah Misch <[hidden email]> writes:
> On Sun, Aug 02, 2020 at 05:29:57PM +1200, Thomas Munro wrote:
>> There are one or two failures per month on crake.  It looks like when
>> authentication is rejected, as expected in the tests, the psql process
>> is exiting, but there is a race where the Perl script still wants to
>> write a dummy query to its stdin (?), so you get:
>> psql: FATAL:  LDAP authentication failed for user "test1"
>> ack Broken pipe: write( 13, 'SELECT 1' ) at
>> /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

> Do you suppose a fix like e12a472 would cover this?  ("psql <&-" fails with
> status 1 after successful authentication, and authentication failure gives
> status 2.)

AFAICT the failure is happening down inside PostgresNode::psql's call
of IPC::Run::run, so we don't really have the option to adjust things
in exactly that way.  (And messing with sub psql for the benefit of
this one caller seems pretty risky anyway.)

I'm inclined to suggest that the LDAP test's test_access could use
an empty stdin and pass "-c 'SELECT 1'" as a command line option
instead.  (Maybe that's exactly what you meant, but I'm not sure.)

I've not been able to duplicate this locally, so I have no idea if
that'd really fix it.
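
The shape of that suggestion can be sketched as follows, again in Python as a stand-in for the Perl test code. The names run_with_query_as_argument and REJECTING_CHILD are hypothetical, and the child is a small script that prints a FATAL line and exits with status 2, not psql. The point is only that once the query travels on the command line and stdin is empty, the parent never writes to the child, so there is no write left to lose the race.

```python
import subprocess
import sys

# Hypothetical stand-in for a psql invocation whose login is rejected:
# it ignores its arguments, reports a fatal error, and exits with status 2.
REJECTING_CHILD = [
    sys.executable,
    "-c",
    "import sys; print('FATAL: authentication failed', file=sys.stderr); sys.exit(2)",
]


def run_with_query_as_argument(argv_prefix, query):
    """Run a child with the query as its last argument and an empty stdin."""
    result = subprocess.run(
        [*argv_prefix, query],     # query is passed like the argument to -c
        stdin=subprocess.DEVNULL,  # nothing to write, so no EPIPE is possible
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    print(run_with_query_as_argument(REJECTING_CHILD, "SELECT 1"))
```

The exit status and stderr text are still available to the caller, so a test can keep asserting on the expected authentication failure.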

                        regards, tom lane


Re: LDAP check flapping on crake due to race

Thomas Munro
On Mon, Aug 3, 2020 at 4:09 AM Tom Lane <[hidden email]> wrote:
> I'm inclined to suggest that the LDAP test's test_access could use
> an empty stdin and pass "-c 'SELECT 1'" as a command line option
> instead.  (Maybe that's exactly what you meant, but I'm not sure.)

Good idea.  Here's a patch like that.

> I've not been able to duplicate this locally, so I have no idea if
> that'd really fix it.

Me neither -- I guess someone who enjoys perl could hack IPC::Run to
take a short nap at the right moment.

Attachment: 0001-Fix-rare-failure-in-LDAP-tests.patch (1K)

Re: LDAP check flapping on crake due to race

Noah Misch
On Mon, Aug 03, 2020 at 12:12:57PM +1200, Thomas Munro wrote:
> On Mon, Aug 3, 2020 at 4:09 AM Tom Lane <[hidden email]> wrote:
> > I'm inclined to suggest that the LDAP test's test_access could use
> > an empty stdin and pass "-c 'SELECT 1'" as a command line option
> > instead.  (Maybe that's exactly what you meant, but I'm not sure.)
>
> Good idea.  Here's a patch like that.

While I had meant a different approach, this is superior.

> > I've not been able to duplicate this locally, so I have no idea if
> > that'd really fix it.
>
> Me neither -- I guess someone who enjoys perl could hack IPC::Run to
> take a short nap at the right moment.

Not essential to reproduce first, I think.


Re: LDAP check flapping on crake due to race

Thomas Munro
On Mon, Aug 3, 2020 at 12:29 PM Noah Misch <[hidden email]> wrote:
> On Mon, Aug 03, 2020 at 12:12:57PM +1200, Thomas Munro wrote:
> > On Mon, Aug 3, 2020 at 4:09 AM Tom Lane <[hidden email]> wrote:
> > > I'm inclined to suggest that the LDAP test's test_access could use
> > > an empty stdin and pass "-c 'SELECT 1'" as a command line option
> > > instead.  (Maybe that's exactly what you meant, but I'm not sure.)
> >
> > Good idea.  Here's a patch like that.
>
> While I had meant a different approach, this is superior.

Thanks.  Pushed.