[committed,buildbot] Replace the aarch64 build slave

Message ID 5049068a-9243-e699-5dda-bde6d97f832c@arm.com
State Committed

Commit Message

Szabolcs Nagy Oct. 5, 2018, 10:22 a.m. UTC
This one is a ThunderX machine.

(The other one has been down for a while.)

I assume the slave will be able to connect once there is a server restart.
  

Comments

Tulio Magno Quites Machado Filho Oct. 8, 2018, 1:53 p.m. UTC | #1
Szabolcs Nagy <szabolcs.nagy@arm.com> writes:

> This one is a thunderx machine.
>
> (the other one was down for a while now.)
>
> i assume the slave will be able to connect once there is a server restart.

The server has just been restarted.

If the new slave doesn't reconnect in the following minutes, we'll have to
analyze its log.

Thanks!
  
Szabolcs Nagy Oct. 8, 2018, 2:46 p.m. UTC | #2
On 08/10/18 14:53, Tulio Magno Quites Machado Filho wrote:
> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
>> i assume the slave will be able to connect once there is a server restart.
>
> The server has just been restarted.
>
> If the new slave doesn't reconnect in the following minutes, we'll have to
> analyze its log.

Sorry, I had stopped the slave, since it could not connect previously.

Now I have restarted it and it fails with:

2018-10-08 14:44:22+0000 [-] Connection to 144.217.14.79:9989 failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
  
Tulio Magno Quites Machado Filho Oct. 8, 2018, 3:23 p.m. UTC | #3
Szabolcs Nagy <Szabolcs.Nagy@arm.com> writes:

> On 08/10/18 14:53, Tulio Magno Quites Machado Filho wrote:
>> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
>>> i assume the slave will be able to connect once there is a server restart.
>> 
>> The server has just been restarted.
>> 
>> If the new slave doesn't reconnect in the following minutes, we'll have to
>> analyze its log.
>
> sorry i stopped the slaves, since it could not connect previously.
>
> now i restarted it and it fails with
>
> 2018-10-08 14:44:22+0000 [-] Connection to 144.217.14.79:9989 failed: [Failure instance: Traceback (failure with no frames): <class
> 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.

That port is wrong.  It should have been 9991.
You have to change that in the buildbot.tac:

port = 9991
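
A quick way to tell a wrong port from other connection problems is to test plain TCP reachability from the slave host before restarting the slave. The following is only a minimal sketch: the helper name `can_connect` is hypothetical, and a successful connection shows that something is listening on that port, not that the buildbot login would succeed.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run on the slave host, `can_connect("144.217.14.79", 9989)` and `can_connect("144.217.14.79", 9991)` would show which of the two ports mentioned above accepts connections.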
  
Szabolcs Nagy Oct. 8, 2018, 4:53 p.m. UTC | #4
On 08/10/18 16:23, Tulio Magno Quites Machado Filho wrote:
> Szabolcs Nagy <Szabolcs.Nagy@arm.com> writes:
>
>> On 08/10/18 14:53, Tulio Magno Quites Machado Filho wrote:
>>> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
>>>> i assume the slave will be able to connect once there is a server restart.
>>>
>>> The server has just been restarted.
>>>
>>> If the new slave doesn't reconnect in the following minutes, we'll have to
>>> analyze its log.
>>
>> sorry i stopped the slaves, since it could not connect previously.
>>
>> now i restarted it and it fails with
>>
>> 2018-10-08 14:44:22+0000 [-] Connection to 144.217.14.79:9989 failed: [Failure instance: Traceback (failure with no frames): <class
>> 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
>
> That port is wrong.  It should have been 9991.
> You have to change that in the buildbot.tac:
>
> port = 9991
>

thanks, fixed, and updated the wiki to mention the nondefault port.
  
Szabolcs Nagy Oct. 9, 2018, 9:55 a.m. UTC | #5
On 08/10/18 17:53, Szabolcs Nagy wrote:
> On 08/10/18 16:23, Tulio Magno Quites Machado Filho wrote:
>> That port is wrong.  It should have been 9991.
>> You have to change that in the buildbot.tac:
>>
>> port = 9991
>
> thanks, fixed, and updated the wiki to mention the nondefault port.

The first build is red. There are two failures, and both are timeouts:

libio/tst-readline takes more than 80s
nss/tst-nss-files-hosts-multi takes about 30s

I assume it's OK to set TIMEOUTFACTOR=5 in the bot environment.
Or should we raise the TIMEOUT of these particular tests instead?

XPASS: elf/tst-protected1a
XPASS: elf/tst-protected1b
UNSUPPORTED: iconv/tst-gconv-init-failure
FAIL: libio/tst-readline
UNSUPPORTED: math/test-fesetexcept-traps
UNSUPPORTED: math/test-fexcept-traps
UNSUPPORTED: math/test-nearbyint-except-2
UNSUPPORTED: misc/tst-pkey
UNSUPPORTED: nptl/test-cond-printers
UNSUPPORTED: nptl/test-condattr-printers
UNSUPPORTED: nptl/test-mutex-printers
UNSUPPORTED: nptl/test-mutexattr-printers
UNSUPPORTED: nptl/test-rwlock-printers
UNSUPPORTED: nptl/test-rwlockattr-printers
FAIL: nss/tst-nss-files-hosts-multi
UNSUPPORTED: posix/tst-spawn4-compat
UNSUPPORTED: resolv/tst-resolv-ai_idn
UNSUPPORTED: resolv/tst-resolv-ai_idn-latin1
Summary of test results:
      2 FAIL
   5815 PASS
     14 UNSUPPORTED
     17 XFAIL
      2 XPASS
Makefile:401: recipe for target 'tests' failed
make[1]: *** [tests] Error 1
  
Joseph Myers Oct. 9, 2018, 11:44 a.m. UTC | #6
On Tue, 9 Oct 2018, Szabolcs Nagy wrote:

> the first build is red, there are two failures, both are timeouts:
> 
> libio/tst-readline takes more than 80s
> nss/tst-nss-files-hosts-multi takes about 30s
> 
> i assume it's ok to set TIMEOUTFACTOR=5 in the bot environment
> or should we raise the TIMEOUT of these particular tests?

If only a few tests are timing out, and there are good reasons for them to 
time out on slow systems (amount of processing or I/O involved), then I 
think raising those tests' TIMEOUT is appropriate.
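
To make the trade-off between the two knobs concrete, here is a small arithmetic sketch. It assumes that the glibc test driver multiplies each test's timeout by TIMEOUTFACTOR and that tests which do not define TIMEOUT get a default of 20 seconds. Both points are assumptions not stated in this thread, and the function names are hypothetical.

```python
import math

# Assumed default timeout in seconds for tests that do not define TIMEOUT.
DEFAULT_TIMEOUT = 20


def effective_timeout(test_timeout=DEFAULT_TIMEOUT, factor=1):
    """Timeout in seconds after scaling the per-test value by TIMEOUTFACTOR."""
    return test_timeout * factor


def smallest_factor(observed_seconds, test_timeout=DEFAULT_TIMEOUT):
    """Smallest integer factor whose scaled timeout strictly exceeds the run time."""
    return math.floor(observed_seconds / test_timeout) + 1
```

Under these assumptions a test that needs about 30s would pass with a factor of 2, while a global TIMEOUTFACTOR=5 would stretch every default-timeout test to 100s. Raising TIMEOUT only in the affected tests keeps the limit tight for everything else.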
  
Tulio Magno Quites Machado Filho Oct. 9, 2018, 12:53 p.m. UTC | #7
Joseph Myers <joseph@codesourcery.com> writes:

> On Tue, 9 Oct 2018, Szabolcs Nagy wrote:
>
>> the first build is red, there are two failures, both are timeouts:
>> 
>> libio/tst-readline takes more than 80s
>> nss/tst-nss-files-hosts-multi takes about 30s
>> 
>> i assume it's ok to set TIMEOUTFACTOR=5 in the bot environment
>> or should we raise the TIMEOUT of these particular tests?
>
> If only a few tests are timing out, and there are good reasons for them to 
> time out on slow systems (amount of processing or I/O involved), then I 
> think raising those tests' TIMEOUT is appropriate.

I agree with Joseph.

But to answer your initial question: we can indeed change TIMEOUTFACTOR in the
bot, and we can tune it for each slave if necessary.
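
A per-slave setting could look roughly like the sketch below, written in the style of master.cfg. The names `timeout_factor_map` and `step_env` are hypothetical and the value 5 is only the one proposed earlier in this thread; nothing here is taken from the actual bot configuration.

```python
# Hypothetical per-slave TIMEOUTFACTOR overrides, keyed by slave name.
# Slaves that are not listed run the tests with the unscaled timeouts.
timeout_factor_map = {
    'tx1-ubuntu-aarch64': 5,
}


def step_env(slave_name):
    """Return the extra environment variables for test steps on slave_name."""
    factor = timeout_factor_map.get(slave_name)
    if factor is None:
        # No override: leave TIMEOUTFACTOR unset.
        return {}
    # Environment values must be strings.
    return {'TIMEOUTFACTOR': str(factor)}
```

The returned dictionary could then be passed as the `env` argument of the buildbot `ShellCommand` step that runs the test suite, so that only the slow machine gets the larger factor.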
  

Patch

diff --git a/master.cfg b/master.cfg
index 164d309..701def3 100644
--- a/master.cfg
+++ b/master.cfg
@@ -26,7 +26,7 @@  builder_map = {
   'glibc-ppc-linux': ['debian8-ppc-power8-1'],
   'glibc-ppc64le-linux': ['fedora25-ppc64le-power8-1'],
   'glibc-s390x-linux': ['marist-fedora-s390x'],
-  'glibc-aarch64-linux': ['reservedbit-xgene-ubuntu-aarch64'],
+  'glibc-aarch64-linux': ['tx1-ubuntu-aarch64'],
 }
 
 # Sets with all builders and all slaves.