Workaround for NFS issue when using cross-test-ssh.sh

Message ID 87ed1335-2bf5-4446-a313-15bf7b959246@BAMAIL02.ba.imgtec.org
State New, archived

Commit Message

Steve Ellcey Feb. 12, 2016, 10:12 p.m. UTC
  This is a followup to the email I sent out about issues I was having testing
on a remote machine with NFS and using cross-test-ssh.sh.

	https://sourceware.org/ml/libc-alpha/2016-02/msg00270.html

The problem is that a directory and a file in that directory are created
on the remote machine where the tests are run, and when that file is then
accessed on the host machine where make is run, the host does not see it.
I found that if I touch the parent directory of the newly created directory
before accessing the file I want, the problem does not occur.
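
As a rough sketch of that order of operations outside the glibc build (the
helper name and its arguments are made up for illustration and do not appear
in any glibc script):

```shell
#!/bin/sh
# Hypothetical helper, not part of glibc.
# Usage: touch_parent_then_check PARENT NEWDIR FILE
# Touches PARENT, the directory in which NEWDIR may have been created by
# another NFS client, and only then checks for PARENT/NEWDIR/FILE.
touch_parent_then_check ()
{
  parent=$1
  newdir=$2
  file=$3
  # Update the parent's timestamps first; in the setup described above
  # this was the step that made the new directory's contents visible.
  touch "$parent" || return 1
  # Only now look for the file; the function returns the result of this test.
  [ -f "$parent/$newdir/$file" ]
}
```

On a local filesystem the touch makes no difference to whether the file is
seen; it only matters in the NFS case described above, and why it helps
there is not established.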

I would love to explain exactly why this is, but I don't know.  I am guessing
that it is some bug or limitation in NFS, possibly in the specific version
of NFS I am using but I really don't know.  Nevertheless, I was hoping I
could check in this patch since it fixes the problem for me and should not
cause any other users any problems and it would make it much easier for us
to run the glibc testsuite in our setup if this patch was in the glibc sources.

Tested by running the glibc testsuite on an x86 linux box, using
cross-test-ssh.sh to run programs on a MIPS box.

What say thee?

Steve Ellcey
sellcey@imgtec.com


2016-02-12  Steve Ellcey  <sellcey@imgtec.com>

	* localedata/gen-locale.sh (generate_locale): Touch directory before
	touching file.
	* timezone/Makefile: Ditto.
  

Comments

Joseph Myers Feb. 12, 2016, 10:38 p.m. UTC | #1
I don't think it makes sense to put such workarounds for a fundamentally 
unreliable environment in particular tests.  You simply need to find 
appropriate NFS mount settings on all systems involved to ensure that no 
problematic caching occurs, or flush caches explicitly in 
cross-test-ssh.sh (and I think it will be a lot easier if the build system 
exports its filesystem to the test system, rather than both getting a 
filesystem from a third system).
  
Steve Ellcey Feb. 12, 2016, 11:12 p.m. UTC | #2
On Fri, 2016-02-12 at 22:38 +0000, Joseph Myers wrote:
> I don't think it makes sense to put such workarounds for a fundamentally 
> unreliable environment in particular tests.  You simply need to find 
> appropriate NFS mount settings on all systems involved to ensure that no 
> problematic caching occurs, or flush caches explicitly in 
> cross-test-ssh.sh (and I think it will be a lot easier if the build system 
> exports its filesystem to the test system, rather than both getting a 
> filesystem from a third system).

In an ideal world I would agree with you.  But to do that I must
restrict my builds and my testing to machines I have root access to so
that I can do the NFS mounts in the required manner.  I have some
machines like that but I also have access to a second set of machines
where I cannot change the NFS settings or make other root level changes
because I share them with other groups.  These machines do all have
access to shared filesystems like /users that live on dedicated NFS
servers and I could use them for builds and testing if it were not for
this one problem.

The other option of course is to create a branch and put my changes
there for use locally, but I would also like to avoid that if possible
since I think doing builds and testing directly on the main branch is
preferable to using a local branch.  It is too easy to put patches or
other fixes on a local branch and never get them upstreamed if you do
all your work on local branches.

Steve Ellcey
sellcey@imgtec.com
  
Joseph Myers Feb. 12, 2016, 11:27 p.m. UTC | #3
On Fri, 12 Feb 2016, Steve Ellcey wrote:

> On Fri, 2016-02-12 at 22:38 +0000, Joseph Myers wrote:
> > I don't think it makes sense to put such workarounds for a fundamentally 
> > unreliable environment in particular tests.  You simply need to find 
> > appropriate NFS mount settings on all systems involved to ensure that no 
> > problematic caching occurs, or flush caches explicitly in 
> > cross-test-ssh.sh (and I think it will be a lot easier if the build system 
> > exports its filesystem to the test system, rather than both getting a 
> > filesystem from a third system).
> 
> In an ideal world I would agree with you.  But to do that I must
> restrict my builds and my testing to machines I have root access to so
> that I can do the NFS mounts in the required manner.  I have some
> machines like that but I also have access to a second set of machines
> where I cannot change the NFS settings or make other root level changes
> because I share them with other groups.  These machines do all have
> access to shared filesystems like /users that live on dedicated NFS
> servers and I could use them for builds and testing if it were not for
> this one problem.

I very much doubt any local changes to a few tests can reliably help here.
Requiring a coherent filesystem view is just like requiring the test
system not to suffer from random memory corruption.  If you don't have an
environment that will actually execute the intended programs with the
intended filesystem view, just about anything in the testsuite will fail
at random.  The alternative to a few workarounds would be explicit hooks
in cross-test-ssh.sh to insert cache flushes / barriers to force changes
made on the clients to be visible on the file server, and to force changes
the server has seen to be visible on the clients, if such operations are
possible from the command line.  Putting in a few workarounds instead is
much like automatically retrying tests that segfault in case it was
unreliable memory, based on a list of the tests with the greatest
dependence on reliable memory, or like a hook that says "this test
overheats my processor, so wait X time afterwards for it to cool down",
which is obviously a local-only hack.
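
A minimal sketch of what such a hook could look like follows.  It is an
assumption, not something cross-test-ssh.sh provides: run_with_barrier is a
hypothetical name, and whether sync plus a directory listing really makes an
NFS client revalidate its cached view depends on the client and its mount
options.  In a real cross-testing setup the command would be the ssh
invocation, and the remote side would need its own flush as well.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of cross-test-ssh.sh.
# Usage: run_with_barrier DIR COMMAND [ARG...]
# Runs COMMAND, then tries to flush and refresh this host's view of DIR,
# and finally returns COMMAND's exit status unchanged.
run_with_barrier ()
{
  dir=$1
  shift
  "$@"
  status=$?
  # Ask the kernel to write out pending data from this host.
  sync
  # Re-read the directory so later lookups start from a fresh listing
  # (assumed to help; not guaranteed on every NFS configuration).
  ls -a "$dir" > /dev/null 2>&1
  # Preserve the command's status so the caller can still evaluate it.
  return $status
}
```

Keeping the original exit status matters because the test result is derived
from it; the extra commands must not replace it with their own status.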
  

Patch

diff --git a/localedata/gen-locale.sh b/localedata/gen-locale.sh
index d471086..8b22feb 100644
--- a/localedata/gen-locale.sh
+++ b/localedata/gen-locale.sh
@@ -37,6 +37,12 @@  generate_locale ()
     # The makefile checks the timestamp of the LC_CTYPE file,
     # but localedef won't have touched it if it was able to
     # hard-link it to an existing file.
+    #
+    # Touching the localedata directory, which may have been
+    # created on a remote system, before touching the LC_CTYPE
+    # file works around an NFS issue that can affect some systems
+    # when cross-testing.
+    touch ${common_objpfx}localedata
     touch ${common_objpfx}localedata/$out/LC_CTYPE
   else
     echo "Charmap: \"${charmap}\" Inputfile: \"${input}\"" \
diff --git a/timezone/Makefile b/timezone/Makefile
index dee7568..5cdb5ca 100644
--- a/timezone/Makefile
+++ b/timezone/Makefile
@@ -70,8 +70,14 @@  CFLAGS-zic.c = $(tz-cflags) -Wno-unused-variable
 # We have to make sure the data for testing the tz functions is available.
 # Don't add leapseconds here since test-tz made checks that work only without
 # leapseconds.
+#
+# Touching the testdata directory, which may have been created on a remote
+# system, before accessing the results in $(evaluate-test) works around an
+# NFS issue that can affect some systems when cross-testing.
+
 define build-testdata
 $(built-program-cmd) -d $(testdata) -y ./yearistype $<; \
+touch $(testdata)
 $(evaluate-test)
 endef