diff mbox

gnu: Add dlib.

Message ID 87eg5e4g4r.fsf@ike.i-did-not-set--mail-host-address--so-tickle-me
State New
Headers show

Commit Message

Marius Bakke Aug. 24, 2016, 10:26 a.m. UTC
Marius Bakke <mbakke@fastmail.com> writes:

>> Without OpenBLAS dlib will use an internal BLAS implementation. I'm
>> fairly certain that will at least fix the crash on x86_64, which was
>> a segfault in libopenblasp-r0.2.15.so when we had LAPACK in inputs, but
>> seems to consistently trigger on Hydra regardless.
>>
>> I got busy this weekend, but will try to reproduce the i686 errors this
>> week; and also check if the newer openblas in core-updates solves the
>> x86_64 segfault.
>>
>> Stay tuned...
>
> Short update: I can successfully reproduce the i686 test failures simply
> by `guix build --system=i686-linux` on x86_64. Removing OpenBLAS from
> inputs had no effect. Now to figure out what's going on..

There are a couple of things going on in this thread:

1. Segfault on x86_64. This seems to have been resolved simply by
updating OpenBLAS. At least, I'm no longer able to reproduce it even
with LAPACK in inputs. So, that should fix the Hydra x86_64 build.
Can the OpenBLAS update be cherry-picked to master?

2. i686 test failures. Updating OpenBLAS fixed 1/5 errors. The remaining
four are reproducible on 32-bit Ubuntu, so they do not seem Guix
related. Upstream has been notified.

3. ARM failures. I don't have ARM hardware to test on, but I'm guessing
it's similar to i686 (i.e. not directly Guix related).

Adding "#:parallel-build? #f" had no effect on tests, indeed the check
phase does not seem to use the previously built dlib; it builds it again
without parallel-build. I will try reproducing the non-reproducibility
on some higher end hardware, hopefully this week.

I've also found that FFTW is no longer used, apparently due to thread
safety issues. So I'd appreciate if the following patch can be added.
Apologies for not catching the missing reference earlier, I will be more
careful in the future (fftw was added in the last minute..).

Comments

Leo Famulari Aug. 24, 2016, 5:26 p.m. UTC | #1
On Wed, Aug 24, 2016 at 11:26:28AM +0100, Marius Bakke wrote:
> There are a couple of things going on in this thread:
> 
> 1. Segfault on x86_64. This seems to have been resolved simply by
> updating OpenBLAS. At least, I'm no longer able to reproduce it even
> with LAPACK in inputs. So, that should fix the Hydra x86_64 build.
> Can the OpenBLAS update be cherry-picked to master?

I'd say it depends on whether the OpenBLAS users are building
successfully on core-updates, but unfortunately core-updates is
currently failing early in the bootstrap process [0]. Can you take a
look at `guix refresh -l dlib` and pick some important looking
applications to test with the updated OpenBLAS?

[0]
https://hydra.gnu.org/jobset/gnu/core-updates
http://lists.gnu.org/archive/html/guix-devel/2016-08/msg01390.html

> 2. i686 test failures. Updating OpenBLAS fixed 1/5 errors. The remaining
> four are reproducible on 32-bit Ubuntu, so they do not seem Guix
> related. Upstream has been notified.
> 
> 3. ARM failures. I don't have ARM hardware to test on, but I'm guessing
> it's similar to i686 (i.e. not directly Guix related).

Maybe dlib is 64-bit only? If that's the case, we can disable it on
those architectures.

> Adding "#:parallel-build? #f" had no effect on tests, indeed the check
> phase does not seem to use the previously built dlib; it builds it again
> without parallel-build. I will try reproducing the non-reproducibility
> on some higher end hardware, hopefully this week.

It's weird that it rebuilds the application again for the tests. Which
build is actually installed in the case of success? Is it worth an
upstream bug report?

> I've also found that FFTW is no longer used, apparently due to thread
> safety issues. So I'd appreciate if the following patch can be added.
> Apologies for not catching the missing reference earlier, I will be more
> careful in the future (fftw was added in the last minute..).

Can you hold this patch locally in your "dlib fixes" queue? It would
trigger a rebuild of dlib on Hydra, but I don't see the point when we
expect the build to fail near the end anyways.
Marius Bakke Aug. 24, 2016, 7:08 p.m. UTC | #2
Leo Famulari <leo@famulari.name> writes:

> On Wed, Aug 24, 2016 at 11:26:28AM +0100, Marius Bakke wrote:
>> There are a couple of things going on in this thread:
>> 
>> 1. Segfault on x86_64. This seems to have been resolved simply by
>> updating OpenBLAS. At least, I'm no longer able to reproduce it even
>> with LAPACK in inputs. So, that should fix the Hydra x86_64 build.
>> Can the OpenBLAS update be cherry-picked to master?
>
> I'd say it depends on whether the OpenBLAS users are building
> successfully on core-updates, but unfortunately core-updates is
> currently failing early in the bootstrap process [0]. Can you take a
> look at `guix refresh -l dlib` and pick some important looking
> applications to test with the updated OpenBLAS?

I'm currently building the following openblas dependents: `libreoffice
bamm python-biom-format clipper shogun armadillo julia` and will try to
test BLAS functionality in some of them.

>> 2. i686 test failures. Updating OpenBLAS fixed 1/5 errors. The remaining
>> four are reproducible on 32-bit Ubuntu, so they do not seem Guix
>> related. Upstream has been notified.
>> 
>> 3. ARM failures. I don't have ARM hardware to test on, but I'm guessing
>> it's similar to i686 (i.e. not directly Guix related).
>
> Maybe dlib is 64-bit only? If that's the case, we can disable it on
> those architectures.

According to the developer[0], these targets should be supported.

0: https://github.com/davisking/dlib/issues/197

We could disable tests (at least the failing ones) on these platforms
until this issue is resolved. The mips64el target on Hydra times out
after 3600 seconds on one of the tests, but seems fine up to that point.
Some of these tests are fairly CPU heavy, so the timeout may be too low.

>> Adding "#:parallel-build? #f" had no effect on tests, indeed the check
>> phase does not seem to use the previously built dlib; it builds it again
>> without parallel-build. I will try reproducing the non-reproducibility
>> on some higher end hardware, hopefully this week.
>
> It's weird that it rebuilds the application again for the tests. Which
> build is actually installed in the case of success? Is it worth an
> upstream bug report?

We rely on cmake-build-system for everything but the check phase, which
doesn't have a "proper" check target. I'm no cmake expert, but we can
probably prevent re-build by not including the main application in
test/CmakeLists.txt and instead copy/link in our already-built version:

https://github.com/davisking/dlib/blob/master/dlib/test/CMakeLists.txt#L15

>> I've also found that FFTW is no longer used, apparently due to thread
>> safety issues. So I'd appreciate if the following patch can be added.
>> Apologies for not catching the missing reference earlier, I will be more
>> careful in the future (fftw was added in the last minute..).
>
> Can you hold this patch locally in your "dlib fixes" queue? It would
> trigger a rebuild of dlib on Hydra, but I don't see the point when we
> expect the build to fail near the end anyways.

Yep, hanging on to it for now.
Ben Woodcroft Aug. 24, 2016, 10:51 p.m. UTC | #3
On 24 August 2016 3:08:58 PM GMT-04:00, Marius Bakke <mbakke@fastmail.com> wrote:

>
>>> Adding "#:parallel-build? #f" had no effect on tests, indeed the
>check
>>> phase does not seem to use the previously built dlib; it builds it
>again
>>> without parallel-build. I will try reproducing the
>non-reproducibility
>>> on some higher end hardware, hopefully this week.
>>
>> It's weird that it rebuilds the application again for the tests.
>Which
>> build is actually installed in the case of success? Is it worth an
>> upstream bug report?
>
>We rely on cmake-build-system for everything but the check phase, which
>doesn't have a "proper" check target. I'm no cmake expert, but we can
>probably prevent re-build by not including the main application in
>test/CmakeLists.txt and instead copy/link in our already-built version:
>
>https://github.com/davisking/dlib/blob/master/dlib/test/CMakeLists.txt#L15

The cmake build system has an #:out-of-tree? option, maybe set that to #f?

Ben
diff mbox

Patch

From 714f38b31996e014ed0cc56391e379e2241ee26e Mon Sep 17 00:00:00 2001
From: Marius Bakke <mbakke@fastmail.com>
Date: Tue, 23 Aug 2016 21:17:14 +0100
Subject: [PATCH] gnu: dlib: Remove unused fftw from inputs.

* gnu/packages/machine-learning.scm (dlib)[inputs]: Remove fftw.
  (define-module): Don't include algebra.scm.
---
 gnu/packages/machine-learning.scm | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 4332045..7669702 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -28,7 +28,6 @@ 
   #:use-module (guix build-system gnu)
   #:use-module (guix build-system r)
   #:use-module (gnu packages)
-  #:use-module (gnu packages algebra)
   #:use-module (gnu packages autotools)
   #:use-module (gnu packages boost)
   #:use-module (gnu packages compression)
@@ -515,8 +514,7 @@  single hidden layer, and for multinomial log-linear models.")
     (native-inputs
      `(("pkg-config" ,pkg-config)))
     (inputs
-     `(("fftw" ,fftw)
-       ("giflib" ,giflib)
+     `(("giflib" ,giflib)
        ;("lapack" ,lapack) XXX lapack here causes test failures in some setups.
        ("libjpeg" ,libjpeg)
        ("libpng" ,libpng)
-- 
2.9.3