Patchwork gnu: Add mash.

Submitter Marius Bakke
Date Aug. 30, 2016, 5:54 p.m.
Message ID <87h9a2yweu.fsf@ike.i-did-not-set--mail-host-address--so-tickle-me>
Permalink /patch/15083/
State New

Comments

Leo Famulari - Aug. 31, 2016, 7:44 p.m.
On Tue, Aug 30, 2016 at 06:54:49PM +0100, Marius Bakke wrote:
> * gnu/packages/bioinformatics.scm (mash): New variable.

Thanks!

> +         (add-after 'unpack 'fix-includes
> +           (lambda _
> +             (substitute* '("src/mash/Sketch.cpp" "src/mash/CommandFind.cpp")
> +               (("^#include \"kseq\\.h\"")
> +                "#include \"htslib/kseq.h\""))
> +             #t))
> +         (add-before 'configure 'autoconf
> +           (lambda _ (zero? (system* "autoconf")))))))
> +    (native-inputs
> +     `(("autoconf" ,autoconf)
> +       ("capnproto" ,capnproto)
> +       ("htslib" ,htslib)))

Does it only need to use capnproto and htslib while building? Okay if
so.
Ricardo Wurmus - Aug. 31, 2016, 8:16 p.m.
Leo Famulari <leo@famulari.name> writes:

> On Tue, Aug 30, 2016 at 06:54:49PM +0100, Marius Bakke wrote:
>> * gnu/packages/bioinformatics.scm (mash): New variable.
>
> Thanks!
>
>> +         (add-after 'unpack 'fix-includes
>> +           (lambda _
>> +             (substitute* '("src/mash/Sketch.cpp" "src/mash/CommandFind.cpp")
>> +               (("^#include \"kseq\\.h\"")
>> +                "#include \"htslib/kseq.h\""))
>> +             #t))
>> +         (add-before 'configure 'autoconf
>> +           (lambda _ (zero? (system* "autoconf")))))))
>> +    (native-inputs
>> +     `(("autoconf" ,autoconf)
>> +       ("capnproto" ,capnproto)
>> +       ("htslib" ,htslib)))
>
> Does it only need to use capnproto and htslib while building? Okay if
> so.

Looking at the substitution in “fix-includes”, htslib should probably be
a regular input.
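
A minimal sketch of that change, assuming capnproto is moved along with
htslib (the thread only names htslib here) and that nothing else in the
package needs adjusting. This is a fragment of the package definition, not
a tested patch:

```scheme
;; Hypothetical revision of the input fields from the patch under review.
;; Libraries whose code ends up in the built program go in 'inputs';
;; tools that only run during the build stay in 'native-inputs'.
(native-inputs
 `(("autoconf" ,autoconf)))
(inputs
 `(("capnproto" ,capnproto)
   ("htslib" ,htslib)
   ("gsl" ,gsl)
   ("zlib" ,zlib)))
```

In a native build, the existing (assoc-ref %build-inputs "capnproto")
lookup in the configure flags should keep working, since both fields are
visible to the build there.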

~~ Ricardo
Marius Bakke - Sept. 1, 2016, 10 a.m.
Leo Famulari <leo@famulari.name> writes:

>> +         (add-after 'unpack 'fix-includes
>> +           (lambda _
>> +             (substitute* '("src/mash/Sketch.cpp" "src/mash/CommandFind.cpp")
>> +               (("^#include \"kseq\\.h\"")
>> +                "#include \"htslib/kseq.h\""))
>> +             #t))
>> +         (add-before 'configure 'autoconf
>> +           (lambda _ (zero? (system* "autoconf")))))))
>> +    (native-inputs
>> +     `(("autoconf" ,autoconf)
>> +       ("capnproto" ,capnproto)
>> +       ("htslib" ,htslib)))
>
> Does it only need to use capnproto and htslib while building? Okay if
> so.

I had these in inputs initially and was surprised to see no references.
Both seem to be compiled into the final program[0]: when running "mash
info" on an invalid file (the provided data/refseq.msh), a generic
capnproto exception is thrown (src/capnp/serialize.c++:159).

That raises another question: should the htslib and capnproto licenses
be listed too, since they are part of the binary output?

I'm not a bioinformatician (just a mere sysadmin for such), but have
been going through the tutorial and things appear to work fine.

0: https://github.com/marbl/Mash/blob/master/Makefile.in#L38
Leo Famulari - Sept. 6, 2016, 9:01 p.m.
On Thu, Sep 01, 2016 at 11:00:39AM +0100, Marius Bakke wrote:
> I had these in inputs initially and was surprised to see no references.
> Both seem to be compiled into the final program[0]: when running "mash
> info" on an invalid file (the provided data/refseq.msh), a generic
> capnproto exception is thrown (src/capnp/serialize.c++:159).

I wonder, does using native-inputs work when building mash for another
architecture?

> That raises another question: should the htslib and capnproto licenses
> be listed too, since they are part of the binary output?

Good question, I'm not sure. I'd guess "yes", along with a code comment
explaining what's going on.
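
One way this could look, assuming htslib and capnproto are both under the
Expat (MIT) license, and that the public-domain and CPL 1.0 entries in the
patch refer to the bundled MurmurHash and Open Bloom Filter respectively.
Each attribution should be checked against the upstream license files:

```scheme
;; Sketch only: the per-entry attributions below are assumptions.
;; capnproto and htslib are compiled into the mash binary, so their
;; licenses are listed alongside those of the bundled sources.
(license (list license:bsd-3          ; Mash itself
               license:expat          ; htslib and capnproto (assumed)
               license:public-domain  ; bundled MurmurHash (assumed)
               license:cpl1.0))       ; bundled Open Bloom Filter (assumed)
```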

> 
> I'm not a bioinformatician (just a mere sysadmin for such), but have
> been going through the tutorial and things appear to work fine.

Ah, bioinformatics software... all bets are off ;)

Patch

From 20974083333c8e94d10423d4a156caa5298d6dcb Mon Sep 17 00:00:00 2001
From: Marius Bakke <mbakke@fastmail.com>
Date: Tue, 30 Aug 2016 18:49:21 +0100
Subject: [PATCH 1/1] gnu: Add mash.

* gnu/packages/bioinformatics.scm (mash): New variable.
---
 gnu/packages/bioinformatics.scm | 53 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index ed20b56..9b96d37 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -76,6 +76,7 @@ 
   #:use-module (gnu packages python)
   #:use-module (gnu packages readline)
   #:use-module (gnu packages ruby)
+  #:use-module (gnu packages serialization)
   #:use-module (gnu packages statistics)
   #:use-module (gnu packages tbb)
   #:use-module (gnu packages tex)
@@ -3046,6 +3047,58 @@  sequences).")
               "http://mafft.cbrc.jp/alignment/software/license.txt"
               "BSD-3 with different formatting"))))
 
+(define-public mash
+  (package
+    (name "mash")
+    (version "1.1.1")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "https://github.com/marbl/mash/archive/v"
+                    version ".tar.gz"))
+              (file-name (string-append name "-" version ".tar.gz"))
+              (sha256
+               (base32
+                "08znbvqq5xknfhmpp3wcj574zvi4p7i8zifi67c9qw9a6ikp42fj"))
+              (modules '((guix build utils)))
+              (snippet
+               ;; Delete bundled kseq.
+               ;; TODO: Also delete bundled murmurhash and open bloom filter.
+               '(delete-file "src/mash/kseq.h"))))
+    (build-system gnu-build-system)
+    (arguments
+     `(#:tests? #f ; No tests.
+       #:configure-flags
+       (list
+        (string-append "--with-capnp=" (assoc-ref %build-inputs "capnproto"))
+        (string-append "--with-gsl=" (assoc-ref %build-inputs "gsl")))
+       #:make-flags (list "CC=gcc")
+       #:phases
+       (modify-phases %standard-phases
+         (add-after 'unpack 'fix-includes
+           (lambda _
+             (substitute* '("src/mash/Sketch.cpp" "src/mash/CommandFind.cpp")
+               (("^#include \"kseq\\.h\"")
+                "#include \"htslib/kseq.h\""))
+             #t))
+         (add-before 'configure 'autoconf
+           (lambda _ (zero? (system* "autoconf")))))))
+    (native-inputs
+     `(("autoconf" ,autoconf)
+       ("capnproto" ,capnproto)
+       ("htslib" ,htslib)))
+    (inputs
+     `(("gsl" ,gsl)
+       ("zlib" ,zlib)))
+    (home-page "https://mash.readthedocs.io")
+    (synopsis "Fast genome and metagenome distance estimation using MinHash")
+    (description "Mash is a fast sequence distance estimator that uses the
+MinHash algorithm and is designed to work with genomes and metagenomes in the
+form of assemblies or reads.")
+    ;; Mash is distributed under 3-clause BSD, but includes software covered
+    ;; by other licenses.
+    (license (list license:bsd-3 license:public-domain license:cpl1.0))))
+
 (define-public metabat
   (package
     (name "metabat")
-- 
2.9.3