Patchwork gnu: Add seqtk.

Submitter Ben Woodcroft
Date Sept. 9, 2016, 11:08 a.m.
Message ID <20160909110834.8581-1-donttrustben@gmail.com>
Permalink /patch/15418/
State New

Comments

Ben Woodcroft - Sept. 9, 2016, 11:08 a.m.
From: Ben J Woodcroft <donttrustben@gmail.com>

Well, despite the lightness of my touch, it seems the licensing is now in
order.  I've updated the package, here's an updated patch.  Better?

Thanks,
ben

* gnu/packages/bioinformatics.scm (seqtk): New variable.
---
 gnu/packages/bioinformatics.scm | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Marius Bakke - Sept. 9, 2016, 12:37 p.m.
Ben Woodcroft <donttrustben@gmail.com> writes:

> Well, despite the lightness of my touch, it seems the licensing is now in
> order.  I've updated the package, here's an updated patch.  Better?

I don't think this was intended to be a commit message? :)

The program seems to bundle {khash,kseq}.h from htslib. Could you try
replacing them with the files directly from htslib? There are quite a
few examples of doing this already in bioinformatics.scm.

I also think the original description from GitHub is better:
"Toolkit for processing sequences in FASTA/Q formats".

Other than that LGTM.

Thanks!
Marius

Marius Bakke - Sept. 9, 2016, 12:54 p.m.
Marius Bakke <mbakke@fastmail.com> writes:

> The program seems to bundle {khash,kseq}.h from htslib. Could you try
> replacing them with the files directly from htslib? There are quite a
> few examples of doing this already in bioinformatics.scm.

The released version also bundles a few unnecessary header files that
have since been removed in git. I think you can remove all ".h" files in
an origin snippet and substitute references to khash.h and kseq.h before
building.
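
A minimal sketch of what that could look like, shown as two fragments of
the package definition rather than a tested recipe. It assumes that
seqtk.c includes the two headers as "khash.h" and "kseq.h", that htslib
is added to the package inputs, and that htslib installs both headers
under include/htslib; the phase name use-htslib-headers is made up for
illustration.

```scheme
;; Fragment 1: the origin, with a snippet that deletes the bundled headers.
(source (origin
          (method url-fetch)
          (uri (string-append
                "https://github.com/lh3/seqtk/archive/v"
                version ".tar.gz"))
          (file-name (string-append name "-" version ".tar.gz"))
          (sha256
           (base32
            "0ywdyzpmfiz2wp6ampbzqg4y8bj450nfgqarpamg045b8mk32lxx"))
          ;; find-files comes from (guix build utils).
          (modules '((guix build utils)))
          (snippet
           '(begin
              ;; Delete every ".h" file from the unpacked source tree.
              (for-each delete-file (find-files "." "\\.h$"))
              #t))))

;; Fragment 2: an extra phase inside modify-phases (hypothetical name).
;; Assumes the includes are written as "khash.h" and "kseq.h" in seqtk.c.
(add-after 'unpack 'use-htslib-headers
  (lambda _
    (substitute* "seqtk.c"
      (("#include \"(khash|kseq)\\.h\"" _ header)
       (string-append "#include <htslib/" header ".h>")))
    #t))
```

The snippet runs on the unpacked source after the tarball hash has been
checked, so the sha256 above would not need to change.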

Cheers,
Marius

Ben Woodcroft - Sept. 10, 2016, 4:03 a.m.
On 09/09/16 22:37, Marius Bakke wrote:
> Ben Woodcroft <donttrustben@gmail.com> writes:
>
>> Well, despite the lightness of my touch, it seems the licensing is now in
>> order.  I've updated the package, here's an updated patch.  Better?
> I don't think this was intended to be a commit message? :)

No indeed, I was responding to a thread so old I suspect it was before 
your time.

> The program seems to bundle {khash,kseq}.h from htslib. Could you try
> replacing them with the files directly from htslib? There are quite a
> few examples of doing this already in bioinformatics.scm.

I see your point, though I'm not sure that htslib is really the home of 
those files, and anyway our htslib doesn't provide them as an output 
since they are not a shared library (I believe).

I've always been a bit fuzzy on the official policy about how far we
should go in removing bundled code, so I'm happy to be corrected. In
this case, since there is clear precedent, I don't think we should bother
removing the bundled files.

> I also think the original description from github is better:
> "Toolkit for processing sequences in FASTA/Q formats".

How about "Toolkit for processing biological sequences in FASTA/Q
format"? I wanted to make it understandable in a more general context.

I'll push in the next day or two unless there are further comments.
Thanks for the review.
ben

Patch

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index f34acd1..4e296f5 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -4529,6 +4529,41 @@  BioPython in a convenient way.  Instead of having a big mess of scripts, there
 is one that takes arguments.")
     (license license:gpl3)))
 
+(define-public seqtk
+  (package
+    (name "seqtk")
+    (version "1.2")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "https://github.com/lh3/seqtk/archive/v"
+                    version ".tar.gz"))
+              (file-name (string-append name "-" version ".tar.gz"))
+              (sha256
+               (base32
+                "0ywdyzpmfiz2wp6ampbzqg4y8bj450nfgqarpamg045b8mk32lxx"))))
+    (build-system gnu-build-system)
+    (arguments
+     `(#:phases
+       (modify-phases %standard-phases
+         (delete 'configure)
+         (replace 'check
+           ;; There are no tests, so we just run a sanity check.
+           (lambda _ (zero? (system* "./seqtk" "seq"))))
+         (replace 'install
+           (lambda* (#:key outputs #:allow-other-keys)
+             (let ((bin (string-append (assoc-ref outputs "out") "/bin/")))
+               (install-file "seqtk" bin)))))))
+    (inputs
+     `(("zlib" ,zlib)))
+    (home-page "https://github.com/lh3/seqtk")
+    (synopsis "Toolkit for biological sequences in FASTA/Q formats")
+    (description
+     "Seqtk is a fast and lightweight tool for processing sequences in the
+FASTA or FASTQ format.  It parses both FASTA and FASTQ files which can be
+optionally compressed by gzip.")
+    (license license:expat)))
+
 (define-public snap-aligner
   (package
     (name "snap-aligner")