Patchwork gnu: Add freebayes

Submitter Rob Syme
Date May 2, 2016, 9:25 a.m.
Message ID <CAEf4xgdco--bDzV0gw=FJEkEQ+dSPssmbosfA7XNHofRtJKm6w@mail.gmail.com>
Permalink /patch/11964/
State New

Comments

Rob Syme - May 2, 2016, 9:25 a.m.
A variant caller with a Guix-friendly license.

From 78fb1be26ca1a0ac768ce5b98f7fd9f467870b84 Mon Sep 17 00:00:00 2001
From: Rob Syme <rob.syme@gmail.com>
Date: Mon, 2 May 2016 16:46:53 +0800
Subject: [PATCH] gnu: Add freebayes

* gnu/packages/bioinformatics.scm (freebayes): New variable.

---
 gnu/packages/bioinformatics.scm | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

   (let ((util-commit "776ca85a18a47492af3794745efcb4a905113115"))
     (package
Ricardo Wurmus - May 2, 2016, 3:21 p.m.
Hi Rob,

> A guix-friendly licensed variant caller.
>
> From 78fb1be26ca1a0ac768ce5b98f7fd9f467870b84 Mon Sep 17 00:00:00 2001
> From: Rob Syme <rob.syme@gmail.com>
> Date: Mon, 2 May 2016 16:46:53 +0800
> Subject: [PATCH] gnu: Add freebayes
>
> * gnu/packages/bioinformatics.scm (freebayes): New variable.
>
> ---

thanks for the patch!  I see that freebayes has a couple of Git
submodules, e.g. for bamtools, intervaltree, and vcflib.  I remember
Roel was working on this before, trying to untangle all the
dependencies.

See this discussion here:

    http://lists.gnu.org/archive/html/guix-devel/2016-03/msg00333.html

I don’t see any special treatment of these dependencies in your
package.  Is this not needed?  Or does the git checkout include all the
bundled dependencies?

I think we should use one of the release tarballs instead and make sure
to package the dependencies separately.  Maybe you can cooperate with
Roel, who has made a lot of progress on this end already.

What do you think?

~~ Ricardo
Rob Syme - May 3, 2016, 7:32 a.m.
Hi Ricardo

I'm sorry for not checking the list beforehand! Interestingly, we ended up
with very different solutions to the problem of including the freebayes
dependencies. With the recursive Git fetch, the package compiles without
issue for me and *seems* to produce sensible results. Perhaps some non-Guix
packages are bleeding in from my configuration? If so, any verification that
it works or breaks would be appreciated. If it *does* work, I'd argue that
"(recursive? #t)" is a neater and more upgradable solution to the freebayes
Git submodule problem, as we wouldn't need to update the hashes and URLs for
bamtools-src, vcflib-src, tabixpp-src, etc.
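
For comparison, the per-submodule alternative mentioned here (pinning each bundled dependency as its own origin) might look roughly like the hypothetical Guix fragment below. It is a sketch, not Roel's actual package: only the `bamtools-src` name comes from the message; the repository URL is an assumption, the commit and hash are placeholders, and the `unpack-bamtools` phase name and the top-level `bamtools` directory are assumed for illustration.

```scheme
;; Hypothetical sketch: pinning one bundled submodule (bamtools) as a
;; separate origin instead of fetching freebayes with (recursive? #t).
;; The URL is assumed; the commit and hash are placeholders, not real values.
(use-modules (guix packages)       ; origin, base32
             (guix git-download))  ; git-fetch, git-reference

(define bamtools-src
  (origin
    (method git-fetch)
    (uri (git-reference
          (url "https://github.com/ekg/bamtools.git")  ; assumed location
          (commit "COMMIT-PINNED-BY-FREEBAYES")))      ; placeholder
    (sha256
     (base32                                           ; placeholder hash
      "0000000000000000000000000000000000000000000000000000"))))

;; The freebayes package would then list this origin as a native input,
;;   (native-inputs `(("bamtools-src" ,bamtools-src)))
;; and copy it into place before building, assuming the Makefile expects
;; the submodule in a top-level "bamtools" directory:
;;   (add-after 'unpack 'unpack-bamtools
;;     (lambda* (#:key inputs #:allow-other-keys)
;;       (copy-recursively (assoc-ref inputs "bamtools-src") "bamtools")
;;       #t))
;; vcflib-src, tabixpp-src, etc. would each need an origin like this one.
```

Each such origin carries its own URL, commit, and hash that must be bumped on every upgrade; with `(recursive? #t)`, as in the patch, a single commit and hash cover the checkout and all of its submodules.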

Roel Janssen - May 3, 2016, 7:45 a.m.
Hello Rob,

Actually, at the time I packaged freebayes, I intended to use the
recursive Git fetch, but there was a problem with it in Guix at that
moment.

So the clutter in my package should do almost the same thing as the
recursive Git fetch :).

There are still some licensing problems with freebayes.  First, we need
to get vcflib in Guix, for which the following needs to be resolved:
- fastahack:  No free/open source license.
- smithwaterman: No free/open source license.
- tabixpp: No free/open source license.

For the other dependencies, I sent packages to the list.  Some have
already made it upstream (filevercmp), and others are still in review.
For the three packages mentioned above, we must first resolve the
licensing issues.

I sent Erik an e-mail a week ago asking him to add licenses to these
projects, and he told me he will look into this soon.  Feel free to keep
reminding him to look into this :).

Kind regards,
Roel Janssen

Rob Syme - May 3, 2016, 7:52 a.m.
Ah, good point, Roel. Until the licensing is resolved, any discussion about
how to package freebayes is of no practical value. I'll ping Erik about the
licensing as well :)
-r

P.S. Erik's latest bioRxiv preprint is worth a read (A Graph Extension of
the Positional Burrows-Wheeler Transform and its Applications):
http://biorxiv.org/content/early/2016/05/02/051409

Pjotr Prins - May 3, 2016, 12:34 p.m.
On Tue, May 03, 2016 at 07:52:51AM +0000, Rob Syme wrote:
>    Ah. Good point Roel. Until the licencing is resolved, any discussion about
>    how to package freebayes is of no practical value. I'll ping Erik about
>    the licencing as well :)

To be taken as a serious alternative to GATK, freebayes should be
really free :). I also raised an issue on non-deterministic output:

  https://github.com/ekg/freebayes/issues/256

Pj.

Patch

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index 079fd46..db382d7 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -43,6 +43,7 @@ 
   #:use-module (gnu packages boost)
   #:use-module (gnu packages compression)
   #:use-module (gnu packages cpio)
+  #:use-module (gnu packages cmake)
   #:use-module (gnu packages curl)
   #:use-module (gnu packages doxygen)
   #:use-module (gnu packages datastructures)
@@ -1905,6 +1906,44 @@  genes in incomplete assemblies or complete genomes.")
     ;; GPL3+ according to private correspondense with the authors.
     (license license:gpl3+)))

+(define-public freebayes
+  (let ((commit "0cb269728b2db6307053cafe6f913a8b6fa1331e"))
+    (package
+      (name "freebayes")
+      (version "1.0.2")
+      (source (origin
+                (method git-fetch)
+                (uri (git-reference
+                      (url "https://github.com/ekg/freebayes.git")
+                      (commit commit)
+                      (recursive? #t)))
+                (sha256
+                 (base32
+                  "0z37ch3as3g8hx36l1lwy1v9cqahx72lb51yxrcmwymx0kcf39c5"))))
+      (build-system gnu-build-system)
+      (arguments '(#:phases
+                   (modify-phases %standard-phases
+                     (delete 'configure)
+                     (delete 'check) ; no "check" target
+                     (replace 'install
+                       (lambda* (#:key outputs #:allow-other-keys)
+                         (let* ((out (assoc-ref outputs "out"))
+                                (bin (string-append out "/bin")))
+                           (install-file "bin/freebayes" bin)
+                           (install-file "bin/bamleftalign" bin)
+                           #t))))))
+      (native-inputs `(("cmake" ,cmake)))
+      (inputs
+       `(("zlib" ,zlib)))
+      (home-page "https://github.com/ekg/freebayes")
+      (synopsis "Bayesian haplotype-based polymorphism discovery and genotyping")
+      (description "FreeBayes is a Bayesian genetic variant detector designed to
+find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms),
+indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and
+complex events (composite insertion and substitution events) smaller than the
+length of a short-read sequencing alignment.")
+      (license license:expat))))
+
 (define-public fxtract