gnu: Add eigensoft.

Message ID 20160712132919.20938-1-ricardo.wurmus@mdc-berlin.de
State New

Commit Message

Ricardo Wurmus July 12, 2016, 1:29 p.m. UTC
  * gnu/packages/bioinformatics.scm (eigensoft): New variable.
---
 gnu/packages/bioinformatics.scm | 72 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
  

Comments

Ricardo Wurmus July 13, 2016, 6:38 a.m. UTC | #1
Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:

> * gnu/packages/bioinformatics.scm (eigensoft): New variable.
[…]
> +      (description "This package provides tools for population genetics and
> +stratification correction.  The EIGENSTRAT method uses principal components
> +analysis to explicitly model ancestry differences between cases and controls
> +along continuous axes of variation; the resulting correction is specific to a
> +candidate marker’s variation in frequency across ancestral populations,
> +minimizing spurious associations while maximizing power to detect true
> +associations.  The EIGENSOFT package has a built-in plotting script and
> +supports multiple file formats and quantitative phenotypes.")

Here’s an alternative description, which is a little more general and
doesn’t emphasise EIGENSTRAT so much:

  The EIGENSOFT package provides tools for population genetics and
  stratification correction.  EIGENSOFT implements methods commonly used
  in population genetics analyses such as PCA, computation of
  Tracy-Widom statistics, and finding relateds in structured
  populations.  It comes with a built-in plotting script and supports
  multiple file formats and quantitative phenotypes.

WDYT?

~~ Ricardo
  
Ludovic Courtès July 13, 2016, 1:09 p.m. UTC | #2
Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> skribis:

> * gnu/packages/bioinformatics.scm (eigensoft): New variable.

LGTM.

> Here’s an alternative description, which is a little more general and
> doesn’t emphasise EIGENSTRAT so much:
>
>   The EIGENSOFT package provides tools for population genetics and
>   stratification correction.  EIGENSOFT implements methods commonly used
>   in population genetics analyses such as PCA, computation of
>   Tracy-Widom statistics, and finding relateds in structured
                                               ^
Typo.

>   populations.  It comes with a built-in plotting script and supports
>   multiple file formats and quantitative phenotypes.

This one is probably better, yes.

Thanks,
Ludo’.
  
Ricardo Wurmus July 14, 2016, 5:40 a.m. UTC | #3
Ludovic Courtès <ludo@gnu.org> writes:

> Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> skribis:
>
>> * gnu/packages/bioinformatics.scm (eigensoft): New variable.
>
> LGTM.
>
>> Here’s an alternative description, which is a little more general and
>> doesn’t emphasise EIGENSTRAT so much:
>>
>>   The EIGENSOFT package provides tools for population genetics and
>>   stratification correction.  EIGENSOFT implements methods commonly used
>>   in population genetics analyses such as PCA, computation of
>>   Tracy-Widom statistics, and finding relateds in structured
>                                                ^
> Typo.

It’s rather a jargon “neologism”; “related” is supposed to be a noun and
“relateds” is the plural.  I’ll replace it with “related individuals”.

>>   populations.  It comes with a built-in plotting script and supports
>>   multiple file formats and quantitative phenotypes.
>
> This one is probably better, yes.

Okay.  Thanks for checking!

~~ Ricardo
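
For reference, the description field with the wording agreed on in this thread might look roughly like the sketch below. It is assembled from the alternative description in comment #1 with "relateds" replaced by "related individuals" as proposed in comment #3; the line breaks are an assumption, and the text that was eventually committed may differ.

```scheme
;; Sketch only: a fragment of the package definition, not a complete form.
;; Wording taken from the thread above; line wrapping is illustrative.
(description "The EIGENSOFT package provides tools for population genetics and
stratification correction.  EIGENSOFT implements methods commonly used in
population genetics analyses such as PCA, computation of Tracy-Widom
statistics, and finding related individuals in structured populations.  It
comes with a built-in plotting script and supports multiple file formats and
quantitative phenotypes.")
```

This would replace the `description` field in the patch below; the rest of the package definition is unaffected.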
  

Patch

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index 7cffe26..7c95ec8 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -1691,6 +1691,78 @@  data and settings.")
     (license (license:non-copyleft "file://src/COPYING"
                                    "See src/COPYING in the distribution."))))
 
+(define-public eigensoft
+  (let ((revision "1")
+        (commit "b14d1e202e21e532536ff8004f0419cd5e259dc7"))
+    (package
+      (name "eigensoft")
+      (version (string-append "6.1.2-"
+                              revision "."
+                              (string-take commit 9)))
+      (source
+       (origin
+         (method git-fetch)
+         (uri (git-reference
+               (url "https://github.com/DReichLab/EIG.git")
+               (commit commit)))
+         (file-name (string-append "eigensoft-" commit "-checkout"))
+         (sha256
+          (base32
+           "0f5m6k2j5c16xc3xbywcs989xyc26ncy1zfzp9j9n55n9r4xcaiq"))
+         (modules '((guix build utils)))
+         ;; Remove pre-built binaries.
+         (snippet '(begin
+                     (delete-file-recursively "bin")
+                     (mkdir "bin")
+                     #t))))
+      (build-system gnu-build-system)
+      (arguments
+       `(#:tests? #f                    ; There are no tests.
+         #:make-flags '("CC=gcc")
+         #:phases
+         (modify-phases %standard-phases
+           ;; There is no configure phase, but the Makefile is in a
+           ;; sub-directory.
+           (replace 'configure
+             (lambda _
+               (chdir "src")
+               ;; The link flags are incomplete.
+               (substitute* "Makefile"
+                 (("-lgsl") "-lgsl -lm -llapack -llapacke -lpthread"))
+               #t))
+           ;; The provided install target only copies executables to
+           ;; the "bin" directory in the build root.
+           (add-after 'install 'actually-install
+             (lambda* (#:key outputs #:allow-other-keys)
+               (let* ((out (assoc-ref outputs "out"))
+                      (bin  (string-append out "/bin")))
+                 (mkdir-p bin)
+                 (for-each (lambda (file)
+                             (install-file file bin))
+                           (find-files "../bin" ".*"))
+                 #t))))))
+      (inputs
+       `(("gsl" ,gsl)
+         ("lapack" ,lapack)
+         ("lapacke" ,lapack "lapacke")
+         ("openblas" ,openblas)
+         ("perl" ,perl)
+         ("gfortran" ,gfortran "lib")))
+      (home-page "https://github.com/DReichLab/EIG")
+      (synopsis "Tools for population genetics")
+      (description "This package provides tools for population genetics and
+stratification correction.  The EIGENSTRAT method uses principal components
+analysis to explicitly model ancestry differences between cases and controls
+along continuous axes of variation; the resulting correction is specific to a
+candidate marker’s variation in frequency across ancestral populations,
+minimizing spurious associations while maximizing power to detect true
+associations.  The EIGENSOFT package has a built-in plotting script and
+supports multiple file formats and quantitative phenotypes.")
+      ;; The license of the eigensoft tools is Expat, but since it's
+      ;; linking with the GNU Scientific Library (GSL) the effective
+      ;; license is the GPL.
+      (license license:gpl3+))))
+
 (define-public edirect
   (package
     (name "edirect")