Patchwork gnu: add xlsx2csv.

Submitter Jan Nieuwenhuizen
Date Aug. 11, 2016, 12:11 p.m.
Message ID <871t1vzesb.fsf@gnu.org>
Permalink /patch/14485/
State New

Comments

Jan Nieuwenhuizen - Aug. 11, 2016, 12:11 p.m.
Ben Woodcroft writes:

Hi!

> Thanks for the package. I built it and used it to convert a file of
> mine, it seemed to work.

Good!

> However, I note that while the check phase passes, many of the tests
> are reported as being failed. Could you look into it?

The test script was trying to run them with Python versions that are not
installed.  I patched it to use only `python'.
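For readers unfamiliar with this kind of fix, here is a minimal Python sketch of what the test-script patch does. It assumes that `test/run` is a Python script (its `PYTHON_VERSIONS = ...` line is Python list syntax) and that it names each interpreter as "python" plus a version suffix. The sample version list and all helper names are hypothetical; only the regular expression is copied from the patch's `substitute*` rule.

```python
import ast
import re

# Hypothetical stand-in for the version list in upstream's test/run;
# the real list is not shown in this thread.
ORIGINAL_LINE = "PYTHON_VERSIONS = ['2.6', '2.7', '3.4']"


def patch_versions_line(line):
    """Mimic the patch's substitute* rule: keep the "PYTHON_VERSIONS = "
    prefix (group 1) and replace the rest of the line with ['']."""
    return re.sub(r"^(PYTHON_VERSIONS = ).*",
                  lambda m: m.group(1) + "['']",
                  line)


def versions_from_line(line):
    """Parse the Python list literal on the right-hand side of '='."""
    return ast.literal_eval(line.split("=", 1)[1].strip())


def interpreter_names(versions):
    """Assumed behaviour of test/run: one interpreter per version suffix."""
    return ["python" + v for v in versions]


if __name__ == "__main__":
    patched = patch_versions_line(ORIGINAL_LINE)
    print(patched)
    print(interpreter_names(versions_from_line(ORIGINAL_LINE)))
    print(interpreter_names(versions_from_line(patched)))
```

With the list reduced to `['']`, the only interpreter name left is plain `python`, presumably the one supplied through the package's `#:python` argument, so no test is attempted with an interpreter that is absent from the build environment.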

>> +    (propagated-inputs
>> +     `(("expat" ,expat)))
> I removed this input and could still use the built package. Is it necessary?

Apparently not; removed.

>> +    (synopsis "xlsx to csv converter")
>> +    (description
>> +     "Xls2csv converts xslx files to csv format.  Handles large
>> xlsx-files.")

> How about "Xlsx2csv is a program to convert Open Office XML (XLSX)
> format spreadsheets into plaintext @dfn{comma separated values} (CSV)
> files.  It was designed to be fast and to handle large input files."

Better, thanks.

> I think xml.scm might be a better home for this one.

Done.

Thanks!  Greetings,
Jan
Catonano - Aug. 11, 2016, 1:02 p.m.
2016-08-11 14:11 GMT+02:00 Jan Nieuwenhuizen <janneke@gnu.org>:

> Ben Woodcroft writes:
>
> [..]
>
> >> +    (synopsis "xlsx to csv converter")
> >> +    (description
> >> +     "Xls2csv converts xslx files to csv format.  Handles large
> >> xlsx-files.")
>
> > How about "Xlsx2csv is a program to convert Open Office XML (XLSX)
> > format spreadsheets into plaintext @dfn{comma separated values} (CSV)
> > files.  It was designed to be fast and to handle large input files."
>
> Better, thanks.
>

For what it's worth, I would leave the original wording in place.

It's important that people know that this package also deals with
Microsoft Excel files.

There are tons of such files around, and people struggle with them.

It happened to me recently. I had to convert a bunch of those files and the
tools I found were not adequate.

I did it by hand and then discovered xlsx2csv.

If someone (especially from the data journalism or open data camps)
casually browses the Guix package list, I think it's important that it's
clear that this package deals with xlsx files.

The Open Office name could be misleading to less technical people.

I understand there's the problem of not steering people to non-free
software. But this package is supposed to help people trying to bring data
out of that realm.

They don't need to be steered to Excel; if they come to this package, they
are already trying to get away from it.

Of course, this is just my opinion.
Ben Woodcroft - Aug. 12, 2016, 12:42 p.m.
Hi,


On 11/08/16 23:02, Catonano wrote:
>
>
> 2016-08-11 14:11 GMT+02:00 Jan Nieuwenhuizen <janneke@gnu.org>:
>
>     Ben Woodcroft writes:
>
[..]
>
>
>     > However, I note that while the check phase passes, many of the tests
>     > are reported as being failed. Could you look into it?
>
> That's trying to run them with non-installed python versions.  Patched
> the test script to only use `python.'

Ah, I see. Better.

[..]
>
>     >> +    (synopsis "xlsx to csv converter")
>     >> +    (description
>     >> +     "Xls2csv converts xslx files to csv format.  Handles large
>     >> xlsx-files.")
>
>     > How about "Xlsx2csv is a program to convert Open Office XML (XLSX)
>     > format spreadsheets into plaintext @dfn{comma separated values} (CSV)
>     > files.  It was designed to be fast and to handle large input files."
>
>     Better, thanks.
>
>
> for what it matters, I would leave the original wording in place.
>
[..]

OK, I see your point, and we have other packages that mention Excel 
already. It is frustrating that the official name is misleading. Would 
either of you like to suggest an alternative wording?

Thanks,
ben
Catonano - Aug. 12, 2016, 1:22 p.m.
2016-08-12 14:42 GMT+02:00 Ben Woodcroft <b.woodcroft@uq.edu.au>:

> Hi,
>
> [..]
>
> OK, I see your point, and we have other packages that mention Excel
> already. It is frustrating that the official name is misleading.
>

Eh. People identify software projects/products with file formats.


> Would either of you like to suggest an alternative wording?
>
> Thanks,
> ben
>

I would say

> "Xlsx2csv is a program to convert xlsx
> format files into plaintext @dfn{comma separated values} (CSV)
> files.  It was designed to be fast and to handle large files."

This way they could look up the meaning of xlsx and csv on the internet
and know exactly what this package is about.

Even the original wording proposed by Jan isn't bad, in my opinion, but
admittedly I didn't check whether it conforms to the synopsis guidelines.

Patch

From 38922f282ea054eda572d575272bd91539ba53bc Mon Sep 17 00:00:00 2001
From: Jan Nieuwenhuizen <janneke@gnu.org>
Date: Wed, 13 Jul 2016 14:46:33 +0200
Subject: [PATCH] gnu: Add xlsx2csv.

* gnu/packages/xml.scm (xlsx2csv): New variable.
---
 gnu/packages/xml.scm | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/gnu/packages/xml.scm b/gnu/packages/xml.scm
index e97a0b0..74f5457 100644
--- a/gnu/packages/xml.scm
+++ b/gnu/packages/xml.scm
@@ -9,6 +9,7 @@ 
 ;;; Copyright © 2015 Raimon Grau <raimonster@gmail.com>
 ;;; Copyright © 2016 Mathieu Lirzin <mthl@gnu.org>
 ;;; Copyright © 2016 Leo Famulari <leo@famulari.name>
+;;; Copyright © 2016 Jan Nieuwenhuizen <janneke@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -741,3 +742,35 @@  used to transform, query, validate, and edit XML documents.  XPath is used to
 match and extract data, and elements can be added, deleted or modified using
 XSLT and EXSLT.")
    (license license:x11)))
+
+(define-public xlsx2csv
+  (package
+    (name "xlsx2csv")
+    (version "0.7.2")
+    (source (origin
+             (method url-fetch)
+             (uri (string-append
+                   "https://github.com/dilshod/"
+                   name "/archive/release/" version ".tar.gz"))
+             (file-name (string-append name "-" version ".tar.gz"))
+             (sha256
+              (base32
+               "1gpn6kaa7l1ai8c9zx2j3acf04bvxq79pni8jjfjrk01smjbyyql"))))
+    (build-system python-build-system)
+    (arguments
+     `(#:python ,python-2
+       #:phases
+       (modify-phases %standard-phases
+         (replace 'check
+           (lambda _
+             (substitute* "test/run"
+               ;; Run tests with `python' only
+               (("^(PYTHON_VERSIONS = ).*" all m) (string-append m "['']")))
+             (zero? (system* "test/run")))))))
+    (home-page "https://github.com/dilshod/xlsx2csv")
+    (synopsis "xlsx to csv converter")
+    (description
+     "Xlsx2csv is a program to convert Open Office XML (XLSX) format
+spreadsheets into plaintext @dfn{comma separated values} (CSV) files.  It is
+designed to be fast and to handle large input files.")
+    (license license:gpl2+)))
-- 
2.9.2