How to deal with UTF-8 filenames?

Submitter Marius Bakke
Date Dec. 4, 2016, 7:56 p.m.
Message ID <8760mzbh9k.fsf@kirby.i-did-not-set--mail-host-address--so-tickle-me>
Permalink /patch/18173/
State New

Comments

Marius Bakke - Dec. 4, 2016, 7:56 p.m.
Hello Guix,

I'm trying to package the Norwegian Bokmål Aspell dictionary, which
contains a 'bokmål.alias' in the source archive. This causes 'readdir'
to throw a decoding-error. Here is the backtrace:

phase `unpack' succeeded after 0.1 seconds
starting phase `patch-usr-bin-file'
Backtrace:
In ice-9/boot-9.scm:
 160: 15 [catch #t #<catch-closure 8cac40> ...]
In unknown file:
   ?: 14 [apply-smob/1 #<catch-closure 8cac40>]
In ice-9/boot-9.scm:
  66: 13 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 12 [eval # #]
In ice-9/boot-9.scm:
2404: 11 [save-module-excursion #<procedure 8eb840 at ice-9/boot-9.scm:4051:3 ()>]
4056: 10 [#<procedure 8eb840 at ice-9/boot-9.scm:4051:3 ()>]
1727: 9 [%start-stack load-stack #<procedure 8fdcc0 at ice-9/boot-9.scm:4047:10 ()>]
1732: 8 [#<procedure 8fe990 ()>]
In unknown file:
   ?: 7 [primitive-load "/gnu/store/2f1ikq718l8q0lz7y771m503vdwjf2q9-aspell-dict-nb-0.50.1-0-guile-builder"]
In ice-9/eval.scm:
 387: 6 [eval # ()]
In srfi/srfi-1.scm:
 827: 5 [every1 #<procedure d661e0 at /gnu/store/ciqw5z470c8ihl1kfswj1j3ix6hs092d-module-import/guix/build/gnu-build-system.scm:623:9 (expr)> ...]
In /gnu/store/ciqw5z470c8ihl1kfswj1j3ix6hs092d-module-import/guix/build/gnu-build-system.scm:
 627: 4 [#<procedure d661e0 at /gnu/store/ciqw5z470c8ihl1kfswj1j3ix6hs092d-module-import/guix/build/gnu-build-system.scm:623:9 (expr)> #]
 166: 3 [patch-usr-bin-file #:native-inputs #f ...]
In /gnu/store/ciqw5z470c8ihl1kfswj1j3ix6hs092d-module-import/guix/build/utils.scm:
 336: 2 [find-files "." "^configure$" ...]
In ice-9/ftw.scm:
 481: 1 [loop "." "" ...]
In unknown file:
   ?: 0 [readdir #<directory stream e34a60>]

ERROR: In procedure readdir:
ERROR: Throw to key `decoding-error' with args `("scm_from_stringn" "input locale conversion error" 84 #vu8(98 111 107 109 229 108 46 97 108 105 97 115))'.
builder for `/gnu/store/fnhnrmshycf6qgfv6b9xsil3ppvracad-aspell-dict-nb-0.50.1-0.drv' failed with exit code 1
cannot build derivation `/gnu/store/419306s2mf3300906ikk39vjy0bqqs64-profile.drv': 1 dependencies couldn't be built

This also happens in the 'patch-source-shebangs' phase, which is
necessary for the 'configure' script to work.

Any suggestions for how to deal with this?

Patch attached.
Marius Bakke - Dec. 5, 2016, 8:01 a.m.
Marius Bakke <mbakke@fastmail.com> writes:

> Hello Guix,
>
> I'm trying to package the Norwegian Bokmål Aspell dictionary, which
> contains a 'bokmål.alias' in the source archive. This causes 'readdir'
> to throw a decoding-error.

I'm sorry, the file name is actually ISO-8859-1 encoded. I've updated
the subject. From what I understand, Guile should be able to handle
Unicode file names just fine.

The easiest fix is probably to have the file renamed to UTF-8 upstream,
but it would be nice to know a workaround. I've tried overriding the
'install-locale' phase to set various ISO-8859-1 locales instead of the
default "en_US.utf-8", but I get "invalid argument" for every one of them.

Are there any Latin1 locales available in the build environment, and
would setting it make any difference?
Ludovic Courtès - Dec. 5, 2016, 9:07 p.m.
Marius Bakke <mbakke@fastmail.com> skribis:

> Marius Bakke <mbakke@fastmail.com> writes:
>
>> Hello Guix,
>>
>> I'm trying to package the Norwegian Bokmål Aspell dictionary, which
>> contains a 'bokmål.alias' in the source archive. This causes 'readdir'
>> to throw a decoding-error.
>
> I'm sorry, the file name is actually ISO-8859-1 encoded. I've updated
> the subject. From what I understand, Guile should be able to handle
> Unicode file names just fine.

In Guile, file names are strings, not byte sequences as in C.  So the
bytes that the file system returns have to be decoded, and they must be
valid in the encoding used for that.

The convention that Guile follows is to decode file names according to
its current locale encoding.  By default, all our builds run in the
en_US.utf8 locale.
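
To make the failure concrete, here is a small Guile sketch that decodes the bytes from the error message above in both encodings.  It only approximates what ‘readdir’ does internally; the variable names are made up for illustration, and it assumes a Guile that provides the (ice-9 iconv) module.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 iconv))

;; The raw file name bytes reported in the `decoding-error' above.
(define name-bytes
  #vu8(98 111 107 109 229 108 46 97 108 105 97 115))

;; Decoding as ISO-8859-1 succeeds: every byte maps to one character,
;; and byte 229 (#xE5) is "å".
(define latin1-name
  (bytevector->string name-bytes "ISO-8859-1"))

;; Decoding as UTF-8 fails: #xE5 announces a three-byte sequence, but
;; the next byte (108, "l") is not a continuation byte.
(define utf8-result
  (catch 'decoding-error
    (lambda () (bytevector->string name-bytes "UTF-8"))
    (lambda (key . args) 'decoding-error)))

;; Inspect the results without depending on the terminal's encoding.
(write (string-length latin1-name))                ; number of characters
(newline)
(write (char->integer (string-ref latin1-name 4))) ; code point of "å"
(newline)
(write utf8-result)
(newline)
```

The same bytes are a valid file name in a Latin-1 locale and an undecodable one in a UTF-8 locale, which is why changing the build locale can help.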

However, you can change that: ‘gnu-build-system’ supports a #:locale
parameter, so you can do:

  #:locale "nb_NO.iso88591"

> Are there any Latin1 locales available in the build environment, and
> would setting it make any difference?

No, there are none.  So you’ll have to add a package containing that
locale as an input to the build process.

To that end, I suggest that you abstract ‘glibc-utf8-locales’ in
base.scm so you can then write:

  (define glibc-latin1-locales
    (make-glibc-locale-package "ISO-8859-1" '("nb_NO" "fr_FR")))
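
Assuming such a ‘glibc-latin1-locales’ package existed (it does not yet; that name and ‘make-glibc-locale-package’ are only the proposal above), the relevant package fields might look roughly like this.  It is a fragment rather than a complete package definition, the "locales" label is arbitrary, and whether the package belongs in ‘native-inputs’ or ‘inputs’ is a guess:

```scheme
;; Fragment of a package definition, not a complete package.
;; `glibc-latin1-locales' is the hypothetical package proposed above.
(native-inputs
 `(("locales" ,glibc-latin1-locales)))  ; would provide nb_NO.iso88591
(arguments
 '(#:locale "nb_NO.iso88591"))          ; locale for the `install-locale' phase
```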

That’s more than you wanted to know I guess.  Don’t expect to
spell-check your Norwegian text for the time being.  ;-)

HTH!

Ludo’.

Patch

From 6e6dd7144eecf9c04ba47c4a49207f17181259be Mon Sep 17 00:00:00 2001
From: Marius Bakke <mbakke@fastmail.com>
Date: Sun, 4 Dec 2016 19:56:20 +0100
Subject: [PATCH] gnu: Add Norwegian Bokmål Aspell dictionary.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* gnu/packages/aspell.scm (aspell-dict-nb): New variable.
---
 gnu/packages/aspell.scm | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/gnu/packages/aspell.scm b/gnu/packages/aspell.scm
index d27e8090b..396e89b2c 100644
--- a/gnu/packages/aspell.scm
+++ b/gnu/packages/aspell.scm
@@ -4,6 +4,7 @@ 
 ;;; Copyright © 2016 John Darrington <jmd@gnu.org>
 ;;; Copyright © 2016 Efraim Flashner <efraim@flashner.co.il>
 ;;; Copyright © 2016 Christopher Andersson <christopher@8bits.nu>
+;;; Copyright © 2016 Marius Bakke <mbakke@fastmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -150,6 +151,14 @@  dictionaries, including personal ones.")
                      (base32
                       "1gdf7bc1a0kmxsmphdqq8pl01h667mjsj6hihy6kqy14k5qdq69v")))
 
+(define-public aspell-dict-nb
+  (aspell-dictionary "nb" "Norwegian Bokmål"
+                     #:version "0.50.1-0"
+                     #:prefix "aspell-"
+                     #:sha256
+                     (base32
+                      "12i2bmgdnlkzfinb20j2a0j4a20q91a9j8qpq5vgabbvc65nwx77")))
+
 (define-public aspell-dict-nl
   (aspell-dictionary "nl" "Dutch"
                      #:version "0.50-2"
-- 
2.11.0