Adding wc to Bournish

Message ID 20160524184720.GA27449@debian-netbook
State New
Headers

Commit Message

Efraim Flashner May 24, 2016, 6:47 p.m. UTC
  GSoC just started and so far I have 13 tabs from the guile manual open
on my computer. I almost implemented `du' at the same time as `wc' but
at the time I was getting tripped up with functions returning functions
or returning values.

`wc-l' and `wc-w' are two separate functions ATM. It should be possible
to combine the two and increment the line-count if the delimiter is \n
(or EOF) but I didn't want to spend too much time implementing and
re-implementing the commands.

`wc-w' closes the file after it finishes but `wc-l' doesn't. Good
hygiene? Let the garbage collector take care of it? I also made liberal
use of `make-string', this seems like a memory leak nightmare to me.

I wrapped `file-size' and `wc-command' in checks that the file actually
exists and can be read by the user. One of the things drilled into me
from turning in assignments in school was to wrap the code in checks so
it'd pass the TAs checks.

`wc-command', it doesn't accept arguments and I'm not happy with the
method of printing out the results but it works.

bournish@(guile-user)> wc gpl-3.0.txt
674 5644 35147 gpl-3.0.txt

What should I add to my .guile to auto-load guix.scm from the repo?
(load /path/to/guix/guix.scm) didn't work for me.
  

Comments

Ricardo Wurmus May 25, 2016, 9:03 a.m. UTC | #1
Efraim Flashner <efraim@flashner.co.il> writes:

> +(define (wc-command file)
> +  (if (and (file-exists? file) (access? file 4))
> +     (let* ((wc-l ((@@ (guix build bournish) wc-l-command) file))
> +            (wc-w ((@@ (guix build bournish) wc-w-command) file))
> +            (wc-c ((@@ (guix build bournish) wc-c-command) file)))
> +       (begin 
> +         (display wc-l)(display #\space)
> +         (display wc-w)(display #\space)
> +         (display wc-c)(display #\space)
> +         (display file)
> +         (newline)))))

Have you considered using “format” instead of “display”?  The last
“begin” could just be this:

  (format #t "~a ~a ~a ~a\n"
    wc-l wc-w wc-c file)

or

  (format #t "~{~a ~}\n"
    (list wc-l wc-w wc-c file))

Why is it necessary to do “(@@ (guix build bournish) ...)”?  Can it not
be imported once?

~~ Ricardo
  
Ricardo Wurmus May 25, 2016, 9:26 a.m. UTC | #2
Efraim Flashner <efraim@flashner.co.il> writes:

> +(define (wc-l-command file)
> +  (let* ((input-file (open-file file "r"))
> +         (line       (read-line input-file))
> +         (line-count 0))
> +    (while (not (eof-object? line))
> +           (set! line-count (1+ line-count))
> +           (set! line (read-line input-file)))
> +    line-count))

It’s unusual for me to see the use of “while” and “set!” in Scheme code.
You could do this in a functional manner using a fold (see SRFI-1) or
with file streams (see SRFI-41), which also provides a stream-fold.

The idea with a fold is that you have a function that takes a value
(e.g. from a list or a stream) and an intermediate result.  The function
does something to the value and then returns a new intermediate result.

Here’s a fold over a list of symbols implementing a count:

   (fold
      (lambda (_ res) (+ res 1))  ; increase the result
      0                           ; start at 0
      '(hello world bye))         ; items to count

If you had a file stream, where each element represents one line, you
can fold over all lines in much the same way to get a count.  You could
use the same framework with a different stream element generator
(reading one word or byte at a time instead of one line at a time) to
implement the other features of “wc”.

There’s an example of how to define a file stream in the Guile manual in
the documentation for SRFI-41.

~~ Ricardo
  
Efraim Flashner May 25, 2016, 9:43 a.m. UTC | #3
On Wed, May 25, 2016 at 11:03:34AM +0200, Ricardo Wurmus wrote:
> 
> Efraim Flashner <efraim@flashner.co.il> writes:
> 
> > +(define (wc-command file)
> > +  (if (and (file-exists? file) (access? file 4))
> > +     (let* ((wc-l ((@@ (guix build bournish) wc-l-command) file))
> > +            (wc-w ((@@ (guix build bournish) wc-w-command) file))
> > +            (wc-c ((@@ (guix build bournish) wc-c-command) file)))
> > +       (begin 
> > +         (display wc-l)(display #\space)
> > +         (display wc-w)(display #\space)
> > +         (display wc-c)(display #\space)
> > +         (display file)
> > +         (newline)))))
> 
> Have you considered using “format” instead of “display”?  The last
> “begin” could just be this:
> 
>   (format #t "~a ~a ~a ~a\n"
>     wc-l wc-w wc-c file)
> 
> or
> 
>   (format #t "~{~a ~}\n"
>     (list wc-l wc-w wc-c file))

I looked at the manual more and I found for-each which does what I was
looking for before which gave me this:

(for-each display (list wc-l #\space wc-w #\space wc-c #\space file))

but I do like the (format #t "~{~a ~}\n" (list ...)) better. I'm working
on trying to get it to read flags and the variable size of the list
works better with that.

format vs display, I knew I wanted display over write but I hadn't come
across format yet. If I don't need to specify #\space to get a space
then that sounds great.

> 
> Why is it necessary to do “(@@ (guix build bournish) ...)”?  Can it not
> be imported once?

When you say it out loud then it seems obvious it shouldn't need to be
imported more than once. I've also read more about let vs let* and I've
changed some to let.

> 
> ~~ Ricardo

Thanks!
  
Efraim Flashner May 25, 2016, 10:05 a.m. UTC | #4
On Wed, May 25, 2016 at 11:26:11AM +0200, Ricardo Wurmus wrote:
> 
> Efraim Flashner <efraim@flashner.co.il> writes:
> 
> > +(define (wc-l-command file)
> > +  (let* ((input-file (open-file file "r"))
> > +         (line       (read-line input-file))
> > +         (line-count 0))
> > +    (while (not (eof-object? line))
> > +           (set! line-count (1+ line-count))
> > +           (set! line (read-line input-file)))
> > +    line-count))
> 
> It’s unusual for me to see the use of “while” and “set!” in Scheme code.
> You could do this in a functional manner using a fold (see SRFI-1) or
> with file streams (see SRFI-41), which also provides a stream-fold.

I'm going to put set! in the GOTO box. GOTO worked well enough when I
learned QBasic, but just because its implemented doesn't mean I should
still use it. (ie: its there but try not to use it)

> 
> The idea with a fold is that you have a function that takes a value
> (e.g. from a list or a stream) and an intermediate result.  The function
> does something to the value and then returns a new intermediate result.

I already like this much better

> 
> Here’s a fold over a list of symbols implementing a count:
> 
>    (fold
>       (lambda (_ res) (+ res 1))  ; increase the result
>       0                           ; start at 0
>       '(hello world bye))         ; items to count
> 
> If you had a file stream, where each element represents one line, you
> can fold over all lines in much the same way to get a count.  You could
> use the same framework with a different stream element generator
> (reading one word or byte at a time instead of one line at a time) to
> implement the other features of “wc”.

I'll take another look at streams and rework that. I still think asking
the filesystem how big the file is works well for `wc -c', but `wc -l'
and `wc -w' should really be part of the same function.

> 
> There’s an example of how to define a file stream in the Guile manual in
> the documentation for SRFI-41.

That looks like a good place to start then :)

> 
> ~~ Ricardo
  
Ludovic Courtès May 26, 2016, 8:46 a.m. UTC | #5
Hello!

Efraim Flashner <efraim@flashner.co.il> skribis:

> GSoC just started and so far I have 13 tabs from the guile manual open
> on my computer. I almost implemented `du' at the same time as `wc' but
> at the time I was getting tripped up with functions returning functions
> or returning values.

Nice!

Some comments to complement what Ricardo et al. wrote:

> +(define (wc-w-command file)
> +  (let* ((input-file (open-file file "r"))
> +         (line       (make-string 80))
> +         (word-size  (read-delimited! " \t\n" line input-file))
> +         (word-count 0))
> +    (while (not (port-closed? input-file))
> +           (if (not (zero? word-size))
> +             (set! word-count (1+ word-count)))
> +           (set! line (make-string 80))
> +           (if (not (eof-object? (peek-char input-file)))
> +             (set! word-size (read-delimited! " \t\n" line input-file))
> +             (close-input-port input-file)))
> +    word-count))

First, regarding the port not being closed, it’s a good idea to use
‘call-with-input-file’ instead of ‘open-file’: ‘call-with-input-file’
makes sure the port gets closed when its “dynamic extent” is left, that
is, regardless of whether we leave the procedure “normally” or via an
exception.

  (call-with-input-file file
    (lambda (port)
      …))

Also, imperative programming Should Be Avoided; see “Coding Style” in
the manual.  ;-)  We don’t use an i/o monad or any such thing, but
‘set!’ and string modifications is definitely frowned upon.

As Ricardo suggests, you could use ‘port->stream’ and ‘stream-fold’ to
iterate over the characters read from the port.  I suspect that’d be
rather slow though, at least on 2.0, so another option is something
like:

  (define (lines+chars port)
    ;; Return the number of lines and number of chars read from PORT.
    (let loop ((lines 1) (chars 0))
      (match (read-char port)
        ((? eof-object?)              ;done!
         (values lines port))
        (#\newline                    ;recurse
         (loop (+ 1 lines) (+ 1 chars)))
        (_                            ;recurse
         (loop lines (+ 1 chars))))))

  (define (wc-command file)
    (let-values (((lines chars)
                  (call-with-input-file file lines+chars)))
      (format #t "~a ~a ~a~%" lines chars file)))

> +(define (wc-command file)
> +  (if (and (file-exists? file) (access? file 4))

This check is not needed and is subject to a race condition (“TOCTTOU”);
just let ‘call-with-input-file’ error out if the file cannot be read.

Bonus point: catch ‘system-error’ exceptions and report the inability to
open the file in a nice user-friendly way (but really, don’t bother
about it for now.)

> +     (let* ((wc-l ((@@ (guix build bournish) wc-l-command) file))
> +            (wc-w ((@@ (guix build bournish) wc-w-command) file))
> +            (wc-c ((@@ (guix build bournish) wc-c-command) file)))
> +       (begin 
> +         (display wc-l)(display #\space)
> +         (display wc-w)(display #\space)
> +         (display wc-c)(display #\space)
> +         (display file)
> +         (newline)))))

Remember that Bournish is a compiler that compiles Bash to Scheme.
So we must distinguish the support functions that are used at run time,
such as ‘ls-command-implementation’, from what the Scheme code that the
compiler emits (compile time).

In the case of ‘ls’, when the compiler encounters ‘ls’ in the input, it
emits this code:

  ((@@ (guix build bournish) ls-command-implementation))

‘ls-command-implementation’ is the implementation that is called when we
run the compiled program.

Thus, you must similarly distinguish those two stages by providing:

  1. A ‘wc-command-implementation’ procedure that implements ‘wc’;

  2. A ‘wc-command’ procedure that emits the code that calls
     ‘wc-command-implementation’; so something like:

        (define (wc-command args)
          `((@@ (guix build bournish) wc-command-implementation)
            ,@args))

     Better yet, ‘wc-command’ could check for the presence of “-l” or
     “-c” at compile time and emit a call to the right thing.

HTH!

Ludo’.
  
Efraim Flashner May 26, 2016, 5:50 p.m. UTC | #6
On Thu, May 26, 2016 at 10:46:21AM +0200, Ludovic Courtès wrote:
> 
> As Ricardo suggests, you could use ‘port->stream’ and ‘stream-fold’ to
> iterate over the characters read from the port.  I suspect that’d be
> rather slow though, at least on 2.0, so another option is something
> like:
> 
>   (define (lines+chars port)
>     ;; Return the number of lines and number of chars read from PORT.
>     (let loop ((lines 1) (chars 0))
>       (match (read-char port)
>         ((? eof-object?)              ;done!
>          (values lines port))
>         (#\newline                    ;recurse
>          (loop (+ 1 lines) (+ 1 chars)))
>         (_                            ;recurse
>          (loop lines (+ 1 chars))))))
> 
>   (define (wc-command file)
>     (let-values (((lines chars)
>                   (call-with-input-file file lines+chars)))
>       (format #t "~a ~a ~a~%" lines chars file)))
> 

Are you suggesting just dropping the word count part of `wc'? I've been
thinking about it, and the simplest way I can think of to describe a
word is a space followed by a character, or, to use the char-sets from
the guile manual, a character from char-set:whitespace followed by a
character from char-set:graphic. I can compare (read-char port) and
(peek-char port) to get a word count (possibly).

> > +(define (wc-command file)
> > +  (if (and (file-exists? file) (access? file 4))
> 
> This check is not needed and is subject to a race condition (“TOCTTOU”);
> just let ‘call-with-input-file’ error out if the file cannot be read.
> 
> Bonus point: catch ‘system-error’ exceptions and report the inability to
> open the file in a nice user-friendly way (but really, don’t bother
> about it for now.)
> 

I'm still wrapping my head around the following part. My wife says when
I work I scowl at the computer a lot and mutter :)
> 
> Remember that Bournish is a compiler that compiles Bash to Scheme.
> So we must distinguish the support functions that are used at run time,
> such as ‘ls-command-implementation’, from what the Scheme code that the
> compiler emits (compile time).
> 
> In the case of ‘ls’, when the compiler encounters ‘ls’ in the input, it
> emits this code:
> 
>   ((@@ (guix build bournish) ls-command-implementation))
> 
> ‘ls-command-implementation’ is the implementation that is called when we
> run the compiled program.
> 
> Thus, you must similarly distinguish those two stages by providing:
> 
>   1. A ‘wc-command-implementation’ procedure that implements ‘wc’;
> 
>   2. A ‘wc-command’ procedure that emits the code that calls
>      ‘wc-command-implementation’; so something like:
> 
>         (define (wc-command args)
>           `((@@ (guix build bournish) wc-command-implementation)
>             ,@args))
> 
>      Better yet, ‘wc-command’ could check for the presence of “-l” or
>      “-c” at compile time and emit a call to the right thing.

I checked with coreutil's 'wc', and it emits in its particular order
whether you call 'wc -l -c' or 'wc -lc' or 'wc -cl'

> 
> HTH!
> 
> Ludo’.
  

Patch

diff --git a/guix/build/bournish.scm b/guix/build/bournish.scm
index 4022796..336a650 100644
--- a/guix/build/bournish.scm
+++ b/guix/build/bournish.scm
@@ -1,5 +1,6 @@ 
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2016 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2016 Efraim Flashner <efraim@flashner.co.il>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -103,6 +104,54 @@  characters."
        ((@ (guix build utils) dump-port) port (current-output-port))
        *unspecified*)))
 
+(define (file-contents file)
+  `(make-stream
+     (port->stream (open-input-file ,file) read-char)))
+
+(define (file-size file)
+  (if (and (file-exists? file)
+           (access? file 4))
+    (stat:size (stat file))
+    *unspecified*))
+
+(define (wc-c-command file)
+  ((@@ (guix build bournish) file-size) file))
+
+(define (wc-w-command file)
+  (let* ((input-file (open-file file "r"))
+         (line       (make-string 80))
+         (word-size  (read-delimited! " \t\n" line input-file))
+         (word-count 0))
+    (while (not (port-closed? input-file))
+           (if (not (zero? word-size))
+             (set! word-count (1+ word-count)))
+           (set! line (make-string 80))
+           (if (not (eof-object? (peek-char input-file)))
+             (set! word-size (read-delimited! " \t\n" line input-file))
+             (close-input-port input-file)))
+    word-count))
+
+(define (wc-l-command file)
+  (let* ((input-file (open-file file "r"))
+         (line       (read-line input-file))
+         (line-count 0))
+    (while (not (eof-object? line))
+           (set! line-count (1+ line-count))
+           (set! line (read-line input-file)))
+    line-count))
+
+(define (wc-command file)
+  (if (and (file-exists? file) (access? file 4))
+     (let* ((wc-l ((@@ (guix build bournish) wc-l-command) file))
+            (wc-w ((@@ (guix build bournish) wc-w-command) file))
+            (wc-c ((@@ (guix build bournish) wc-c-command) file)))
+       (begin 
+         (display wc-l)(display #\space)
+         (display wc-w)(display #\space)
+         (display wc-c)(display #\space)
+         (display file)
+         (newline)))))
+
 (define (help-command . _)
   (display "\
 Hello, this is Bournish, a minimal Bourne-like shell in Guile!
@@ -129,7 +178,8 @@  commands such as 'ls' and 'cd'; it lacks globbing, pipes---everything.\n"))
     ("help"   ,help-command)
     ("ls"     ,ls-command)
     ("which"  ,which-command)
-    ("cat"    ,cat-command)))
+    ("cat"    ,cat-command)
+    ("wc"     ,wc-command)))
 
 (define (read-bournish port env)
   "Read a Bournish expression from PORT, and return the corresponding Scheme