[PATCH] Add RAID devices.

Message ID 20160723140701.GB6873@solar
State New
Commit Message

Andreas Enge July 23, 2016, 2:07 p.m. UTC
  Hello,

On Fri, Jul 15, 2016 at 10:25:47AM -0400, myglc2 wrote:
> How do you plan to address mdadm and grub config?
> Will you be booting from RAID?

the attached patch adds RAID support to our mapped device mechanism.

To give more explanation:
Like for LUKS devices or file systems, the RAID itself needs to be created
during installation, after booting from the USB key:
   mdadm --create /dev/md0 --level=raid10 --layout=f2 \
      --raid-devices=2 /dev/sda2 /dev/sdb2
   (wait 12 hours or so for 4 TB disks)
   mkfs.ext4 /dev/md0

Then one adds a mapped-device of type raid-device-mapping, adds the
raid10 kernel module to the initrd and can boot from the RAID device.
The mapped-device mechanism calls "mdadm --assemble" and "mdadm --stop".

I might write a more detailed blog post about this; there is a little
subtlety with the non-automatic determination of dependencies between
devices, so one needs to make sure that the partitions to be assembled
are present before the mdadm command is executed.

Looking forward to comments on the patch,

Andreas
From fc2d8dc30c04677ebf553b02227dc10b0be49665 Mon Sep 17 00:00:00 2001
From: Andreas Enge <andreas@enge.fr>
Date: Thu, 14 Jul 2016 15:51:59 +0200
Subject: [PATCH] system: Add mapped devices for RAID.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* gnu/system/mapped-devices.scm (raid-device-mapping, open-raid-device,
close-raid-device): New variables.
* doc/guix.texi (Mapped Devices): Add documentation for RAID devices,
reorganize documentation for LUKS devices.

Co-authored-by: Ludovic Courtès <ludo@gnu.org>
---
 doc/guix.texi                 | 113 +++++++++++++++++++++++++++---------------
 gnu/system/mapped-devices.scm |  29 ++++++++++-
 2 files changed, 102 insertions(+), 40 deletions(-)
  

Comments

Chris Marusich July 24, 2016, 5:43 a.m. UTC | #1
Andreas Enge <andreas@enge.fr> writes:

>> How do you plan to address mdadm and grub config?
>> Will you be booting from RAID?
>
> the attached patch adds RAID support to our mapped device mechanism.

Cool!  Is it possible to use them in combination?  Using the example
from the documentation, would it be possible to use LUKS to create an
encrypted /dev/mapper/home which uses /dev/md0 instead of /dev/sda3?

> +Alternatively, to become independent of device numbering, one may obtain
> +the LUKS UUID (@dfn{unique identifier}) of the source device by a
> +command like:

How difficult would it be to add support for specifying the sources
using UUIDs?  I know we could use the device files in
/dev/disk/by-uuid/, but "(guix) File Systems" (footnote 2) recommends
against that: "Note that, while it is tempting to use
‘/dev/disk/by-uuid’ and similar device names to achieve the same result,
this is not recommended: These special device nodes are created by the
udev daemon and may be unavailable at the time the device is mounted."

> +The @file{/dev/md0} device can then be used as the @code{device} of a
> +@code{file-system} declaration (@pxref{File Systems}).
> +Note that the RAID level need not be given; it is chosen during the
> +initial creation and formatting of the RAID device and is determined
> +automatically later.

I understand that in Linux, the device file names (e.g., /dev/sda3) can
sometimes change unexpectedly.  If they change later on, will anything
bad happen?

> +  #~(let ((every (@ (srfi srfi-1) every)))

Can't you just use "every" on its own?  It looks like you've imported
the srfi-1 module earlier on.

> +        (unless (every file-exists? '#$source)
> +          (format #t "waiting a bit...~%")
> +          (sleep 1)
> +          (loop)))

Does the code in this gexp get invoked every time the system starts up?
Why is a loop better here than an error?  What if the source device
files never show up?

Also, will that string be properly localized?
  
Ludovic Courtès July 25, 2016, 8:59 p.m. UTC | #2
Hello,

Chris Marusich <cmmarusich@gmail.com> skribis:

> Andreas Enge <andreas@enge.fr> writes:

[...]

>> +  #~(let ((every (@ (srfi srfi-1) every)))
>
> Can't you just use "every" on its own?  It looks like you've imported
> the srfi-1 module earlier on.

I’m the one who suggested it as a “temporary hack”, as we call such
things.  ;-)

The story is that this expression here gets staged in non-top-level
position, where it cannot directly do ‘use-modules’, hence this hack.

This should be fixed eventually, possibly in gexp themselves.
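
As a minimal illustration of the hack described above (plain Guile, outside
any gexp; the procedure name all-exist? is made up for this sketch): '@'
fetches a public binding from a named module in expression position, so no
top-level 'use-modules' is needed.

```scheme
;; Hypothetical helper, for illustration only.
;; (@ (srfi srfi-1) every) looks up 'every' in the (srfi srfi-1) module
;; right where it is used, instead of importing the whole module.
(define (all-exist? files)
  (let ((every (@ (srfi srfi-1) every)))
    ;; True when every file name in FILES exists.
    (every file-exists? files)))
```

The patch binds 'every' in the same way inside its gexp before checking that
all source devices are present.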

>> +        (unless (every file-exists? '#$source)
>> +          (format #t "waiting a bit...~%")
>> +          (sleep 1)
>> +          (loop)))
>
> Does the code in this gexp get invoked every time the system starts up?
> Why is a loop better here than an error?  What if the source device
> files never show up?

Right, another super-temporary hack.  The right thing would be to do
like ‘canonicalize-device-spec’ in (gnu build file-systems) does, which
is to error out after a few iterations, with the effect of spawning an
emergency REPL.

The mechanism to wait for devices should be factorized.
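
A minimal sketch of such a bounded wait, in plain Guile.  This is not the
code of 'canonicalize-device-spec'; the names wait-for-devices and
max-attempts are hypothetical, and raising an error merely stands in for
whatever the initrd does to reach the emergency REPL.

```scheme
(use-modules (srfi srfi-1))   ; for 'every'

;; Hypothetical helper: wait until all DEVICES exist, checking once per
;; second, and give up with an error after MAX-ATTEMPTS retries.
(define* (wait-for-devices devices #:key (max-attempts 10))
  (let loop ((attempts 0))
    (cond ((every file-exists? devices)
           #t)                                   ; all devices are present
          ((>= attempts max-attempts)
           (error "devices did not show up:" devices))
          (else
           (format #t "waiting for ~a...~%" devices)
           (sleep 1)
           (loop (+ attempts 1))))))

;; Example use (assumed device names):
;; (wait-for-devices (list "/dev/sda2" "/dev/sdb2") #:max-attempts 20)
```

Compared with the unbounded loop in the patch, this fails loudly when the
source devices never appear instead of hanging the boot.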

> Also, will that string be properly localized?

No it won’t, indeed.  Currently message catalogs and locales are
unavailable in the initrd, and it would probably make the initrd pretty
big to add them, so I’d be tempted to ignore i18n for early boot
messages that hopefully few people will notice.

Thanks,
Ludo’.
  
Andreas Enge July 25, 2016, 9:17 p.m. UTC | #3
Hello!

On Sat, Jul 23, 2016 at 10:43:58PM -0700, Chris Marusich wrote:
> Cool!  Is it possible to use them in combination?  Using the example
> from the documentation, would it be possible to use LUKS to create an
> encrypted /dev/mapper/home which uses /dev/md0 instead of /dev/sda3?

unfortunately not. This is an area where some work should be invested.
It would also require doing things in the proper order. LVM support also
comes to mind. I think these devices can be staged in an arbitrarily
complex way, no? Encrypting a RAID device, or creating a RAID device
from encrypted partitions, for instance (of which the former sounds more
reasonable to me).

> I understand that in Linux, the device file names (e.g., /dev/sda3) can
> sometimes change unexpectedly.  If they change later on, will anything
> bad happen?

Yes, the RAID could not be assembled, and then the machine could not boot
up. But I have never experienced this kind of problem for hard disks.

> Does the code in this gexp get invoked every time the system starts up?
> Why is a loop better here than an error?

Yes. An error is too harsh: When I tried things out, the RAID assembly
was attempted before the hard disks were visible, so it failed. Waiting
a little solved the problem.

Andreas
  
Chris Marusich July 26, 2016, 7:43 a.m. UTC | #4
Hi Andreas,

Ludo's response clarified a lot of things for me.  The only remaining
feedback I have is that (1) to aid the reader, you should consider
adding a cross-reference from "(guix) Mapped Devices" (in the part where
you mention that certain modules must be added) to "(guix) Initial RAM
Disk", and (2) you might want to look into using the "Auto Assembly"
feature of mdadm (see below).

If you've tested the changes, though, and it works, I see no reason not
to commit this and enable everyone to enjoy the use of RAID arrays! :)

Andreas Enge <andreas@enge.fr> writes:

> On Sat, Jul 23, 2016 at 10:43:58PM -0700, Chris Marusich wrote:
>> Cool!  Is it possible to use them in combination?  Using the example
>> from the documentation, would it be possible to use LUKS to create an
>> encrypted /dev/mapper/home which uses /dev/md0 instead of /dev/sda3?
>
> unfortunately not. This is an area where some work should be invested.
> It would also require doing things in the proper order. LVM support also
> comes to mind. I think these devices can be staged in an arbitrarily
> complex way, no? Encrypting a RAID device, or creating a RAID device
> from encrypted partitions, for instance (of which the former sounds more
> reasonable to me).

I agree it would be a nice feature, but if your patch is working right
now to enable the use of RAID, then I think it would be fine to submit
your patch now and add such a feature later.

>> I understand that in Linux, the device file names (e.g., /dev/sda3) can
>> sometimes change unexpectedly.  If they change later on, will anything
>> bad happen?
>
> Yes, the RAID could not be assembled, and then the machine could not boot
> up. But I have never experienced this kind of problem for hard disks.

It occurs occasionally [1], but the fewer disks you have, the less
likely you are to observe it.  I wonder if you can use mdadm's "Auto
Assembly" feature (see "man 8 mdadm" for details) to avoid this issue
entirely?  It sounds like you might not even need to specify a source
device list, if the description of "Auto Assembly" is to be believed.

[1] For example: https://serverfault.com/questions/140071/hard-drive-device-names-are-different-from-one-reboot-to-another-in-ubuntu
  
myglc2 July 30, 2016, 11:05 p.m. UTC | #5
Andreas Enge <andreas@enge.fr> writes:

> the attached patch adds RAID support to our mapped device mechanism.
>
> To give more explanation:
> Like for LUKS devices or file systems, the RAID itself needs to be created
> during installation, after booting from the USB key:
>    mdadm --create /dev/md0 --level=raid10 --layout=f2 \
>       --raid-devices=2 /dev/sda2 /dev/sdb2
>    (wait 12 hours or so for 4 TB disks)
>    mkfs.ext4 /dev/md0
>
> Then one adds a mapped-device of type raid-device-mapping, adds the
> raid10 kernel module to the initrd and can boot from the RAID device.
> The mapped-device mechanism calls "mdadm --assemble" and "mdadm --stop".
>
> I might write a more detailed blog post about this; there is a little
> subtlety with the non-automatic determination of dependencies between
> devices, so one needs to make sure that the partitions to be assembled
> are present before the mdadm command is executed.

Thanks. Starting with baby steps, I am running a GuixSD system from a
separate drive and adding a raid1 array with 2 new disks. I created an
array that will assemble like so...

g1@g1 ~$ sudo mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1
mdadm: /dev/md0 has been started with 2 drives.

g1@g1 ~$ cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sdc1[0] sdb1[1]
      239809536 blocks super 1.2 [2/2] [UU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

I added this to my config (system25.scm attached) ...

(operating-system
[...]
  (initrd (lambda (fs . args)
            (apply base-initrd fs
                   #:extra-modules '("raid1")
                   #:mapped-devices '((mapped-device
                                       (source (list "/dev/sdb1" "/dev/sdc1"))
                                       (target "/dev/md0")
                                       (type raid-device-mapping)))
                   args)))
[...]

I rebooted and the array is not assembled ;-(

g1@g1 ~$ cat /proc/mdstat
Personalities : [raid1] 
unused devices: <none>

/var/log/messages contains ...
[...]
Jul 30 18:09:59 localhost vmunix: [    1.872801] md: raid1 personality registered for level 1
[...]

Any suggestions, corrections or a sample working config would be most welcome.

TIA - George
  
Andreas Enge July 31, 2016, 8:52 a.m. UTC | #6
On Sat, Jul 30, 2016 at 07:05:25PM -0400, myglc2 wrote:
> > I might write a more detailed blog post about this; there is a little
> > subtlety with the non-automatic determination of dependencies between
> > devices, so one needs to make sure that the partitions to be assembled
> > are present before the mdadm command is executed.
> I rebooted and the array is not assembled ;-(

Strange! But I will also tell you the subtlety ;-)  Here is a trick to use
(thanks to Ludovic):
(define md0
  (mapped-device
    (source (list "/dev/sda2" "/dev/sdb2"))
    (target "/dev/md0")
    (type raid-device-mapping)))
(operating-system
...
  (mapped-devices (list md0))
  (file-systems (cons (file-system
                        (title 'device)
                        (device "/dev/md0")
                        (dependencies (list md0))
                        (mount-point "/")
                        (type "ext4"))
                      %base-file-systems))

The "dependencies" field makes sure that the file system is only mounted
after the array is assembled; I am not sure that this is your problem,
but you might want to give it a try.

In the long run, this should be reprogrammed: Devices and file systems
should wait until all their "inputs" are present, or at least wait for
a reasonable time.


Andreas
  
myglc2 July 31, 2016, 4:12 p.m. UTC | #7
Andreas Enge <andreas@enge.fr> writes:

> On Sat, Jul 30, 2016 at 07:05:25PM -0400, myglc2 wrote:
>> > I might write a more detailed blog post about this; there is a little
>> > subtlety with the non-automatic determination of dependencies between
>> > devices, so one needs to make sure that the partitions to be assembled
>> > are present before the mdadm command is executed.
>> I rebooted and the array is not assembled ;-(
>
> Strange! But I will also tell you the subtlety ;-)  Here is a trick to use
> (thanks to Ludovic):
> (define md0
>   (mapped-device
>     (source (list "/dev/sda2" "/dev/sdb2"))
>     (target "/dev/md0")
>     (type raid-device-mapping)))
> (operating-system
> ...
>   (mapped-devices (list md0))
>   (file-systems (cons (file-system
>                         (title 'device)
>                         (device "/dev/md0")
>                         (dependencies (list md0))
>                         (mount-point "/")
>                         (type "ext4"))
>                       %base-file-systems))
>
> The "dependencies" field makes sure that the file system is only mounted
> after the array is assembled; I am not sure that this is your problem,
> but you might want to give it a try.
>
> In the long run, this should be reprogrammed: Devices and file systems
> should wait until all their "inputs" are present, or at least wait for
> a reasonable time.

Thanks, I tried that. The 'guix system reconfigure' succeeds and starts
the raid array (please see system35.scm & system35.log, attached).

But the reboot hangs at:

[...] clocksource: Switched to clocksource tsc
  
Andreas Enge July 31, 2016, 4:25 p.m. UTC | #8
On Sun, Jul 31, 2016 at 12:12:02PM -0400, myglc2 wrote:
> Thanks, I tried that. The 'guix system reconfigure' succeeds and starts
> the raid array (please see system35.scm & system35.log, attached).
> 
> But the reboot hangs at:
> 
> [...] clocksource: Switched to clocksource tsc

No idea. The only difference I saw with my setup is that you have the
#:mapped-devices clause in the initrd; I am just using this:

  ;; Add a kernel module for RAID-10.
  (initrd (lambda (file-systems . rest)
            (apply base-initrd file-systems
                   #:extra-modules '("raid10")
                                   rest)))

Andreas
  
myglc2 Aug. 2, 2016, 1:05 a.m. UTC | #9
Andreas Enge <andreas@enge.fr> writes:

>
> No idea. The only difference I saw with my setup is that you have the
> #:mapped-devices clause in the initrd; I am just using this:
>
>   ;; Add a kernel module for RAID-10.
>   (initrd (lambda (file-systems . rest)
>             (apply base-initrd file-systems
>                    #:extra-modules '("raid10")
>                                    rest)))
>
> Andreas

Unfortunately w/ this change the boot still hangs, so ... I will post it
to bug-guix. Thanks again! - George
  

Patch

diff --git a/doc/guix.texi b/doc/guix.texi
index 393efab..ddeeb71 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -6946,6 +6946,7 @@  and unmount user-space FUSE file systems.  This requires the
 @cindex mapped devices
 The Linux kernel has a notion of @dfn{device mapping}: a block device,
 such as a hard disk partition, can be @dfn{mapped} into another device,
+usually in @code{/dev/mapper/},
 with additional processing over the data that flows through
 it@footnote{Note that the GNU@tie{}Hurd makes no difference between the
 concept of a ``mapped device'' and that of a file system: both boil down
@@ -6955,42 +6956,14 @@  devices, like file systems, using the generic @dfn{translator} mechanism
 (@pxref{Translators,,, hurd, The GNU Hurd Reference Manual}).}.  A
 typical example is encryption device mapping: all writes to the mapped
 device are encrypted, and all reads are deciphered, transparently.
+Guix extends this notion by considering any device or set of devices that
+are @dfn{transformed} in some way to create a new device; for instance,
+RAID devices are obtained by @dfn{assembling} several other devices, such
+as hard disks or partitions, into a new one that behaves as one partition.
+Other examples, not yet implemented, are LVM logical volumes.
 
-Mapped devices are declared using the @code{mapped-device} form:
-
-@example
-(mapped-device
-  (source "/dev/sda3")
-  (target "home")
-  (type luks-device-mapping))
-@end example
-
-Or, better yet, like this:
-
-@example
-(mapped-device
-  (source (uuid "cb67fc72-0d54-4c88-9d4b-b225f30b0f44"))
-  (target "home")
-  (type luks-device-mapping))
-@end example
-
-@cindex disk encryption
-@cindex LUKS
-This example specifies a mapping from @file{/dev/sda3} to
-@file{/dev/mapper/home} using LUKS---the
-@url{http://code.google.com/p/cryptsetup,Linux Unified Key Setup}, a
-standard mechanism for disk encryption.  In the second example, the UUID
-(unique identifier) is the LUKS UUID returned for the device by a
-command like:
-
-@example
-cryptsetup luksUUID /dev/sdx9
-@end example
-
-The @file{/dev/mapper/home}
-device can then be used as the @code{device} of a @code{file-system}
-declaration (@pxref{File Systems}).  The @code{mapped-device} form is
-detailed below.
+Mapped devices are declared using the @code{mapped-device} form,
+defined as follows; for examples, see below.
 
 @deftp {Data Type} mapped-device
 Objects of this type represent device mappings that will be made when
@@ -6998,13 +6971,17 @@  the system boots up.
 
 @table @code
 @item source
-This string specifies the name of the block device to be mapped, such as
-@code{"/dev/sda3"}.
+This is either a string specifying the name of the block device to be mapped,
+such as @code{"/dev/sda3"}, or a list of such strings when several devices
+need to be assembled for creating a new one.
 
 @item target
-This string specifies the name of the mapping to be established.  For
-example, specifying @code{"my-partition"} will lead to the creation of
+This string specifies the name of the resulting mapped device.  For
+kernel mappers such as encrypted devices of type @code{luks-device-mapping},
+specifying @code{"my-partition"} leads to the creation of
 the @code{"/dev/mapper/my-partition"} device.
+For RAID devices of type @code{raid-device-mapping}, the full device name
+such as @code{"/dev/md0"} needs to be given.
 
 @item type
 This must be a @code{mapped-device-kind} object, which specifies how
@@ -7018,6 +6995,64 @@  command from the package with the same name.  It relies on the
 @code{dm-crypt} Linux kernel module.
 @end defvr
 
+@defvr {Scheme Variable} raid-device-mapping
+This defines a RAID device, which is assembled using the @code{mdadm}
+command from the package with the same name.  It requires a Linux kernel
+module for the appropriate RAID level to be loaded, such as @code{raid456}
+for RAID-4, RAID-5 or RAID-6, or @code{raid10} for RAID-10.
+@end defvr
+
+@cindex disk encryption
+@cindex LUKS
+The following example specifies a mapping from @file{/dev/sda3} to
+@file{/dev/mapper/home} using LUKS---the
+@url{http://code.google.com/p/cryptsetup,Linux Unified Key Setup}, a
+standard mechanism for disk encryption.
+The @file{/dev/mapper/home}
+device can then be used as the @code{device} of a @code{file-system}
+declaration (@pxref{File Systems}).
+
+@example
+(mapped-device
+  (source "/dev/sda3")
+  (target "home")
+  (type luks-device-mapping))
+@end example
+
+Alternatively, to become independent of device numbering, one may obtain
+the LUKS UUID (@dfn{unique identifier}) of the source device by a
+command like:
+
+@example
+cryptsetup luksUUID /dev/sda3
+@end example
+
+and use it as follows:
+
+@example
+(mapped-device
+  (source (uuid "cb67fc72-0d54-4c88-9d4b-b225f30b0f44"))
+  (target "home")
+  (type luks-device-mapping))
+@end example
+
+A RAID device formed of the partitions @file{/dev/sda1} and @file{/dev/sdb1}
+may be declared as follows:
+
+@example
+(mapped-device
+  (source (list "/dev/sda1" "/dev/sdb1"))
+  (target "/dev/md0")
+  (type raid-device-mapping))
+@end example
+
+The @file{/dev/md0} device can then be used as the @code{device} of a
+@code{file-system} declaration (@pxref{File Systems}).
+Note that the RAID level need not be given; it is chosen during the
+initial creation and formatting of the RAID device and is determined
+automatically later.
+
+
 @node User Accounts
 @subsection User Accounts
 
diff --git a/gnu/system/mapped-devices.scm b/gnu/system/mapped-devices.scm
index 732f73c..d0a9f02 100644
--- a/gnu/system/mapped-devices.scm
+++ b/gnu/system/mapped-devices.scm
@@ -1,5 +1,6 @@ 
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2014, 2015, 2016 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2016 Andreas Enge <andreas@enge.fr>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -22,6 +23,7 @@ 
   #:use-module (gnu services)
   #:use-module (gnu services shepherd)
   #:autoload   (gnu packages cryptsetup) (cryptsetup)
+  #:autoload   (gnu packages linux) (mdadm)
   #:use-module (srfi srfi-1)
   #:use-module (ice-9 match)
   #:export (mapped-device
@@ -38,7 +40,8 @@ 
             device-mapping-service-type
             device-mapping-service
 
-            luks-device-mapping))
+            luks-device-mapping
+            raid-device-mapping))
 
 ;;; Commentary:
 ;;;
@@ -127,4 +130,28 @@ 
    (open open-luks-device)
    (close close-luks-device)))
 
+(define (open-raid-device source target)
+  "Return a gexp that assembles SOURCE (a list of devices) to the RAID device
+TARGET, using 'mdadm'."
+  #~(let ((every (@ (srfi srfi-1) every)))
+      (let loop ()
+        (unless (every file-exists? '#$source)
+          (format #t "waiting a bit...~%")
+          (sleep 1)
+          (loop)))
+       (zero? (system* (string-append #$mdadm "/sbin/mdadm")
+                                      "--assemble" #$target
+                                      #$@source))))
+
+(define (close-raid-device source target)
+  "Return a gexp that stops the RAID device TARGET."
+  #~(zero? (system* (string-append #$mdadm "/sbin/mdadm")
+                    "--stop" #$target)))
+
+(define raid-device-mapping
+  ;; The type of RAID mapped devices.
+  (mapped-device-kind
+   (open open-raid-device)
+   (close close-raid-device)))
+
 ;;; mapped-devices.scm ends here