Ubuntu 16.10 LXC host on ZFS Root, with EFI and Time Machine
If you're here you have probably already found this document: Ubuntu 16.10 Root on ZFS. This is a modified and expanded version of that, which aims to be a step-by-step guide to installing a root system to ZFS which boots from a modern EFI system.
Once installed, I show how to install LXC so it plays nicely with ZFS, and install a VM that serves as a Time Machine destination. We use this exact setup so the Macs in the office can back up their files over the network to a proper computer, with proper storage, rather than a slow, crufty and poorly supported NAS running insecure or out of date software. There are some caveats:
- These steps are for EFI boot only - I've finally made the jump away from MBR
- I assume a RAID-Z install to new disks (4 x SSD in my case). RAID-Z2 is almost identical.
- The root system is absolutely minimal and designed to run LXC and nothing else
- I prefer dynamic IPs and Zeroconf announcement of host names, but this aspect is easy enough to skip if you prefer it old-school
Part 1: Installing Ubuntu Root to ZFS
Download the Ubuntu 16.10 Live CD and start it up. Open a Terminal (Ctrl-Alt-T). If you are connecting remotely to run the setup, run sudo apt --yes install openssh-server, then log in remotely with ssh ubuntu@ubuntu.local. Then run the commands below:
sudo -s
apt-add-repository universe
apt update
apt --yes install debootstrap gdisk zfs-initramfs
Now we partition the disks. You can do this using /dev/disk/by-id/XXX as in the original guide, although if you have a lot of disks it's easy to lose track. At this point it's simpler to run ls -l /dev/sd? to see which disks are in your system, identify which ones you're installing to, and run the following commands for each of those disks:
sgdisk -n9:-9M:0 -t9:BF07 /dev/sdX
sgdisk -n3:1M:+512M -t3:EF00 /dev/sdX
sgdisk -n1:0:0 -t1:BF01 /dev/sdX
I found I had to run the above in that order. The first command creates an 8MB "Solaris Reserved" partition at the end of the disk, for a reason I have yet to determine beyond "that's what Solaris does". The second creates a 512MB EFI partition at the start of the disk, and the third creates the primary ZFS partition on the rest of the disk.
Now create the zpool. For this command you should definitely use the full /dev/disk/by-id/NNN path, not the /dev/sdX equivalent. If you've ever pulled all the disks out of a RAID array then put them back in a different order, you'll understand why this is recommended. You will be adding partition 1 of each of the newly-partitioned disks, so the path for each will end with "part1". I am creating this on 4 x Samsung Evo 850 SSDs, so I will be specifying 8KB blocks by way of ashift=13. Then create a single ZFS filesystem tank/root for the root system. The original guide partitions root into sections for var, home etc., which (given this is building a simple LXC host) is an unnecessary complexity.
zpool create -o ashift=13 -O atime=off -O canmount=off -O normalization=formD -O mountpoint=/ -R /mnt tank raidz /dev/disk/by-id/ata*Samsung_SSD_850*part1
zfs set xattr=sa tank # See https://github.com/zfsonlinux/zfs/issues/443
zfs create -o canmount=noauto -o mountpoint=/ tank/root
zfs mount tank/root
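As a quick sanity check on the ashift choice: ashift is the base-2 logarithm of the block size ZFS will use, so ashift=13 means 2^13-byte blocks:

```shell
# ashift is the base-2 log of the block size: 2^13 = 8 KiB
echo $((1 << 13))   # prints 8192
```

(ashift=12 would give the more common 4KB blocks; 13 matches the 8KB pages these SSDs use internally.)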
The new ZFS filesystem is now mounted on /mnt.
Install your OS of choice. I am using Ubuntu Yakkety rather than Xenial LTS because, at the time of writing, Yakkety has a patch applied for a ZFS/GRUB2 bug which is not in Xenial.
debootstrap yakkety /mnt
mount --rbind /dev /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys /mnt/sys
chroot /mnt
ln -s /proc/self/mounts /etc/mtab
You're now in the new environment. Install what you need to, but I would suggest the minimum for now. I set the hostname, timezone, locale, network, create a user and install a minimal list of packages.
echo myhostname > /etc/hostname
echo 127.0.1.1 myhostname >> /etc/hosts
locale-gen en_US.UTF-8
dpkg-reconfigure tzdata
cat <<EOF > /etc/apt/sources.list
deb http://archive.ubuntu.com/ubuntu yakkety main universe
deb-src http://archive.ubuntu.com/ubuntu yakkety main universe
deb http://security.ubuntu.com/ubuntu yakkety-security main universe
deb-src http://security.ubuntu.com/ubuntu yakkety-security main universe
deb http://archive.ubuntu.com/ubuntu yakkety-updates main universe
deb-src http://archive.ubuntu.com/ubuntu yakkety-updates main universe
EOF
apt update
apt --yes --no-install-recommends install linux-image-generic
apt --yes install zfs-initramfs openssh-server nfs-kernel-server gdisk dosfstools grub-efi-amd64 # essential
apt --yes install zsh vim screen psmisc avahi-utils # your favourites here
useradd -m -U -G sudo -s /bin/bash mike
passwd mike
Now we come to the part where we try to fix the daft interface names given to us by systemd, which has helpfully fixed the problem of predictable interface names in an unpredictable order by giving us unpredictable names in a predictable order. You can do this with a boot parameter, but I prefer to rename them with udev rules so they're useful AND predictable. Identify your network interfaces with ifconfig -a, note the MAC addresses, then set up a udev rule to give them the names you want. Also edit /etc/network/interfaces to ensure the interfaces come up on boot:
cat <<EOF > /etc/udev/rules.d/70-net.rules
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="XX:XX:XX:XX:XX:XX", NAME="eth0"
EOF
cat <<EOF > /etc/network/interfaces
source-directory /etc/network/interfaces.d
auto eth0
iface eth0 inet dhcp
EOF
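To pick the interface names and MAC addresses out of the ifconfig -a output, a little awk helps. This is a sketch run against a single hypothetical sample line (your interface names and addresses will differ):

```shell
# Hypothetical `ifconfig -a` output line; real systems will differ
printf 'enp3s0    Link encap:Ethernet  HWaddr 52:54:00:aa:bb:cc\n' |
  awk '/HWaddr/ {print $1, $NF}'   # prints: enp3s0 52:54:00:aa:bb:cc
```

Paste each MAC into its own SUBSYSTEM=="net" rule as above, one per interface.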
Next, a possibly optional step: if at some point you share any of the ZFS filesystems over NFS, be sure to edit /etc/default/zfs to set ZFS_SHARE='yes'.
Now we need to install the bootloader. This was my first crack at EFI booting, but it's pretty simple to set up. You need a DOS partition to store the EFI files, which you'll mount in /boot/efi. This will be done identically on each disk, which means you can pull any disk out of the array and it will still boot. I tested this, and so should you. Unlike the original guide I am referring to the partition by its label, which makes the process a bit simpler: the fstab entry will mount the first partition found with a label of "EFI". As they're all identical, we don't care which one it is. Run the following:
mkdir /boot/efi
echo LABEL=EFI /boot/efi vfat defaults 0 1 >> /etc/fstab
Then repeat the following steps for each disk in your pool. You can reference them by /dev/sdX here, as I have for brevity:
mkdosfs -F 32 -n EFI /dev/sdX3
mount /dev/sdX3 /boot/efi
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=ubuntu --recheck --no-floppy
umount /boot/efi
Finally we come to configuring GRUB. I first edited /etc/default/grub to remove GRUB_HIDDEN_TIMEOUT, enable GRUB_TERMINAL, remove "quiet splash" from the kernel arguments and add "ipv6.disable=1" to disable IPv6 (which I'm not familiar enough with to secure to my satisfaction). These steps are useful for a server, but all entirely optional. To update GRUB, snapshot the root FS (a nice idea from the original guide) and exit our chroot environment:
update-initramfs -c -k all
update-grub
zfs snapshot tank/root@install
exit
We're now back to our live-CD environment. Unmount cleanly and reboot:
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}
zpool export tank
reboot
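The umount pipeline above works because mount lists parent mounts before their children, so reversing the list with tac unmounts the deepest mounts first. A sketch against hypothetical mount output (real output will differ):

```shell
# Hypothetical mount(8) output: parents are listed before children
printf '%s\n' \
  'proc on /mnt/proc type proc (rw)' \
  'udev on /mnt/dev type devtmpfs (rw)' \
  'devpts on /mnt/dev/pts type devpts (rw)' |
  tac | awk '/\/mnt/ {print $3}'
# prints /mnt/dev/pts first, so children are unmounted before parents
```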
...and relax. Make a cup of tea. Some things to note at this stage are covered in the Notes section at the end.
Part 2: LXC on ZFS
We now have a very minimal system on a ZFS root, and I prefer to keep it that way. Anything beyond the basics I install in a virtual machine - the advantages of this are many, but for me the biggest is that I can leave the "root" system essentially untouched. Change introduces instability, and I don't like rebooting things. It's not just application software; I run a VPN in one virtual machine and filesharing in another (see below). The best part is that LXC and ZFS go together very nicely, but I find a few tweaks are required to Ubuntu out-of-the-box to make this go smoothly. First, the basics:
apt upgrade
apt --yes install lxc lxcfs
zfs create tank/lxc
Installing "lxc" will cause a bridge called "lxcbr0" to be dynamically created, which I dislike - I prefer to define the bridge manually in /etc/network/interfaces. This involves the following steps; you can use a static IP here if you prefer. I also like to set the MAC address template for my virtual machines to something I know, so I can easily recognise them on the LAN. Finally, I want the default storage for LXC guests to be in ZFS under "tank/lxc":
ip link set lxcbr0 down
cat <<EOF > /etc/network/interfaces
source-directory /etc/network/interfaces.d
auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
EOF
cat <<EOF > /etc/lxc/default.conf
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
lxc.network.hwaddr = 00:00:00:ff:ff:xx
EOF
cat <<EOF > /etc/lxc/lxc.conf
lxc.lxcpath = /var/lib/lxc
lxc.bdev.zfs.root = tank/lxc
EOF
sed -i'' 's/USE_LXC_BRIDGE="true"/USE_LXC_BRIDGE="false"/' /etc/default/lxc-net
sed -i'' '/rlimit-nproc=3/d' /etc/avahi/avahi-daemon.conf
The last line works around a long-standing bug in the way Avahi is set up, at least on Debian, when virtual machines are in use. You'll have to do this on the guest systems too, as shown below, although obviously only if you've installed Avahi.
Reboot and verify the br0 bridge comes up. Now you can create your LXC VMs as you like. For example, to create a brand new VM running Xenial:
lxc-create --template ubuntu --name $GUESTVM -- --release xenial
This will create a new VM with the configuration files stored under /var/lib/lxc/$GUESTVM, and a new ZFS filesystem called tank/lxc/$GUESTVM which is mounted under /var/lib/lxc/$GUESTVM/rootfs. I find this works nicely for me. To configure the new VM for remote access so I can run ssh $GUESTVM.local, I run the following:
lxc-start --name $GUESTVM
lxc-attach --name $GUESTVM
apt --yes install openssh-server avahi-utils
useradd -m -U -G sudo -s /bin/bash mike
passwd mike
sed -i'' '/rlimit-nproc=3/d' /etc/avahi/avahi-daemon.conf
systemctl restart avahi-daemon
userdel -r ubuntu
exit
Another useful thing to do is to import a shared folder of some sort from the host system. The final part of this guide will show how to set up a Virtual Machine running Netatalk, which I will use for Time Machine backups for the Mac machines in the office.
Part 3: Time Machine backups to ZFS from a Virtual Machine
Here's where it gets fun. The Netatalk supplied with Ubuntu is hopelessly outdated, so you have to build it from source. But why install (maintain, pollute...) all the build packages for this in your root system? Create a VM to build and run it. I have a VM for every major application we run at the office - this is broadly the same sort of vision as Docker has, although I never got on with their implementation.
First we need to set up ZFS as a target for Time Machine. For the sake of argument I'll be mounting mine under /home/timemachine/$USER. Create a share for each user, as below, and set the quota. Time Machine requires a quota, as it will expand to fill all the space available.
zfs create -o mountpoint=/home/timemachine tank/timemachine
zfs create -o quota=100G tank/timemachine/mike
chown -R mike:mike /home/timemachine/mike
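Note that ZFS sizes use binary units: the quota=100G above is 100 GiB, i.e. 100 x 2^30 bytes, which is what the Mac will see as the size of its backup volume:

```shell
# ZFS "100G" means 100 GiB (2^30-byte units), not 100 * 10^9 bytes
echo $((100 * 1024 * 1024 * 1024))   # prints 107374182400
```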
Now we need to create a new VM to run Netatalk, and we need it to have access to these folders. Here's how you can do this from scratch using Xenial as the guest OS. I'm mounting /home/timemachine under the same location in the VM and setting the VM to autostart. Use "rbind" rather than "bind" to ensure the ZFS filesystems for each user are available too. As usual, I'm installing openssh, avahi and adding a user. Installing Avahi is mandatory for Time Machine.
lxc-create --template ubuntu --name timemachine -- --release xenial
cd /var/lib/lxc/timemachine
echo "lxc.mount.entry = /home/timemachine home/timemachine none rbind 0 0" >> config
echo "lxc.start.auto = 1" >> config
mkdir rootfs/home/timemachine
lxc-start --name timemachine
lxc-attach --name timemachine
apt --yes install openssh-server avahi-utils
useradd -m -U -G sudo -s /bin/bash mike
passwd mike
sed -i'' '/rlimit-nproc=3/d' /etc/avahi/avahi-daemon.conf
systemctl restart avahi-daemon
userdel -r ubuntu
The next steps are to build Netatalk, which at the time of writing is at version 3.1.11. These steps are largely unmodified from the Netatalk wiki, although I have dropped the "tracker" package as it (a) appears to require a package that depends on "gnome" to run, and (b) is only used for Spotlight, which we don't want for Time Machine (and which, in our office, we choose not to use for our other shares either). I've also set the install folders to be a bit more Linux and a bit less BSD. Run the below as root:
apt --yes install build-essential libevent-dev libssl-dev libgcrypt-dev libkrb5-dev libpam0g-dev libwrap0-dev libdb-dev libtdb-dev libmysqlclient-dev avahi-daemon libavahi-client-dev libacl1-dev libldap2-dev libcrack2-dev systemtap-sdt-dev libdbus-1-dev libdbus-glib-1-dev libglib2.0-dev libio-socket-inet6-perl wget
wget https://downloads.sourceforge.net/project/netatalk/netatalk/3.1.11/netatalk-3.1.11.tar.bz2
tar jxvf netatalk-3.1.11.tar.bz2
cd netatalk-3.1.11
./configure --with-init-style=debian-systemd --without-libevent --without-tdb --with-cracklib --enable-krbV-uam --with-pam-confdir=/etc/pam.d --with-dbus-daemon=/usr/bin/dbus-daemon --with-dbus-sysconf-dir=/etc/dbus-1/system.d --sysconfdir=/etc --localstatedir=/var --prefix=/usr
make
make install
systemctl enable avahi-daemon
systemctl enable netatalk
That's it. Configuration of Netatalk 3.x is a breeze compared to 2.x: there's only one file, which (thanks to the flags you passed to configure, above) is at /etc/afp.conf. Here's what mine looks like:
[Global]
spotlight = no

[Time Machine]
path = /home/timemachine/$u
time machine = yes
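If you want a belt-and-braces cap at the AFP layer as well as the ZFS quota, afp.conf also supports a vol size limit option (a value in MiB) that reports a smaller volume size to clients. A sketch, assuming a 100 GiB cap to match the quota set earlier:

```
[Time Machine]
path = /home/timemachine/$u
time machine = yes
vol size limit = 102400
```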
That's really all there is to it. Ensure you have accounts for any users that will be using this for backup, that they have passwords set, that their UIDs are the same as on the host system, and that they own their respective /home/timemachine/$USER folder.
The $u in the configuration file is expanded to the name of the current user, which means that "mike" will get the folder /home/timemachine/mike as a Time Machine destination. This way each user only has access to their own backups, and they get their own ZFS quotas too.
There is one more critical step required due to a bug which exists in the current release of Netatalk. Given I only filed this report about two hours ago it's not fixed yet, but it might be by the time you read this. Until then, you'll need a simple workaround.
mkdir '/home/timemachine/$u'
Then run reboot on your virtual machine, and you should be able to back up to the "Time Machine" share on the "timemachine" host.
Bonus: backup script
The joys of ZFS continue here, as backing this system up is pretty simple. I've got a large USB disk which I've formatted as EXT2 and given the label "backup" (actually I have two with this label, which I rotate). With this script in /etc/cron.daily, all the shares that don't have the user property "backup:skip" set are backed up to that drive every night. That means I can set that property by running zfs set backup:skip=true tank/NNN on any filesystem to prevent it being backed up. Of course you can reverse this test as you see fit.
#!/bin/sh
# Backup all ZFS shares that don't have the "backup:skip" user property
# set to true. Backs these up to $DESTDIR, which we assume to be
# set in /etc/fstab as the mountpoint for an external disk. The disk
# will be mounted if it is not already, and unmounted at the end.

DESTDIR=/backup
TMP=/tmp/mnt.$$
mkdir "$TMP"

echo `date +'%Y-%m-%d %H:%M:%S'` - Backing up to $DESTDIR
zpool status
grep -q " $DESTDIR " /proc/mounts || mount "$DESTDIR"
grep -q " $DESTDIR " /proc/mounts || { echo "$DESTDIR not mounted" ; exit 1 ; }

ALL=`zfs list -H -obackup:skip,name -tfilesystem | grep -v '^true' | cut -f 1 --complement`
for FS in $ALL ; do
    zfs snapshot "$FS@backup"
    mount -o ro -t zfs "$FS@backup" "$TMP"
    mkdir -p "$DESTDIR/$FS"
    rsync -ax "$TMP/" "$DESTDIR/$FS"
    echo `date +'%Y-%m-%d %H:%M:%S'` - Completed $FS
    umount "$TMP"
    zfs destroy "$FS@backup"
done

rmdir "$TMP"
umount "$DESTDIR"
echo `date +'%Y-%m-%d %H:%M:%S'` - Backup complete
zpool scrub tank
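The filtering line in the script can be checked in isolation: zfs list -H emits tab-separated columns, grep -v '^true' drops filesystems flagged with backup:skip=true, and cut keeps the name column. A sketch against hypothetical zfs list output (your pool's filesystem names will differ):

```shell
# Hypothetical `zfs list -H -o backup:skip,name -t filesystem` output
printf 'true\ttank/scratch\n-\ttank/root\n-\ttank/lxc\n' |
  grep -v '^true' | cut -f 1 --complement
# prints tank/root and tank/lxc; tank/scratch is skipped
```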
Notes
I later made some more changes to /etc/default/grub, as shown:

# GRUB_HIDDEN_TIMEOUT=0 # removed
# GRUB_HIDDEN_TIMEOUT_QUIET=true # removed
GRUB_TIMEOUT_STYLE=countdown # added
GRUB_DISABLE_OS_PROBER=true # added

then reran update-grub before rebooting. The OS probing can safely be disabled for a single OS system.
After a later LXC upgrade the configuration key names changed, and my existing VM configs stopped working. They originally looked like this:

lxc.include = /usr/share/lxc/config/ubuntu.common.conf
lxc.rootfs.path = /var/lib/lxc/myvm/rootfs
lxc.uts.name = myvm
lxc.arch = amd64
lxc.mount = /var/lib/lxc/myvm/fstab
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
lxc.network.hwaddr = 00:00:00:bf:0d:1f

I had to change them to

lxc.include = /usr/share/lxc/config/common.conf
lxc.rootfs.path = /var/lib/lxc/myvm/rootfs
lxc.uts.name = myvm
lxc.arch = amd64
lxc.mount.fstab = /var/lib/lxc/myvm/fstab
lxc.net.0.type = veth
lxc.net.0.link = br0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 00:00:00:bf:0d:1f
lxc.apparmor.profile = unconfined

I'm sure AppArmor security is a good thing in theory, but it prevents me from running any VMs with no explanation of why. It's the security equivalent of a lock on my front door that I don't have the key for.