Showing posts with label linux. Show all posts
Showing posts with label linux. Show all posts


YubiKey U2F on Ubuntu

Basic walk through of setting up U2F with YubiKey on Ubuntu 23.04 (should work on recent versions, as well). This follows the official documentation closely, removing anything not necessary for my particular setup.

N.B. this is different from challenge response, a different multifactor method. YubiKeys support multiple protocols, U2F and challenge response being two of them.


  • Ubuntu 23.04
  • YubiKey
    • I used the YubiKey 5 series: 5 NFC, 5 C, and 5 Ci. Where necessary, I used an adapter to plug in the USB-C key into an standard USB-A port.
  • Associate YubiKey U2F with your account
    • Creates a line of text in a file containing your username and the 2nd factor string
    • Move the U2F file to a secure location readable only by root
  • Create PAM configs to require U2F for certain authentication operations, e.g. login, sudo

Create two PAM configs. Creating these configs will allow us to include them rather than copying and pasting the same config lines in multiple other PAM configs in /etc/pam.d.

In these configs, we add the “cue” and “interactive” options which will prompt the user to insert the YubiKey and to touch it.

/etc/pam.d/u2f-required will be the configuration to require the YubiKey:

auth required authfile=/etc/yubico/u2f_keys cue interactive

/etc/pam.d/u2f-sufficient will be the configuration which allows using only the YubiKey without a password:

auth sufficient authfile=/etc/yubico/u2f_keys cue interactive

For the initial setup, also add the following to the "auth" lines in the above config files:

debug debug_file=/var/log/pam_u2f.log

Then, create an empty debug log file to start: 

sudo touch /var/log/pam_u2f.log

CAUTION Best to have a root shell active, in case something goes awry, and you cannot sudo anymore:

normaluser$ sudo bash

DO NOT exit this terminal until you are sure at least sudo works. 

Basic idea: in each authentication scenario (i.e. PAM config file) where you want U2F, add the line

@include u2f-required

after the line 

@include common-auth

E.g. require U2F for sudo, modify the files
  • /etc/pam.d/sudo
  • /etc/pam.d/sudo-i
These are the PAM configs I updated in /etc/pam.d:
  • gdm-password -- prompts for YubiKey at GUI login screen
  • login -- prompts for YubiKey at console login
  • polkit-1 -- prompts for YubiKey when running GUI apps requiring sudo, e.g. synaptic
  • su -- prompts for YubiKey for su
  • sudo -- prompts for YubiKey for sudo
  • sudo-i -- prompts for YubiKey for sudo -i
The first one to try should be sudo since it is easy to test. Make the modification, then open a new terminal tab/window, and run a simple sudo command, e.g. "sudo ls -l /tmp". It should prompt you to insert the device, and then to touch it:

normaluser$ sudo ls -l /tmp
[sudo] password for normaluser: 
Insert your U2F device, then press ENTER.
Please touch the device. (The YubiKey should start flashing.)
total xx
[listing of files here]

If that did not work, examine the debug log /var/log/pam_u2f.log Make any adjustments, close out that sudo terminal tab/window, and launch a new one.

Once you are satisfied that everything works, you can remove the “debug debug_file=/var/log/pam_u2f.log” from /etc/pam.d/u2f_required and /etc/pam.d/u2f_sufficient

Minor annoyance: the GUI popup dialog for sudo authentication won’t accept just ENTER when it says “Insert your U2F device, then press ENTER”: you have to type in at least a SPACE for it to register that you have acknowledged the prompt, and are ready to touch the YubiKey.


RPM generation error “Missing build-id”

I came across this error while trying to build an RPM for apptainer on my system, which is older than what is supported by the authors. The compilation process seems to complete successfully, but at the end, no RPM is generated. These messages are shown:
    error: Missing build-id in /home/swbuild/rpmbuild/BUILDROOT/apptainer-1.1.3-1.el8.x86_64/usr/libexec/apptainer/bin/squashfuse_ll
    error: Generating build-id links failed

    RPM build errors:
        Macro expanded in comment on line 97: %{?el7}

        Missing build-id in /home/swbuild/rpmbuild/BUILDROOT/apptainer-1.1.3-1.el8.x86_64/usr/libexec/apptainer/bin/squashfuse_ll
        Generating build-id links failed
This is because older versions of GCC do not include the build-id by default. The fix is to add LDFLAGS="-Wl,--build-id" to the configure line in the file, e.g.
    FLAGS=-std=c99 LDFLAGS="-Wl,--build-id" ./configure --enable-multithreading


AlphaFold 2 on Singularity + Slurm

Deep Mind’s AlphaFold has been making waves recently. It is a new solution to a 50-year old Grand Challenge, to figure out how proteins fold. From Deep Mind’s blog:

Proteins are essential to life, supporting practically all its functions. They are large complex molecules, made up of chains of amino acids, and what a protein does largely depends on its unique 3D structure. Figuring out what shapes proteins fold into is known as the “protein folding problem”, and has stood as a grand challenge in biology for the past 50 years. In a major scientific advance, the latest version of our AI system AlphaFold has been recognised as a solution to this grand challenge by the organisers of the biennial Critical Assessment of protein Structure Prediction (CASP). This breakthrough demonstrates the impact AI can have on scientific discovery and its potential to dramatically accelerate progress in some of the most fundamental fields that explain and shape our world. 

From the journal Nature

DeepMind’s program, called AlphaFold, outperformed around 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on 30 November, at the start of the conference — held virtually this year — that takes stock of the exercise.

“This is a big deal,” says John Moult, a computational biologist at the University of Maryland in College Park, who co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “In some sense the problem is solved.”

The full paper is published in Nature: Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

Deep Mind has released the source on GitHub, including instructions for building a Docker container to run AlphaFold. 

One of the faculty at Drexel, where I work, requested AlphaFold be installed. However, in HPC, it is more common to use Singularity containers rather than Docker, as Singularity does not require a daemon nor root privileges. I was able to modify the Dockerfile and the Python wrapper to work with Singularity. Additionally, I added some integration with Slurm, querying the Slurm job environment for the available GPU devices, and the scratch/tmp directory for output. My fork was on GitHub, but since my pull request for the Singularity stuff was not accepted, I have split off the Singularity- and Slurm-specific stuff into its own repo.

UPDATE 2022-08-05 The Singularity code has been updated for AlphaFold 2.2.2

UPDATE 2022-10-02 Updated for AlphaFold 2.2.4; Singularity image now hosted on Sylabs Cloud. See new post.


Podman and Shorewall

Bright Cluster Manager uses Shorewall to manage the various firewall rules on the head/management node. By default, this seems to prevent Podman and Docker from working right.

I am working through a simple example of running a pod with PostgreSQL and PGAdmin but the connection to the host port that forwards to the pgadmin container seemed to be blocked. Connection attempts using both curl and web browsers would hang.

There is additional configuration that needs to be done for Shorewall to work with Podman. Shorewall has instructions on making it work with Docker, and it seems to work for podman with minor modifications.

First, modify the systemd service to not clear firewall rules on service stop. Do:

sudo systemctl edit shorewall.service

which gives a blank file. Add these contents:


# reset ExecStop


# set ExecStop to "stop" instead of "clear"

ExecStop=/sbin/shorewall $OPTIONS stop

Then activate the changes with

sudo systemctl daemon-reload

Next, we need to know the name of the Podman network interface. Use “ip link list” to see it. On my RHEL 8 system, the interface is 

10: cni-podman0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000

    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff

And make the following modifications to the appropriate config files.

Enable Docker mode in /etc/shorewall/shorewall.conf:


Define a zone for Podman in /etc/shorewall/zones:


pod     ipv4    # 'pod' is just an example -- call it anything you like

Define policies for this zone in /etc/shorewall/policy:

#SOURCE        DEST        POLICY        LEVEL 

pod            $FW          REJECT

pod            all          ACCEPT

And match the zone to the interface in /etc/shorewall/interfaces:

# Need to specify "?FORMAT 2" 



pod    cni-podman0  bridge   # Allow ICC (inter-container communication); bridge implies routeback=1

Then, restart shorewall. And start the pod; or restart if it was already running.

You many need additional rules to allow an external host to connect into the pod. E.g. a pod containing a pgadmin container and a postgresql container, where the pgadmin container serves on port 80. Say your administrative hosts will be in the address block Then, add the following to /etc/shorewall/rules:

# Accept connections from admin hosts to the pgadmin container


#                                            PORT(S)

ACCEPT    net:    pod    tcp     80


OpenLDAP local root access to OLC cn=config database

If, like me, you converted your OpenLDAP server installation from slapd.conf to OLC (On-Line Configuration), aka cn=config, you may find that local root privileges to modify your config are not configured; i.e. doing the following will fail:

ldapmodify -Y EXTERNAL -H ldapi:/// -f some_changes.ldif

This is because the olcRootDN for the cn=config database is probably not set up right. Mine looked something like:

dn: olcDatabase={0}config,cn=config

objectClass: olcDatabaseConfig

olcDatabase: {0}config

olcAccess: {0}to *  by * none

olcRootDN: cn=root,dc=example,dc=com


It may or may not also have an olcRootPW (root password) set.

You can query you you appear to be to the LDAP server by using ldapwhoami and specifying the SASL mechanism (-Y) and the LDAP URI (-H):

#  ldapwhoami -Y EXTERNAL -H ldapi:///

SASL/EXTERNAL authentication started

SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth



In this case, the EXTERNAL mechanism is the Linux IPC (Inter-Process Communication), which gets the UID and GID of the client process. This is communicated via the domain socket transport (ldapi:).

The fix is straightforward. First, create a file to replace the olcRootDN field:

# replace_olcrootdn.ldif

dn: olcDatabase={0}config,cn=config

replace: olcRootDN

olcRootDN: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth

If you have an olcRootPW field, add another operation to delete: it. Then, apply the changes:

# ldapmodify -D cn=root,dc=example,dc=cluster -w somepassword -H ldapi:/// -f replace_olcrootdn.ldif

And that should do it. From now on, you should be able to modify the OLC with “-Y EXTERNAL -H ldapi:///” if you are root. 

This post is expanded from this answer at Server Fault.


RHEL7 on VirtualBox graphics controller

I have a RHEL7 VM on VirtualBox 6.0 which crashed right after a full update to RHEL 7.7, including kernel. Just typed "startx" (I usually make VMs boot into console), and it crashed immediately with a "Guru Meditation" from VirtualBox.

Taking a stab, I changed the Graphics Controller for the VM from VMSVGA to VBoxVGA, and it launched the GUI just fine.


U2F USB key (Yubikey) for 2-factor authentication and Linux authentication

I just bought a pair of Yubikey U2F (Universal 2-Factor) devices (the 5 NFC model, because of the claims that it would work with iPhones). Mostly because I got tired of pulling out my phone, finding the authenticator app, searching for the entry for the appropriate website, and then typing in the number.

I'll get to the iPhone stuff at the end.

But first, using the Yubikey for the second factor works for only a few websites. Also, it depends on your web browser: I tested Chrome (on Linux, macOS, and Chromebook), and Firefox (on Linux and macOS). Chrome and Firefox can deal with reading a U2F key via USB just fine.

Yubico has clear instructions for how to set the keys up:

Among sites which accept U2F hardware keys are Facebook, Google, GitHub, GitLab,  Dropbox, and Twitter (though Twitter does not support multiple U2F keys, which sucks if you lose a key). You browse to the site as usual, type in your password, and it will prompt you to plug in your U2F key and tap the flashing bit with a gold contact sensor, and you're in.

For using the Yubikey as a U2F in Linux, to authenticate for logging in, unlocking the screensaver, and sudo, you will have to install Yubico's U2F PAM module: There is more detailed documentation geared towards developers here: The PAM module works fine in Ubuntu 19.10 Eoan Ermine.

And it works great: plug the Yubikey in first, type in your password and hit Enter, the key starts flashing, touch the flashing bit, and you are in.

On the downside, I would not use this on a server where you need to do management remotely, since you would not be able to plug in a U2F key on an SSH connection.

As for using NFC and iOS: it does not work like I expected it to, nor how the Yubico website led me to expect. If you tap the Yubikey to the iPhone, it will pop up an alert, which if you tap, will open Safari on a "validation" web page hosted at

The websites which work within Chrome and Firefox on a computer (Google, GitHub, etc) do not seem to have a way to read the Yubikey via NFC on iPhone. There is a Lightning + USB-C key (the 5 Ci) but it's expensive ($70 ea.) and I do not know for sure if it will work, since the Google and GitHub mobile websites viewed on iPhone and Android do not even present the option for using U2F keys.

So, at this point, I feel I should have just bought the cheaper non-NFC, and I would have been at the same point.

UPDATE 1 If you use KeePassXC for storing passwords, it can be configured to require a YubiKey. This uses the "challenge-response" feature, which has to be manually set up using the YubiKey Personalization Tool (also available at GitHub). Yubico has a video walkthrough here:


OpenLDAP annoyance

The OpenLDAP 2.4 administrator's manual is missing a section:

18.1. Monitor configuration via cn=config(5)
This section has yet to be written.


Migrating LDAP server to a different machine and changing to OLC and from bdb to mdb (lmdb)

At our site, we have LDAP (openldap 2.4) running on one server. It uses the old slapd.conf configuration, and the Berkeley DB (bdb) backend.

As part of planning for the future, I want to move this LDAP server to a different machine. I wanted to also migrate to using on-line configuration (OLC), where the static slapd.conf file is replaced with the cn=config online LDAP "directory". This allows configuration changes to be made at runtime without restarting slapd.

I also wanted to change from using the Berkeley DB backend to the Lightning Memory-Mapped DB (LMDB; known as just "mdb" in the configs). LMDB is what OpenLDAP recommends as it is quick (everything in memory) and easier to manage (fewer tuning options, no db files to mess with). From here, I will refer to this as "mdb" per the slapd.conf line.

After doing the migration once, leaving the backend as bdb, I found out it was easier to do all three things at once: migrate to a different server, convert from slapd.conf to OLC, and change backend to mdb.

This is a multi-stage process, but nothing too strenuous.
  1. Dump the directory data to an LDIF:  slapcat -n 1 -l n1.ldif
  2. Copy n1.ldif to new machine
  3. Copy slapd.conf to new machine
  4. Edit new slapd.conf on new machine: change the line "database bdb" to "database mdb"
    1. Remove any bdb-specific options: idletimeout, cachesize
  5. Import n1.ldif: slapadd -f /etc/openldap/slapd.conf -l n1.ldif
  6. Convert slapd.conf to OLC: slaptest -f slapd.conf -F slapd.d ; chown -R ldap:ldap slapd.d
  7. Move slapd.conf out of the way: mv /etc/openldap/slapd.conf /etc/openldap/slapd.conf.old
Other complications:
  • You will probably need to generate a new SSL certificate for the new server
  • That may mean signing the new cert with your own (or established) CA
    • Or, you can set all your client nodes to not require the cert: in /etc/openldap/ldap.conf add “TLS_REQCERT never”. Fix up the /etc/sssd/sssd.conf file similarly: add ldap_tls_reqcert = never
  • Fix up your /etc/sssd/sssd.conf
Note that it is pretty easy to back things out and start from scratch. To restore the new server to a "blank slate" condition, delete everything in /etc/openldap/slapd.d/

     service slapd stop
     cd /etc/openldap/slapd.d
     rm -rf *

CAUTION: This process seems to create the n0 db with an olcRootDN of “cn=config” where it should be “cn=admin,dc=example,dc=com” (or whatever your LDAP rootDN should be; for Bright-configured clusters, that would be “cn=root,dc=cm,dc=cluster). I.e. you need to have:

dn: olcDatabase={0}config,cn=config
olcRootDN: cn=root,dc=cm,dc=cluster

but for olcRootDN, you have cn=config, instead.

To fix it, I dumped n0 and n1, deleted /etc/openldap/slapd.d, and “restored” from the dumped n0 and n1. Basically, emulating a restore from backup.

  • Dump {0} to n0.ldif
  • Shut down slapd
  • Modify n0.ldif to have the needed olcRootDN (as above)
  • Move away the old /etc/openldap/slapd.d/ directory: mv /etc/openldap/slapd.d /etc/openldap/slapd.d.BAK
  • Create a new slapd.d directory: mkdir /etc/openldap/slapd.d 
  • Add the dumped n0.ldif as the new config: slapadd -n 0 -F /etc/openldap/slapd.d -l n0.ldif
  • Fix permissions: chown -R ldap:ldap /etc/openldap/slapd.d ; chmod 700 /etc/openldap/slapd.d

These are the websites I found useful in figuring things out:


Notes on building HDF5

HDF5 is a suite of data-centered technology: structures, file formats, APIs, applications.

There are two ways to build HDF5: using the traditional "configure and make", or using CMake.

Using the configure-make method, a few of the tests may fail if run on an NFS filesystem. These tests are use_append_chunk and use_append_mchunks. The test programs first create a file (successfully), and then try to open them for read, where they fail. The error output looks like:

    157778: continue as the writer process
    dataset rank 3, dimensions 0 x 256 x 256
    157778: child process exited with non-zero code (1)
    Error(s) encountered
    HDF5-DIAG: Error detected in HDF5 (1.10.2) thread 0:
      #000: H5F.c line 511 in H5Fopen(): unable to open file
        major: File accessibilty
        minor: Unable to open file
      #001: H5Fint.c line 1604 in H5F_open(): unable to read superblock
        major: File accessibilty
        minor: Read failed
      #002: H5Fsuper.c line 630 in H5F__super_read(): truncated file: eof = 479232, sblock->base_addr = 0, stored_eof = 33559007
        major: File accessibilty
        minor: File has been truncated
    H5Fopen failed
    read_uc_file encountered error

The "Error detected in … thread 0" first led me to think that it was a threading issue. So, I re-configured with thread-safety on, which meant that C++ and Fortran APIs were not built, nor the high-level library. The tests still failed.

However, running the tests (with original config, i.e. without thread-safety but with C++, Fortran, and high-level library) on a local disk resulted in success.

Using CMake to build, all tests pass, even when doing them on an NFS volume.

UPDATE This fact that some tests fail on NFS mounts is documented at the HDF5 downloads page: "Please be aware! On UNIX platforms the HDF5 tests must be run on a local file system or a parallel file system running GPFS or Lustre in order for the SWMR tests to complete properly."


Facebook's new suite of open source Linux kernel components and tools

Facebook has just announced a bunch of useful Linux kernel components and tools; specifically, useful for shared servers, which may include HPC servers. I could see oomd and cgroup2, in particular, being useful. Oomd takes out-of-memory handling into userspace and tries to take corrective action before an OOM occurs in kernel. Cgroup2 seems to be an successor to cgroups, which allows controlling the amount of system resources assigned to groups of workloads.


XFS group and project quotas

XFS supports a quota system, that can handle quotas by user, group, or project. The project quota system is meant for setting quotas on directory hierarchies, e.g. you want to set quotas on users' home directories, but allow them a different quota in some shared directory.

What the documentation and the man page for xfs_quota(8) do not mention is that group quotas and project quotas are mutually exclusive. I.e. if you turn on group quotas on filesystem, you cannot turn on project quotas, and vice versa.

Here are my simple scripts for setting up XFS project quotas to set quotas on users' home directories:


Docker containers for high performance computing

I have started to see more science application authors/groups provide their applications as Docker images. Having only a passing acquaintance with Docker (and with containers in general), I found this article at The New Stack useful: Containers for High Performance Computing (Joab Jackson).

The vendors of the tools I use -- Bright Cluster Manager, Univa Grid Engine -- have incorporated support for containers. It is good to read some independent information about the role of containers in HPC.

Christian Kniep of Docker points out some issues with the interaction of HPC and Docker, and came up with a preliminary solution (a proxy for Docker Engine) to address the issues. HPC commonly makes use of specific hardware (e.g. GPUs, InfiniBand), and this is counter to Docker's hardware-agnostic approach. Also, HPC workflows may rely on shared resources (e.g. i/o to a shared filesystem).


Python multiprocessing ignores cgroups

At work, I noticed a Python job that was causing a large overload condition. The job requested 16 CPU cores, and was running gensim.models.ldamulticore requesting 15 workers. However, the load on that server indicated an overload of many times over. This is despite a cgroups cpuset restricting the number of cores for that process to 16.

It turns out, gensim.models.ldamulticore uses Python multiprocessing. That module decides how many threads to run based on the number of CPU cores read directly from /proc/cpuinfo. This completely bypasses the limitations imposed by cgroups.

There is currently an open enhancement request to add a new function to multiprocessing for requesting the number of usable CPU cores rather than the total number of CPU cores.


Setting up a software RAID with sgdisk and mdadm

I wanted to set up a RAID0 (striped array) on two HDDs to servce as cache for Duplicity backups. And I wanted to use GPT and only command line tools: sgdisk(8) for partitioning, and mdadm(8) for creating the software RAID. (I have usually just used Gparted, a GUI partitioning tool.)

All of this was done on Red Hat Enterprise Linux 6.5.

So, I have two (spinning disc) HDDs, each 931 GB, mounted as
  • /dev/sda
  • /dev/sdb
First, zap any partitioning information they may have. (In all the examples below, "X" should be replaced by "a" or "b".)

# sgdisk -Z /dev/sdX

Next, partition them. The partitions have to be of type 0xFD00 "Linux RAID". You can do "sgdisk -L" to see a list of all available types. These type codes are not the same as the type codes used by fdisk(8).

The partitions will be 512 GB, leaving some for other uses.

# sgdisk -n 0:0:+512G -c 0:"cache" -t 0:0xFD00 /dev/sdX
# sgdisk -n 0:0:0 -c 0:"misc" /dev/sdX

The "0" first digit of the argument to "-n", "-c", and "-t" is shorthand for the first available partition number. In this case, the first line would be "1" and the second line would be "2". (N.B. this is automatically set; "1" and "2" do not need to be used in the commandline.)

In the second line, note that "-n 0:0:0" uses the default of starting at the first unallocated sector, and ending at the last allocateable sector on the drive, thereby using up the rest of the HDD for the "misc" partition 2.  Leaving out the type specification, "-t", gives the default 0x8300 "Linux filesystem."

Print out the partition info to check:

# sgdisk -p /dev/sdX
Disk /dev/sdX: 1953525168 sectors, 931.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): xxxxxxxxxxxxxxx-xxx-xxxxxxxxxxxxxxxx
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      1073743871   512.0 GiB   FD00  cache
   2      1073743872      1953525134   419.5 GiB   8300  misc

And we see that it has done what we expected.

Next, we create the RAID0 from /dev/sda1 and /dev/sdb1:

# mdadm -v -C /dev/md/mycomputer:cache -l stripe -n 2 /dev/sda1 /dev/sdb1

This creates a device /dev/md127 with a symbolic link for the readable name:

lrwxrwxrwx 1 root root 8 2017-10-18 18:11:31 -0400 /dev/md/mycomputer:cache -> ../md127

In mdadm v3.3.4 in RHEL6, I found that the name given to the "-C/--create" option would always end up being "mycomputer:something", where "something" was determined by what you actually give it. This name comes up after reboot.

The "-v" is for verbose output, "-l" is for the RAID level (which can be specified by integer, or string), "-n" is the number of devices, and the positional arguments are a list of the devices to be used.

Also, the integer N in /dev/mdN is determined by the system. It seems to start with 127.

For instance, doing

# mdadm -C /dev/md0

after rebooting gave this:

lrwxrwxrwx 1 root root 8 2017-10-18 18:11:31 -0400 /dev/md/mycomputer:0 -> ../md127 

And doing

# mdadm -C /dev/md/cache 


lrwxrwxrwx 1 root root 8 2017-10-18 18:11:31 -0400 /dev/md/mycomputer:cache -> ../md127

I wised up on my third time through, and named it what it was going to pick, anyway.

The RAID needs to be "assembled" and activated at boot time. This is not done by default. To do this, a file /etc/mdadm.conf must be created. (Other distros may have a different location for this file.)

Assuming there is no such file, start by using mdadm(8) to output the array specification to the file:

# mdadm -Ds /dev/md/mycomputer:cache > /etc/mdadm.conf
# cat /etc/mdadm.conf
ARRAY /dev/md/mycomputer:cache metadata=1.2 name=mycomputer:cache UUID=xxxxxxxx

Very important: this UUID will not be the same as the UUID of the filesystem we will create later.

Add DEVICE, MAILADDR, and AUTO lines to /etc/mdadm.conf, resulting in:

DEVICE /dev/sda1 /dev/sdb1
AUTO +all

ARRAY /dev/md/mycomputer:cache metadata=1.2 name=mycomputer:cache UUID=xxxxxxxx

I did the next bit in single-user mode as I wanted this to be mounted as /var/cache, which is also used by several other things. Also, since it gets tiresome writing out the whole device name, I used the short name /dev/md127.

# telinit 1
# mkfs.ext4 /dev/md127

Next, I mounted the device in a temporary location to transfer the existing contents:

# mkdir /mnt/tmpmnt
# mount /dev/md127 /mnt/tmpmnt
# cd /var/cache
# tar cf - * | ( cd /mnt/tmpmnt ; tar xvf - )

Get the UUID of this new filesystem for use in /etc/fstab:

# blkid /dev/md127

And create an entry in /etc/fstab:

UUID=yyyyyyyyy-yyyyyyyy-yyyyyyyyyy-yyyyyyyyy  /var/cache  ext4   defaults    0 2

And reboot!


Why kernel development still uses email

Good post on why email is ideal for kernel development process:

As Rusty Russell once said, if you want to get smarter, the thing to do is to hang out with smart people. An email-based workflow lets developers hang out with a project's smart people, making them all smarter. Greg wants Linux to last a long time, so wants to see the kernel project use tools that help to bring in new developers. Email, for all its flaws, is still better than anything else in that regard.


Shorewall setup for VirtualBox host-only interface

VirtualBox has a networking mode called "host-only" which allows guests to communicate with each other, and the host to communicate with the guests.

To do this, a host-only network (interface) must be defined on the host. It can be done via GUI:

or via the commandline (needs sudo because this creates a new network interface on the host):

$ sudo vboxmanage hostonlyif create

This creates a host-only virtual interface on the host, named vboxnetN (N starts at 0 and increments for each new one):

$ ip addr list
12: vboxnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether ...
    inet brd scope global vboxnet0
    inet6 fe80::800:27ff:fe00:0/64 scope link 
       valid_lft forever preferred_lft forever

There are three things to do in Shorewall: define a zone, place the host-only interface into that zone, and write a rule.

In /etc/shorewall/zones define the new zone:

# /etc/shorewall/zones
#ZONE    TYPE   OPTIONS    IN                OUT
#                          OPTIONS           OPTIONS
vh       ipv4

In /etc/shorewall/interfaces put the host-only interface vboxnet0 in that zone:

# /etc/shorewall/interfaces
vh       vboxnet0       detect       dhcp

And finally, in /etc/shorewall/rules allow all traffic in the vh zone:

# /etc/shorewall/rules
ACCEPT    vh:    fw    all

On the guest, create a new adapter, and either use DHCP or assign it a static IP in (excluding, which is the host's IP address).  Attach the adapter to the Host-only Adapter:

Or use the command line:

$ vboxmanage modifyvm myguest --nic2 hostonly

Restart the shorewall service, and that should do it. Test it out by ssh'ing into the guest from the host.


Proposed fix for duplicity Azure backend breakage

At work, I just got a Microsoft Azure Cool Blob Storage allocation for doing off-site backups. The Python-based duplicity software is supposed to be able to use Azure Blob storage as a backend. It does this by using the azure-storage Python module provided by Microsoft.

Unfortunately, a recent update of azure-storage broke duplicity. The fix was not to hard to implement; mostly minor changes in class names, and one simplification in querying blob properties. It took me a few hours to make a fix, and I just submitted my changes as a merge request to duplicityThe proposed merge can be found at Launchpad.

UPDATE Unfortunately, I made a mistake and made my changes against the 0.7.14 release rather than trunk. It looks like there is already a lot of work in trunk to deal with the current azure-storage version.  So, I withdrew the merge request. I'll work from the 0.8 series branch, instead. Currently, it looks like 0.8 all works as is.


Linux daemon using Python daemon with PID file and logging

The python-daemon package (PyPI listing, Pagure repo) is very useful. However, I feel it has suffered a bit from sparse documentation, and the inclusion of a "runner" example, which is in the process of being deprecated as of 2 weeks ago (2016-10-26).

There are several questions about it on StackOverflow, going back a few years:  2009, 20112012, and 2015. Some refer to the included as an example, which is being deprecated.

So, I decided to figure it out myself. I wanted to use the PID lockfile mechanism provided by python-daemon, and also the Python logging module. The inline documentation for python-daemon mention the files_preserve parameter, a list of file handles which should be held open when the daemon process is forked off. However, there wasn't an explicit example, and one StackOverflow solution for logging under python-daemon mentions that the file handle for logging objects may not be obvious:

  • for a StreamHandler, it's logging.root.handlers[0].stream.fileno()
  • for a SyslogHandler, it's logging.root.handlers[1].socket.fileno()

After a bunch of experiments, I think I have sorted it out to my own satisfaction. My example code is in GitHub: prehensilecode/python-daemon-example. It also has a SysV init script. 

The daemon itself is straigtforward, doing nothing but logging timestamps to the logfile. The full code is pasted here:

#!/usr/bin/env python3.5
import sys
import os
import time
import argparse
import logging
import daemon
from daemon import pidfile

debug_p = False

def do_something(logf):
    ### This does the "work" of the daemon

    logger = logging.getLogger('eg_daemon')

    fh = logging.FileHandler(logf)

    formatstr = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    formatter = logging.Formatter(formatstr)



    while True:
        logger.debug("this is an DEBUG message")"this is an INFO message")
        logger.error("this is an ERROR message")

def start_daemon(pidf, logf):
    ### This launches the daemon in its context

    global debug_p

    if debug_p:
        print("eg_daemon: entered run()")
        print("eg_daemon: pidf = {}    logf = {}".format(pidf, logf))
        print("eg_daemon: about to start daemonization")

    ### XXX pidfile is a context
    with daemon.DaemonContext(
        ) as context:

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Example daemon in Python")
    parser.add_argument('-p', '--pid-file', default='/var/run/')
    parser.add_argument('-l', '--log-file', default='/var/log/eg_daemon.log')

    args = parser.parse_args()
    start_daemon(pidf=args.pid_file, logf=args.log_file)