2019-04-18

Notes on building HDF5

HDF5 is a suite of data-centered technologies: data structures, file formats, APIs, and applications.

There are two ways to build HDF5: using the traditional "configure and make", or using CMake.
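
For reference, here is a minimal sketch of each method (the install prefix and options are illustrative; see the INSTALL files in the HDF5 source tree for the real details):

    # Method 1: traditional configure and make
    cd hdf5-1.10.2
    ./configure --prefix=/usr/local/hdf5 --enable-cxx --enable-fortran
    make
    make check      # run the test suite
    make install

    # Method 2: CMake, in a separate build directory
    mkdir build && cd build
    cmake -DCMAKE_INSTALL_PREFIX=/usr/local/hdf5 ../hdf5-1.10.2
    make
    ctest           # run the test suite
    make install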

Using the configure-make method, a few of the tests may fail if run on an NFS filesystem, namely use_append_chunk and use_append_mchunks. Each test program first creates a file (successfully), and then tries to open it for reading, which is where it fails. The error output looks like:

    157778: continue as the writer process
    dataset rank 3, dimensions 0 x 256 x 256
    157778: child process exited with non-zero code (1)
    Error(s) encountered
    HDF5-DIAG: Error detected in HDF5 (1.10.2) thread 0:
      #000: H5F.c line 511 in H5Fopen(): unable to open file
        major: File accessibilty
        minor: Unable to open file
      #001: H5Fint.c line 1604 in H5F_open(): unable to read superblock
        major: File accessibilty
        minor: Read failed
      #002: H5Fsuper.c line 630 in H5F__super_read(): truncated file: eof = 479232, sblock->base_addr = 0, stored_eof = 33559007
        major: File accessibilty
        minor: File has been truncated
    H5Fopen failed
    read_uc_file encountered error


The "Error detected in … thread 0" first led me to think that it was a threading issue. So, I re-configured with thread-safety on, which meant that C++ and Fortran APIs were not built, nor the high-level library. The tests still failed.

However, running the tests (with original config, i.e. without thread-safety but with C++, Fortran, and high-level library) on a local disk resulted in success.

Using CMake to build, all tests pass, even when they are run on an NFS volume.

UPDATE: The fact that some tests fail on NFS mounts is documented on the HDF5 downloads page: "Please be aware! On UNIX platforms the HDF5 tests must be run on a local file system or a parallel file system running GPFS or Lustre in order for the SWMR tests to complete properly."

2019-03-19

U.S. DoE unveils $500m supercomputer

From the New York Times: DoE is unveiling a new $500m exascale supercomputer, produced by a collaboration between Intel and Cray. This is part of the computing arms race with China, which has been making big strides in supercomputing in the past decade.

The supercomputer, called Aurora, is a retooling of a development effort first announced in 2015 and is scheduled to be delivered to the Argonne National Laboratory near Chicago in 2021. Lab officials predict it will be the first American machine to reach a milestone called “exascale” performance, surpassing a quintillion calculations per second.

Annoyance with Google+

A few years ago, I migrated Blogger comments here to Google+ comments, so that comments would be linked to commenters' Google+ profiles and appear in their Google+ feeds. That, unhelpfully, deleted all of the existing comments.

Now, the reverse is happening. Since Google+ is going away, all Google+ comments will disappear with it, with no attempt being made to port existing comments to Blogger comments.

2018-10-31

Facebook's new suite of open source Linux kernel components and tools

Facebook has just announced a bunch of useful Linux kernel components and tools; these are useful specifically for shared servers, which may include HPC servers. I could see oomd and cgroup2, in particular, being useful. Oomd takes out-of-memory handling into userspace and tries to take corrective action before an OOM occurs in the kernel. Cgroup2 seems to be a successor to cgroups, which allows controlling the amount of system resources assigned to groups of workloads.
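
As a rough illustration of the cgroup2 interface (the group name and limit here are made up; this assumes a kernel with the unified hierarchy available):

    # mount the unified cgroup2 hierarchy, if not already mounted
    mount -t cgroup2 none /sys/fs/cgroup

    # the memory controller may need to be enabled in the parent first
    echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control

    # create a group, cap its memory, and move a process (PID 1234) into it
    mkdir /sys/fs/cgroup/hpcjob42
    echo 8G > /sys/fs/cgroup/hpcjob42/memory.max
    echo 1234 > /sys/fs/cgroup/hpcjob42/cgroup.procs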

2018-08-31

XFS group and project quotas

XFS supports a quota system that can handle quotas by user, group, or project. The project quota system is meant for setting quotas on directory hierarchies: e.g., you might want to set quotas on users' home directories, but allow them a different quota in some shared directory.

What the documentation and the man page for xfs_quota(8) do not mention is that group quotas and project quotas are mutually exclusive, i.e. if you turn on group quotas on a filesystem, you cannot turn on project quotas, and vice versa.
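
A minimal sketch of the setup for one home directory (the project name, ID, and limits are all illustrative; the filesystem must also be mounted with the prjquota option):

    # map a project ID to a directory tree, and a project name to that ID
    echo "42:/home/alice" >> /etc/projects
    echo "alice_home:42" >> /etc/projid

    # initialize the project, then set soft and hard block limits
    xfs_quota -x -c 'project -s alice_home' /home
    xfs_quota -x -c 'limit -p bsoft=9g bhard=10g alice_home' /home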

Here are my simple scripts for setting up XFS project quotas to set quotas on users' home directories: https://github.com/prehensilecode/xfs-project-quotas

2018-07-03

Docker containers for high performance computing

I have started to see more science application authors/groups provide their applications as Docker images. Having only a passing acquaintance with Docker (and with containers in general), I found this article at The New Stack useful: Containers for High Performance Computing (Joab Jackson).

The vendors of the tools I use -- Bright Cluster Manager, Univa Grid Engine -- have incorporated support for containers. It is good to read some independent information about the role of containers in HPC.

Christian Kniep of Docker points out some issues with the interaction of HPC and Docker, and came up with a preliminary solution (a proxy for Docker Engine) to address them. HPC commonly makes use of specific hardware (e.g. GPUs, InfiniBand), which runs counter to Docker's hardware-agnostic approach. Also, HPC workflows may rely on shared resources (e.g. I/O to a shared filesystem).
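
As an illustration of what working around that hardware-agnosticism can look like (the image name, scripts, and mount paths are hypothetical; GPU access here assumes the nvidia-docker 2 runtime):

    # expose RDMA/InfiniBand devices and bind-mount a shared scratch filesystem
    docker run --rm \
        --device=/dev/infiniband/uverbs0 \
        --device=/dev/infiniband/rdma_cm \
        -v /lustre/scratch:/scratch \
        myhpcapp:latest ./run_simulation

    # GPU access via the NVIDIA container runtime
    docker run --rm --runtime=nvidia myhpcapp:latest ./run_gpu_job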