2019-10-08

Still more on SSSD in Bright Cluster Manager - cache file

It has been a few years since I got SSSD to work in Bright Cluster Manager 6, and I just figured out one little thing that has been an annoyance for a few years. There has been a spurious group hanging on: it has the same GID as an existing group, but a different group name.

Since Bright CM 6 did not handle SSSD out of the box, it also did not handle the SSSD cache file. More accurately, it did not ignore the file in the software image, and the grabimage command would grab the image to the provisioning server and then propagate it to nodes in the category.

The fix is simple: add /var/lib/sss/db/* to the various exclude list settings in the category.

To reset the cache:
    service sssd stop
    /bin/rm -f /var/lib/sss/db/cache_default.ldb
    service sssd start

I did try "sss_cache -E" which is supposed to clear the cache, but found that it did not work as I expected: the spurious group still appeared with "getent group".

2019-05-15

New Intel speculative execution vulnerability: "Microarchitectural Data Sampling" (MDS)

Another day, another hardware bug that has security implications. Like the recent Meltdown and Spectre bugs, this new bug called Microarchitectural Data Sampling (MDS) leaks data. Ars Technica has a nice write-up: "MDS attacks perform speculation based on a stale value from one of these [CPU] buffers."

Red Hat's more technical summary, and more detailed video explainers here and here.

2019-05-08

Cray and AMD will build 1.5 exaFLOPS supercomputer for DOE

From Ars Technica: AMD and Cray have announced that they're building "Frontier," a new supercomputer for the Department of Energy at Oak Ridge National Laboratory. The goal is to deliver a system that can perform 1.5 exaflops: 1.5×1018 floating point operations per second.

The current (Nov 2018) Top 500 leader, Summit at ORNL, runs at 0.2 exaflops.

2019-04-18

Notes on building HDF5

HDF5 is a suite of data-centered technology: structures, file formats, APIs, applications.

There are two ways to build HDF5: using the traditional "configure and make", or using CMake.

Using the configure-make method, a few of the tests may fail if run on an NFS filesystem. These tests are use_append_chunk and use_append_mchunks. The test programs first create a file (successfully), and then try to open them for read, where they fail. The error output looks like:

    157778: continue as the writer process
    dataset rank 3, dimensions 0 x 256 x 256
    157778: child process exited with non-zero code (1)
    Error(s) encountered
    HDF5-DIAG: Error detected in HDF5 (1.10.2) thread 0:
      #000: H5F.c line 511 in H5Fopen(): unable to open file
        major: File accessibilty
        minor: Unable to open file
      #001: H5Fint.c line 1604 in H5F_open(): unable to read superblock
        major: File accessibilty
        minor: Read failed
      #002: H5Fsuper.c line 630 in H5F__super_read(): truncated file: eof = 479232, sblock->base_addr = 0, stored_eof = 33559007
        major: File accessibilty
        minor: File has been truncated
    H5Fopen failed
    read_uc_file encountered error


The "Error detected in … thread 0" first led me to think that it was a threading issue. So, I re-configured with thread-safety on, which meant that C++ and Fortran APIs were not built, nor the high-level library. The tests still failed.

However, running the tests (with original config, i.e. without thread-safety but with C++, Fortran, and high-level library) on a local disk resulted in success.

Using CMake to build, all tests pass, even when doing them on an NFS volume.

UPDATE This fact that some tests fail on NFS mounts is documented at the HDF5 downloads page: "Please be aware! On UNIX platforms the HDF5 tests must be run on a local file system or a parallel file system running GPFS or Lustre in order for the SWMR tests to complete properly."

Blogger still pushing Google+


2019-03-19

U.S. DoE unvails $500m supercomputer

From the New York Times: DoE is unveiling a new $500m exascale supercomputer, produced by a collaboration between Intel and Cray. This is part of the computing arms race with China, which has been making big strides in supercomputing in the past decade.

The supercomputer, called Aurora, is a retooling of a development effort first announced in 2015 and is scheduled to be delivered to the Argonne National Laboratory near Chicago in 2021. Lab officials predict it will be the first American machine to reach a milestone called “exascale” performance, surpassing a quintillion calculations per second.

Annoyance with Google+

A few years ago, I migrated Blogger comments here to Google+ comments, so that comments would be linked to commenters' Google+ profiles, and appear in their Google+ feeds. That, unhelpfully, deleted all existing old comments.

Now, the reverse is happening. Since Google+ is going away, all Google+ comments will disappear with it, with no attempt being made to port existing comments to Blogger comments.