Showing posts with label hpc.


Slurm associations

An association is a tuple of (cluster, account, user, partition). The partition may or may not be specified, so an association could just be (cluster, account, user).

To delete the association, one can do:

sacctmgr delete user where name=userfoo account=accountbar

Reference: this post by Samuel Fulcomer in slurm-users.
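Before deleting, it can help to confirm exactly which associations exist for the user. A minimal sketch (userfoo and accountbar are placeholder names, as above):

```shell
# List the associations (cluster, account, user, partition) for a user
sacctmgr show assoc where user=userfoo \
    format=Cluster,Account,User,Partition

# Then delete the association tying that user to a particular account
sacctmgr delete user where name=userfoo account=accountbar
```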


Linus Tech Tips takes a look at the Nvidia Grace CPU and the Hopper GPU

Nvidia has a new ARM-based CPU which they announced some time ago. Here, Linus Tech Tips takes a look at it at COMPUTEX Taipei 2023. The design is similar to Apple silicon, where CPU and memory are on the same chip. Nvidia does split out the GPU, connected via Nvlink.



In a chat recently, I heard that computational fluid dynamics (CFD) can’t take advantage of GPUs. That seemed a bit doubtful to me, so I looked it up. It turns out there has been recent work showing that GPUs can greatly accelerate CFD workloads.

This press release on OpenACC’s website talks about how a private company (AeroDynamic Solutions, Inc. (ADSCFD)) used OpenACC to give their proprietary CFD solver Code LEO GPU capabilities, with very good speedup.

By using OpenACC to GPU-accelerate their commercial flow solver, ADSCFD achieved significant value. They realized dramatically improved performance across multiple use cases with speed-ups ranging from 20 to 300 times, reductions in cost to solution of up to 70%, and access to analyses that were once deemed infeasible to instead being achieved within a typical design cycle.

Similar blog posts from Nvidia and ANSYS+Nvidia last year also show significant speedups (between 12x and 33x) and significant power consumption savings, as well.

Nvidia’s blog post shows results from a beta version of ANSYS Fluent and Simcenter STAR-CCM+.

Figure 2 shows the performance of the first release of Simcenter STAR-CCM+ 2022.1 against commonly available CPU-only servers. For the tested benchmark, an NVIDIA GPU-equipped server delivers results almost 20x faster than over 100 cores of CPU.

Comparing Ansys Fluent 2022 beta1 on CPU-only servers against NVIDIA A100 PCIe 80GB GPUs: the Intel Xeon, AMD Rome, and AMD Milan servers came in at only ~1.1x relative to the baseline, while the A100s delivered speedups from 5.2x (one GPU) to an impressive 33x (eight GPUs).

ANSYS’s blog post covers the same result as Nvidia, showing 33x speedup using 8 A100 GPUs. They also do a cost comparison of equal-speed clusters, one using GPUs and the other purely CPUs:

1 NVIDIA A100 GPU ≈ 272 Intel® Xeon® Gold 6242 Cores

Comparing the older V100 GPUs with Intel® Xeon® Gold 6242, the 6x V100 GPU cluster would cost $71,250 while the equivalent CPU-only cluster would cost $500,000, i.e. about one seventh the price.


Industry Out of Phase With Supercomputers (IEEE Spectrum)

An article in IEEE Spectrum covers a recent report by the National Nuclear Security Administration (NNSA): 

NNSA has developed massive and sophisticated codes that run on supercomputers to verify the continued security and performance of nuclear weapons designed decades ago. Keeping them up to date requires new generations of supercomputers that can run more complex models faster than the months required on today’s machines. But industry, which has shelled out big bucks for state-of-the-art fabs, is targeting big, profitable markets like cloud computing.

Read the full article at IEEE Spectrum.


Singularity recipe and image for AlphaFold

An update to a previous post: I have updated my Singularity recipe for AlphaFold to use AlphaFold 2.2.4. The pre-built Singularity image is also uploaded onto Sylabs Cloud. I will try to keep this updated to track AlphaFold, but Sylabs Cloud has only enough space for one image at a time, so it will always have the latest build only.
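Fetching the pre-built image looks something like the following; the library:// path here is a placeholder, so check my Sylabs Cloud page for the real one:

```shell
# Pull the pre-built AlphaFold image from Sylabs Cloud (placeholder path)
singularity pull alphafold.sif library://someuser/default/alphafold:latest

# Run it with GPU support (--nv passes through the NVIDIA devices)
singularity exec --nv alphafold.sif python /app/alphafold/run_alphafold.py --helpfull
```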


AlphaFold 2 on Singularity + Slurm

DeepMind’s AlphaFold has been making waves recently. It is a new solution to a 50-year-old grand challenge: figuring out how proteins fold. From DeepMind’s blog:

Proteins are essential to life, supporting practically all its functions. They are large complex molecules, made up of chains of amino acids, and what a protein does largely depends on its unique 3D structure. Figuring out what shapes proteins fold into is known as the “protein folding problem”, and has stood as a grand challenge in biology for the past 50 years. In a major scientific advance, the latest version of our AI system AlphaFold has been recognised as a solution to this grand challenge by the organisers of the biennial Critical Assessment of protein Structure Prediction (CASP). This breakthrough demonstrates the impact AI can have on scientific discovery and its potential to dramatically accelerate progress in some of the most fundamental fields that explain and shape our world. 

From the journal Nature

DeepMind’s program, called AlphaFold, outperformed around 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on 30 November, at the start of the conference — held virtually this year — that takes stock of the exercise.

“This is a big deal,” says John Moult, a computational biologist at the University of Maryland in College Park, who co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “In some sense the problem is solved.”

The full paper is published in Nature: Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

DeepMind has released the source on GitHub, including instructions for building a Docker container to run AlphaFold.

One of the faculty at Drexel, where I work, requested AlphaFold be installed. However, in HPC, it is more common to use Singularity containers rather than Docker, as Singularity does not require a daemon nor root privileges. I was able to modify the Dockerfile and the Python wrapper to work with Singularity. Additionally, I added some integration with Slurm, querying the Slurm job environment for the available GPU devices, and the scratch/tmp directory for output. My fork was on GitHub, but since my pull request for the Singularity stuff was not accepted, I have split off the Singularity- and Slurm-specific stuff into its own repo.
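The Slurm integration boils down to reading a few environment variables inside the job. A minimal sketch (the fallback values are placeholders; what Slurm actually sets depends on how the job requests GPUs and on site configuration):

```shell
# Inside a Slurm job: discover the assigned GPUs and per-job scratch space.
# Slurm sets CUDA_VISIBLE_DEVICES when GPUs are requested via --gres=gpu,
# and many sites point TMPDIR at per-job local scratch.
GPU_DEVICES="${CUDA_VISIBLE_DEVICES:-all}"
SCRATCH_DIR="${TMPDIR:-/tmp}"
echo "Using GPU devices: ${GPU_DEVICES}"
echo "Writing output under: ${SCRATCH_DIR}"
```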

UPDATE 2022-08-05 The Singularity code has been updated for AlphaFold 2.2.2

UPDATE 2022-10-02 Updated for AlphaFold 2.2.4; Singularity image now hosted on Sylabs Cloud. See new post.


US DoE (Argonne) to acquire AMD+Nvidia supercomputer as testbed for delayed Intel-based exascale supercomputer

From Reuters: The Nvidia and AMD machine, to be called Polaris, will not be a replacement for the Intel-based Aurora machine slated for the Argonne National Lab near Chicago, which was poised to be the nation's fastest computer when announced in 2019.

Instead, Polaris, which will come online this year, will be a test machine for Argonne to start readying its software for the Intel machine, the people familiar with the matter said.


Google DeepMind's AlphaFold protein folding software officially open-sourced with early release paper in Nature

From Ars Technica:

For decades, researchers have tried to develop software that can take a sequence of amino acids and accurately predict the structure it will form. Despite this being a matter of chemistry and thermodynamics, we've only had limited success—until last year. That's when Google's DeepMind AI group announced the existence of AlphaFold, which can typically predict structures with a high degree of accuracy.

At the time, DeepMind said it would give everyone the details on its breakthrough in a future peer-reviewed paper, which it finally released yesterday. In the meantime, some academic researchers got tired of waiting, took some of DeepMind's insights, and made their own. The paper describing that effort also was released yesterday.

Academic researchers implemented some of the ideas from AlphaFold themselves, and produced RoseTTAFold. 


  • Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021). DOI: 10.1038/s41586-021-03819-2
  • Baek M., et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science (2021). DOI: 10.1126/science.abj8754


Julia vs. C++

When I first heard about Julia, I saw various claims that it was as fast as a compiled language like C or Fortran. Here are two articles that go into some detail:

  • The first, from 2016 by Victor Zverovich, talks about startup times and memory footprint. The examples used are small, so they may not be indicative of performance for larger, more numerically-intensive code. 
  • The second, from 2019 by Eduardo Alvarez, takes a very close look at what is needed to get C++-like speed from Julia. The way to do that is to avoid the Python-like syntax and instead write Julia as if it were C++.
It seems like Julia would be good for a Python scientific programmer, who wants to increase performance greatly, but not switch to the somewhat more complicated programming involved in using C++.


Building NCBI NGS tools (NGS SDK, VDB, SRA Tools) on RHEL 8

I had some trouble building SRA Tools from source on RHEL 8. After a short message thread on GitHub, I went back and tried again, this time setting the “--relative-build-out-dir” option on configure for all components of the NCBI NGS suite. That fixed it.
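In outline, the build goes something like this; the directory names are the stock GitHub checkout names, which I am assuming here, and the components are configured in dependency order:

```shell
# Configure and build each NCBI component with relative build output
# directories, in dependency order: NGS SDK, then VDB, then SRA Tools.
for pkg in ngs/ngs-sdk ncbi-vdb sra-tools; do
    ( cd "$pkg" && ./configure --relative-build-out-dir && make )
done
```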

My full write-up of the build process is a GitHub gist.


Newly discovered malware called Kobalos targeting HPC

 From Ars Technica:

High-performance computer networks, some belonging to the world’s most prominent organizations, are under attack by a newly discovered backdoor that gives hackers the ability to remotely execute commands of their choice, researchers said on Tuesday.

Kobalos, as researchers from security firm Eset have named the malware, is a backdoor that runs on Linux, FreeBSD, and Solaris, and code artifacts suggest it may have once run on AIX and the ancient Windows 3.11 and Windows 95 platforms. The backdoor was released into the wild no later than 2019, and the group behind it was active throughout last year.


New Cerebras wafer-scale single server outperforms Joule supercomputer

From HPC Wire:

Cerebras Systems, a pioneer in high performance artificial intelligence (AI) compute, today announced record-breaking performance on a scientific compute workload. In collaboration with the Department of Energy’s National Energy Technology Laboratory (NETL), Cerebras demonstrated its CS-1 delivering speeds beyond what either CPUs or GPUs are currently able to achieve. Specifically, the CS-1 was 200 times faster than the Joule Supercomputer on the key workload of Computational Fluid Dynamics (CFD).

While Cerebras’s CS-1 system is billed as an AI-focused machine, it outdid the Joule Supercomputer (number 82 in the TOP500) on a non-AI workload. While the Joule cost 10’s of millions of dollars, occupies dozens of racks, and consumes 450 kW of power, the CS-1 fits in only one-third of a rack.

Cerebras has a good write-up on their blog. Gory detail in the preprint: arXiv:2010.03660 [cs.DC].


Switching to SSSD (from nslcd) in Bright Cluster Manager 9

In Bright Cluster Manager 9.0, cluster nodes still use nslcd for LDAP authentication. Since we have sssd working in Bright CM 6 (by necessity due to an issue with Univa Grid Engine and nslcd; see previous posts), we might as well change things over to sssd on Bright CM 9, too. The cluster now runs RHEL8.

First, we disable the nslcd service on all nodes. It was a little non-obvious how to do this, since trying to remove it in the device services did nothing: the service just kept coming back enabled. That is, after “remove nslcd ; commit”, a “list” shows nslcd reappearing.

Examining that service in the device view showed that it “belongs to a role,” but it is not listed in any role, nor in the category of that node.

    [foocluster]% category use compute-cat
    [foocluster->category[compute-cat]]% services
    [foocluster->category[compute-cat]->services]% list
    Service (key)            Monitored  Autostart
    ------------------------ ---------- ----------

It turns out that nslcd is part of a hidden role which is not visible to the user. So, you have to write a loop to disable nslcd on each node. Within cmsh:

    [foocluster]% device
    [foocluster->device]% foreach -v -n node001..node099 (services; use nslcd; set monitored no; set autostart no)
    [foocluster->device]% commit

To modify the node image, I make the changes on one node, and then do “grabimage -w” in cmsh on the head node.

You will need to install these packages:

  • openldap-clients
  • sssd
  • sssd-ldap 
  • openssl-perl

Next, sssd setup. This may depend on your installation. The installation here uses the LDAP server set up by Bright CM, which uses SSL for encryption with both server and client certificates. (All self-signed with a dummy CA in the usual way.) The following /etc/sssd/sssd.conf shows only the non-empty sections. Your configuration may need to be different depending on your environment. 


    [domain/default]
    id_provider = ldap
    autofs_provider = ldap
    auth_provider = ldap
    chpass_provider = ldap
    ldap_uri = ldaps://
    ldap_search_base = dc=cm,dc=cluster
    ldap_id_use_start_tls = False
    ldap_tls_reqcert = demand
    ldap_tls_cacertdir = /cm/local/apps/openldap/etc/certs
    cache_credentials = True
    enumerate = False
    entry_cache_timeout = 600
    ldap_network_timeout = 3
    ldap_connection_expire_timeout = 60

    [sssd]
    config_file_version = 2
    services = nss, pam
    domains = default

    [nss]
    homedir_substring = /home


# chown root:root /etc/sssd/sssd.conf 

# chmod 600 /etc/sssd/sssd.conf

I did not have to change /etc/openldap/ldap.conf.

The next step is to switch to using sssd for authentication. But first, stop and disable the nslcd service: 

# systemctl stop nslcd

# systemctl disable nslcd

The old authconfig-tui utility is gone. The new one is authselect: you will have to force it to overwrite existing authentication configurations.

# authselect select sssd --force

There are other options to authselect, e.g. “with-mkhomedir”. See authselect(8) and authselect-profiles(5) for details. Other options may also require other packages to be installed.

Then, start and enable the sssd service. Check that user ID info can be retrieved:

# id someuser

Back on the head node, do “grabimage -w”. 

Then, modify the node category to add the sssd service, setting it to autostart and to be monitored.
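In cmsh, adding the sssd service to the category might look like this sketch (the category name is a placeholder, as before):

```
    [foocluster]% category use compute-cat
    [foocluster->category[compute-cat]]% services
    [foocluster->category[compute-cat]->services]% add sssd
    [foocluster->category[compute-cat]->services*[sssd*]]% set autostart yes
    [foocluster->category[compute-cat]->services*[sssd*]]% set monitored yes
    [foocluster->category[compute-cat]->services*[sssd*]]% commit
```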


Scripting Bright Cluster Manager 9.0 with Python

It has been more than 6 years since the previous post about using the Python API to script Bright Cluster Manager (CM). Time for an update.

I have to do the same as before: change the “category” on a whole bunch of nodes.

N.B. the Developer Manual has some typos that make it look as if you can specify categories as strings of their names, e.g. cluster.get_by_type('Node')


Still more on SSSD in Bright Cluster Manager - cache file

It has been a few years since I got SSSD to work in Bright Cluster Manager 6, and I just figured out one little thing that had been an annoyance for a few years: a spurious group kept hanging around, with the same GID as an existing group but a different group name.

Since Bright CM 6 did not handle SSSD out of the box, it also did not handle the SSSD cache file. More accurately, it did not ignore the file in the software image, and the grabimage command would grab the image to the provisioning server and then propagate it to nodes in the category.

The fix is simple: add /var/lib/sss/db/* to the various exclude list settings in the category.

To reset the cache:
    service sssd stop
    /bin/rm -f /var/lib/sss/db/cache_default.ldb
    service sssd start

I did try "sss_cache -E" which is supposed to clear the cache, but found that it did not work as I expected: the spurious group still appeared with "getent group".


Cray and AMD will build 1.5 exaFLOPS supercomputer for DOE

From Ars Technica: AMD and Cray have announced that they're building "Frontier," a new supercomputer for the Department of Energy at Oak Ridge National Laboratory. The goal is to deliver a system that can perform 1.5 exaflops: 1.5×10^18 floating point operations per second.

The current (Nov 2018) TOP500 leader, Summit at ORNL, runs at 0.2 exaflops.


Notes on building HDF5

HDF5 is a suite of data-centered technology: structures, file formats, APIs, applications.

There are two ways to build HDF5: using the traditional "configure and make", or using CMake.

Using the configure-make method, a few of the tests may fail if run on an NFS filesystem, namely use_append_chunk and use_append_mchunks. The test programs first create a file (successfully), and then try to open it for reading, which fails. The error output looks like:

    157778: continue as the writer process
    dataset rank 3, dimensions 0 x 256 x 256
    157778: child process exited with non-zero code (1)
    Error(s) encountered
    HDF5-DIAG: Error detected in HDF5 (1.10.2) thread 0:
      #000: H5F.c line 511 in H5Fopen(): unable to open file
        major: File accessibilty
        minor: Unable to open file
      #001: H5Fint.c line 1604 in H5F_open(): unable to read superblock
        major: File accessibilty
        minor: Read failed
      #002: H5Fsuper.c line 630 in H5F__super_read(): truncated file: eof = 479232, sblock->base_addr = 0, stored_eof = 33559007
        major: File accessibilty
        minor: File has been truncated
    H5Fopen failed
    read_uc_file encountered error

The "Error detected in … thread 0" first led me to think that it was a threading issue. So, I re-configured with thread-safety on, which meant that the C++ and Fortran APIs were not built, nor the high-level library. The tests still failed.

However, running the tests (with original config, i.e. without thread-safety but with C++, Fortran, and high-level library) on a local disk resulted in success.

Using CMake to build, all tests pass, even when doing them on an NFS volume.
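For reference, the CMake build goes roughly as follows; the install prefix is a placeholder, and the option names are as I recall them, so check the release notes for your HDF5 version:

```shell
# Out-of-source CMake build of HDF5, then run the test suite
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX="$HOME/sw/hdf5" \
      -DHDF5_BUILD_CPP_LIB=ON \
      -DHDF5_BUILD_FORTRAN=ON \
      ..
make -j 8
ctest .
make install
```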

UPDATE The fact that some tests fail on NFS mounts is documented at the HDF5 downloads page: "Please be aware! On UNIX platforms the HDF5 tests must be run on a local file system or a parallel file system running GPFS or Lustre in order for the SWMR tests to complete properly."


U.S. DoE unveils $500m supercomputer

From the New York Times: DoE is unveiling a new $500m exascale supercomputer, produced by a collaboration between Intel and Cray. This is part of the computing arms race with China, which has been making big strides in supercomputing in the past decade.

The supercomputer, called Aurora, is a retooling of a development effort first announced in 2015 and is scheduled to be delivered to the Argonne National Laboratory near Chicago in 2021. Lab officials predict it will be the first American machine to reach a milestone called “exascale” performance, surpassing a quintillion calculations per second.


Docker containers for high performance computing

I have started to see more science application authors/groups provide their applications as Docker images. Having only a passing acquaintance with Docker (and with containers in general), I found this article at The New Stack useful: Containers for High Performance Computing (Joab Jackson).

The vendors of the tools I use -- Bright Cluster Manager, Univa Grid Engine -- have incorporated support for containers. It is good to read some independent information about the role of containers in HPC.

Christian Kniep of Docker points out some issues with the interaction of HPC and Docker, and came up with a preliminary solution (a proxy for Docker Engine) to address the issues. HPC commonly makes use of specific hardware (e.g. GPUs, InfiniBand), and this is counter to Docker's hardware-agnostic approach. Also, HPC workflows may rely on shared resources (e.g. i/o to a shared filesystem).