2021-09-22

AlphaFold 2 on Singularity + Slurm

DeepMind’s AlphaFold has been making waves recently. It is a solution to a 50-year-old grand challenge: figuring out how proteins fold. From DeepMind’s blog:

Proteins are essential to life, supporting practically all its functions. They are large complex molecules, made up of chains of amino acids, and what a protein does largely depends on its unique 3D structure. Figuring out what shapes proteins fold into is known as the “protein folding problem”, and has stood as a grand challenge in biology for the past 50 years. In a major scientific advance, the latest version of our AI system AlphaFold has been recognised as a solution to this grand challenge by the organisers of the biennial Critical Assessment of protein Structure Prediction (CASP). This breakthrough demonstrates the impact AI can have on scientific discovery and its potential to dramatically accelerate progress in some of the most fundamental fields that explain and shape our world. 

From the journal Nature:

DeepMind’s program, called AlphaFold, outperformed around 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on 30 November, at the start of the conference — held virtually this year — that takes stock of the exercise.

“This is a big deal,” says John Moult, a computational biologist at the University of Maryland in College Park, who co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “In some sense the problem is solved.”

The full paper is published in Nature: Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2

DeepMind has released the source on GitHub, including instructions for building a Docker container to run AlphaFold.

One of the faculty at Drexel, where I work, requested that AlphaFold be installed. In HPC, however, it is more common to use Singularity containers than Docker, since Singularity requires neither a daemon nor root privileges. I was able to modify the Dockerfile and the Python wrapper to work with Singularity. I also added some integration with Slurm, querying the Slurm job environment for the available GPU devices and for the scratch/tmp directory to use for output. My fork was on GitHub, but since my pull request for the Singularity support was not accepted, I have split the Singularity- and Slurm-specific code off into its own repo.
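The Slurm integration is mostly a matter of reading environment variables that Slurm sets inside a job. A minimal sketch of the idea (the variable handling and directory naming here are illustrative, not the exact code from the repo):

```shell
# Slurm's GPU support exports the devices assigned to this job:
gpus="${CUDA_VISIBLE_DEVICES:-}"

# Many sites point TMPDIR at a per-job scratch directory; fall back to /tmp:
scratch="${TMPDIR:-/tmp/alphafold-${SLURM_JOB_ID:-$$}}"
mkdir -p "$scratch"

echo "GPUs assigned: ${gpus:-none}"
echo "Output directory: $scratch"
```

The wrapper can then pass the GPU list through to the container and write its output under the scratch directory.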

UPDATE 2022-08-05 The Singularity code has been updated for AlphaFold 2.2.2

UPDATE 2022-10-02 Updated for AlphaFold 2.2.4; Singularity image now hosted on Sylabs Cloud. See new post.

2021-08-24

US DoE (Argonne) to acquire AMD+Nvidia supercomputer as testbed for delayed Intel-based exascale supercomputer

From Reuters:

The Nvidia and AMD machine, to be called Polaris, will not be a replacement for the Intel-based Aurora machine slated for the Argonne National Lab near Chicago, which was poised to be the nation's fastest computer when announced in 2019.

Instead, Polaris, which will come online this year, will be a test machine for Argonne to start readying its software for the Intel machine, the people familiar with the matter said.


2021-07-18

Google DeepMind's AlphaFold protein folding software officially open-sourced with early release paper in Nature

From Ars Technica:

For decades, researchers have tried to develop software that can take a sequence of amino acids and accurately predict the structure it will form. Despite this being a matter of chemistry and thermodynamics, we've only had limited success—until last year. That's when Google's DeepMind AI group announced the existence of AlphaFold, which can typically predict structures with a high degree of accuracy.

At the time, DeepMind said it would give everyone the details on its breakthrough in a future peer-reviewed paper, which it finally released yesterday. In the meantime, some academic researchers got tired of waiting, took some of DeepMind's insights, and made their own. The paper describing that effort also was released yesterday.

Academic researchers implemented some of AlphaFold’s ideas themselves and produced RoseTTAFold.

Articles:

  • Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021). DOI: 10.1038/s41586-021-03819-2
  • Baek M., et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science (2021). DOI: 10.1126/science.abj8754

2021-07-16

Podman and Shorewall

Bright Cluster Manager uses Shorewall to manage the firewall rules on the head/management node. By default, this seems to prevent Podman and Docker from working correctly.

I was working through a simple example of running a pod with PostgreSQL and pgAdmin, but connections to the host port that forwards to the pgAdmin container appeared to be blocked: connection attempts with both curl and a web browser would hang.
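For reference, the pod looked roughly like this (the pod name, passwords, and host port are placeholders; the images are the usual upstream ones):

```shell
# Create a pod that publishes the pgAdmin container's port 80 on host port 8080
podman pod create --name pgpod -p 8080:80

# PostgreSQL container inside the pod
podman run -d --pod pgpod \
    -e POSTGRES_PASSWORD=changeme \
    docker.io/library/postgres:13

# pgAdmin container inside the same pod; it serves HTTP on port 80
podman run -d --pod pgpod \
    -e PGADMIN_DEFAULT_EMAIL=admin@example.com \
    -e PGADMIN_DEFAULT_PASSWORD=changeme \
    docker.io/dpage/pgadmin4
```

With Shorewall in its default configuration, requests to the published host port simply hung.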

Some additional configuration is needed for Shorewall to work with Podman. Shorewall has instructions for making it work with Docker, and these work for Podman with minor modifications.

First, modify the systemd service so that it does not clear the firewall rules when the service stops. Run:

sudo systemctl edit shorewall.service

which opens a blank override file. Add these contents:

[Service]
# Reset ExecStop
ExecStop=
# Set ExecStop to "stop" instead of "clear"
ExecStop=/sbin/shorewall $OPTIONS stop

Then activate the changes with

sudo systemctl daemon-reload

Next, we need the name of the Podman network interface; use “ip link list” to find it. On my RHEL 8 system, the interface is

10: cni-podman0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff

And make the following modifications to the appropriate config files.

Enable Docker mode in /etc/shorewall/shorewall.conf:

DOCKER=Yes

Define a zone for Podman in /etc/shorewall/zones:

#ZONE    TYPE    OPTIONS
pod      ipv4    # 'pod' is just an example -- call it anything you like

Define policies for this zone in /etc/shorewall/policy:

#SOURCE        DEST        POLICY        LEVEL
pod            $FW         REJECT
pod            all         ACCEPT

And match the zone to the interface in /etc/shorewall/interfaces:

# Need to specify "?FORMAT 2"
?FORMAT 2

#ZONE  INTERFACE    OPTIONS
pod    cni-podman0  bridge   # Allow ICC (inter-container communication); bridge implies routeback=1

Then restart Shorewall, and start the pod (or restart it if it was already running).
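Concretely (the pod name here is a placeholder for whatever you named yours):

```shell
# Reload the firewall with the new zone, policy, and interface entries
sudo systemctl restart shorewall.service

# Restart the pod so its network plumbing is re-created
podman pod restart pgpod
```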

You may need additional rules to allow external hosts to connect into the pod. Take, for example, a pod containing a pgAdmin container and a PostgreSQL container, where the pgAdmin container serves on port 80, and say your administrative hosts are in the address block 10.0.10.0/23. Then add the following to /etc/shorewall/rules:

# Accept connections from admin hosts to the pgadmin container
#ACTION   SOURCE              DEST   PROTO   DEST PORT(S)
ACCEPT    net:10.0.10.0/23    pod    tcp     80

2021-07-01

Julia vs. C++

When I first heard about Julia, there were various claims that it was as fast as a compiled language like C or Fortran. Here are two articles that go into some detail:

  • The first, from 2016 by Victor Zverovich, looks at startup times and memory footprint. The examples are small, so they may not be indicative of performance for larger, more numerically intensive code.
  • The second, from 2019 by Eduardo Alvarez, takes a close look at what it takes to get C++-like speed from Julia. The trick is to avoid the Python-like syntax and instead write Julia as if it were C++.

It seems Julia would suit a Python scientific programmer who wants a large increase in performance without taking on the somewhat more complicated programming that C++ involves.

2021-03-01

Building NCBI NGS tools (NGS SDK, VDB, SRA Tools) on RHEL 8

I had some trouble building SRA Tools from source on RHEL 8. After a short message thread on GitHub, I went back and tried again, this time passing the “--relative-build-out-dir” option to configure for all components of the NCBI NGS suite. That fixed it.
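The overall shape of the build was roughly the following, repeated for each component (repo URLs are the real NCBI ones, but the checkout layout and any other configure options are illustrative; see the gist for the exact steps):

```shell
# Build each component with the same pattern, in dependency order:
# ngs (NGS SDK), then ncbi-vdb, then sra-tools.
for repo in ngs ncbi-vdb sra-tools; do
    git clone "https://github.com/ncbi/${repo}.git"
    cd "$repo"
    # The key fix: keep build output paths relative
    ./configure --relative-build-out-dir
    make
    cd ..
done
```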

My full write-up of the build process is a GitHub gist.

2021-02-03

Newly discovered malware called Kobalos targeting HPC

From Ars Technica:

High-performance computer networks, some belonging to the world’s most prominent organizations, are under attack by a newly discovered backdoor that gives hackers the ability to remotely execute commands of their choice, researchers said on Tuesday.

Kobalos, as researchers from security firm Eset have named the malware, is a backdoor that runs on Linux, FreeBSD, and Solaris, and code artifacts suggest it may have once run on AIX and the ancient Windows 3.11 and Windows 95 platforms. The backdoor was released into the wild no later than 2019, and the group behind it was active throughout last year.