2015-05-15

pylab confusions

There are three pylabs that one may encounter in using Python. Two have been around for a while, and the third just showed up less than a month ago.

The “real” pylab is the procedural interface to matplotlib, i.e. a MATLAB-like command line interface. It imports matplotlib.pyplot and numpy into a single namespace. You can use it from IPython’s prompt by calling the magic function “%pylab”. It is no longer recommended by the matplotlib people. The recommended way is to import each module under an abbreviated name and call its functions through that name. For example:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)

plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

plt.xlabel('x label')
plt.ylabel('y label')
plt.title("Simple Plot")
plt.legend()
plt.show()
Then there is the proposal by Keir Mierle to improve on the pylab idea: a single package that makes Python usable for interactive analysis. It is written up in the SciPy wiki, but the page does not seem to have been updated since 2012.

And finally, if you are like me, have not been thinking too hard, and typed “pip install pylab”, you get this new package from PyPI, first added on 2015-04-23. It does nothing but pull in several other Python packages, i.e. it serves as a metapackage. You can see the source is basically a dummy, with all the action in the requirements defined in setup.py.
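
One way to see what a metapackage like this pulls in, without reading setup.py, is to ask the installed metadata. This is only a sketch: it assumes Python 3.8 or later (for importlib.metadata), and the helper name declared_requirements is made up for illustration.

```python
# Sketch: list the requirements an installed distribution declares.
# Assumes Python 3.8+ (importlib.metadata); the helper name is hypothetical.
from importlib import metadata


def declared_requirements(dist_name):
    """Return the requirement strings declared by an installed distribution.

    Returns None if the distribution is not installed, and an empty list
    if it is installed but declares no requirements.
    """
    try:
        reqs = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return list(reqs or [])


# "pylab" is the PyPI metapackage discussed above; it may not be installed,
# in which case this prints None.
print(declared_requirements("pylab"))
```

For a true metapackage, the list of requirements is essentially the whole story.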

2015-03-17

Ganglia procstat.py fix to handle process names containing underscores

UPDATE: Accepted. Current version here.

This has bugged me for a while: Ganglia's Python module procstat.py which monitors process CPU and memory usage did not show any data for Grid Engine's qmaster, which has a process name of "sge_qmaster". Turns out, this is because it tries to parse out the process name by assuming it does not have underscores in it. This snippet is from the get_stat(name) function in procstat.py:

if name.startswith('procstat_'):
    fir = name.find('_')
    sec = name.find('_', fir + 1)
    proc = name[fir + 1:sec]
    label = name[sec + 1:]
I just submitted a pull request to change this to something which handles process names with any number of underscores. The snippet to replace the above:

if name.startswith('procstat_'):
    nsp = name.split('_')
    proc = '_'.join(nsp[1:-1])
    label = nsp[-1]
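
To see the difference between the two snippets, here is a small side-by-side sketch. The function names parse_old and parse_new are hypothetical wrappers, and the "cpu" label is assumed for illustration; only the function bodies mirror the snippets above.

```python
# Sketch comparing the two parsing strategies for metric names of the
# form "procstat_<process>_<label>". Function names are hypothetical.

def parse_old(name):
    """Original logic: assumes the process name has no underscores."""
    fir = name.find('_')
    sec = name.find('_', fir + 1)
    proc = name[fir + 1:sec]
    label = name[sec + 1:]
    return proc, label


def parse_new(name):
    """Replacement logic: everything between the prefix and the last
    underscore-separated field is treated as the process name."""
    nsp = name.split('_')
    proc = '_'.join(nsp[1:-1])
    label = nsp[-1]
    return proc, label


for metric in ("procstat_httpd_cpu", "procstat_sge_qmaster_cpu"):
    print(metric, parse_old(metric), parse_new(metric))
# procstat_httpd_cpu ('httpd', 'cpu') ('httpd', 'cpu')
# procstat_sge_qmaster_cpu ('sge', 'qmaster_cpu') ('sge_qmaster', 'cpu')
```

Note that the replacement trades one assumption for another: it now assumes the label (the last field) contains no underscores.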

2015-02-11

FlexLM and host names

If you ever get an error with lmstat like:

lmgrd is not running: License server machine is down or not responding. (-96,7:2 "No such file or directory")

but only from some machines outside your domain, check that the SERVER line in your license file specifies the FQDN of the license server. The default is to use just the hostname.

SERVER myserver.mydom.com XXXXXXXXXXXX NNNNN

Most search hits on that error message say things about firewalls.
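
A quick check for this can be scripted. The following is a rough sketch, not anything FlexLM provides: the helper name is made up, and "contains a dot" is only a heuristic for "looks fully qualified".

```python
# Sketch: flag SERVER lines in a FlexLM license file whose host field
# is not fully qualified. The helper name and the "contains a dot"
# heuristic are assumptions made for illustration.

def unqualified_servers(license_text):
    """Return host names from SERVER lines that contain no dot."""
    hosts = []
    for line in license_text.splitlines():
        fields = line.split()
        # SERVER lines start with the keyword followed by the host name.
        if len(fields) >= 2 and fields[0] == "SERVER":
            host = fields[1]
            if "." not in host:
                hosts.append(host)
    return hosts


sample = """\
SERVER myserver XXXXXXXXXXXX NNNNN
SERVER myserver.mydom.com XXXXXXXXXXXX NNNNN
"""
print(unqualified_servers(sample))
# ['myserver']
```

On a real system, pass in the contents of your license file instead of the sample text.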

2015-02-09

Grid Engine PE script (prologue) for Abaqus

UPDATE: Well, a closer look at some of the files Abaqus generates during its run indicates that Abaqus (or, technically, Platform MPI) is aware of Grid Engine and can figure out the host list by itself.

Abaqus 6.13 uses Platform MPI, but also uses its own "environment file" for the MPI hostfile. (Search for "mp_host_list" in the official documentation.) So, I cooked up this PE script (aka prologue) to write the abaqus_v6.env file in the job directory:

#!/usr/bin/env python
import os

### PE startup script to set up Abaqus MPI "hostfile"
### Based on documented env file format

machinefile = os.environ['PE_HOSTFILE']
abaqenvfile = "abaqus_v6.env"

# Collect [hostname, slots] pairs from the Grid Engine hostfile
machinelines = []
with open(machinefile, "r") as mf:
    for l in mf:
        lsplit = l.split()
        machinelines.append([lsplit[0], int(lsplit[1])])

# Write the environment file into the job directory
with open(abaqenvfile, "w") as envfile:
    envfile.write("mp_mode=MPI\n")
    envfile.write("mp_host_list=%s\n" % (str(machinelines)))
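
To make the resulting file format concrete, here is a sketch that runs the same transformation on a made-up hostfile. The host names, slot counts, and queue name are invented for illustration, and the function name is hypothetical.

```python
# Sketch: what the mp_host_list line looks like for a sample PE_HOSTFILE.
# Host names, slot counts, and the queue name are made up for illustration.

def mp_host_list_line(hostfile_text):
    """Build the mp_host_list line from PE_HOSTFILE contents.

    Each PE_HOSTFILE line starts with a host name and a slot count;
    any remaining fields are ignored here.
    """
    machinelines = []
    for line in hostfile_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            machinelines.append([fields[0], int(fields[1])])
    return "mp_host_list=%s" % str(machinelines)


sample_hostfile = """\
node01 16 all.q@node01 UNDEFINED
node02 16 all.q@node02 UNDEFINED
"""
print(mp_host_list_line(sample_hostfile))
# mp_host_list=[['node01', 16], ['node02', 16]]
```

The Python list-of-lists string is written as-is because the Abaqus environment file is itself Python syntax.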

2014-12-30

Storage Manager(s) software

I am really curious about a storage device management GUI named "SMclient". At my previous job at Wake Forest, we used an IBM GPFS storage system. The software used to manage it was SMclient, with SM, I assume, meaning "Storage Manager". It's a Java-based GUI. There is also a command line interface, SMcli.

Here at Drexel's URCF, we have a Dell High Availability NFS storage system, using an MD3260 storage device and HA front-ends using Red Hat's High Availability Add-On and XFS. Anyway, the GUI used to manage the MD3260 storage device is also SMclient, which looks identical to IBM's SMclient. Nothing in the "About" window mentions the history of the software.

Anyone out there know the history of this software?

2014-12-29

Mellanox Infiniband network cards on Linux

Sometimes, when one updates the firmware for Mellanox Infiniband cards, the MAC/hardware address gets changed. This usually happens if the IB card is OEM, i.e. made by Mellanox but stamped with a different company's name.

When the MAC gets changed, the network interface will not come up. The fix is to update the HWADDR field in /etc/sysconfig/network-scripts/ifcfg-ib0 and /etc/sysconfig/network-scripts/ifcfg-ib1. Use "ip link list" to display the new MAC.
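
The comparison can be automated along these lines. This is a sketch only: the function names are hypothetical, the sample address is made up, and it assumes the kernel exposes the current address in /sys/class/net/ib0/address.

```python
# Sketch: compare the HWADDR in an ifcfg file with the address the
# kernel reports. Function names are hypothetical and the sample
# values below are made up.
import re


def hwaddr_from_ifcfg(ifcfg_text):
    """Return the HWADDR value from ifcfg file contents, or None."""
    for line in ifcfg_text.splitlines():
        match = re.match(r'\s*HWADDR\s*=\s*"?([0-9A-Fa-f:]+)"?\s*$', line)
        if match:
            return match.group(1)
    return None


def same_hwaddr(a, b):
    """Compare two hardware addresses, ignoring case."""
    return a is not None and b is not None and a.lower() == b.lower()


sample_ifcfg = """\
DEVICE=ib0
HWADDR=80:00:00:48:FE:80:00:00:00:00:00:00:00:02:C9:03:00:11:22:33
ONBOOT=yes
"""
reported = "80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:11:22:44"

configured = hwaddr_from_ifcfg(sample_ifcfg)
print(same_hwaddr(configured, reported))
# False
```

On a real node, read the text from /etc/sysconfig/network-scripts/ifcfg-ib0 and the reported address from /sys/class/net/ib0/address (stripped of its trailing newline) instead of the sample strings.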

2014-12-16

RHEL 6.4 kernel 2.6.32-358.23.2, Mellanox OFED 2.1-1.0.6, and Lustre client 2.5.0

I am planning some upgrades for the cluster that I manage. As part of the updates, it would be good to have MVAPICH2 with GDR (GPU-Direct RDMA -- yes, that's an acronym of an acronym). MVAPICH2-GDR, which is provided only as binary RPMs, only supports Mellanox OFED 2.1.

Now, our cluster runs RHEL6.4, but with most non-kernel and non-glibc packages updated to whatever is in RHEL6.5. The plan is to update everything to whatever is in RHEL6.6, except for the kernel, leaving that at 2.6.32-358.23.2, which is the last RHEL6.4 kernel update. The reason for staying with that kernel version is Lustre.

We have a Terascala Lustre filesystem appliance. The latest release of TeraOS uses Lustre 2.5.0. Upgrading the server is pretty straightforward, according to the Terascala engineers. Updating the client is a bit trickier. Currently, the Lustre support matrix says that Lustre 2.5.0 is supported only on RHEL6.4.

The plan of attack is this:

  1. Update a base node with all RHEL packages, leaving the kernel at 2.6.32-358.23.2
  2. Upgrade Mellanox OFED from 1.9 to 2.1
  3. Build lustre-client-2.5.0 and upgrade the Lustre client packages

Updating the base node is straightforward. Just use "yum update", after commenting out the exclusions in /etc/yum.conf. If you have updated the "redhat-release-server-6Server" package, which defines which RHEL release you have, you can downgrade it. (See RHEL Knowledgebase, subscription required.) First, install the last (as of 2014-12-15) RHEL6.4 kernel, and then do the downgrade:
# yum install kernel-2.6.32-358.23.2.el6
# reboot
# yum downgrade redhat-release-server-6Server

Check with "cat /etc/redhat-release".

Next, install Mellanox OFED 2.1-1.0.6. You can install it directly using the provided installation script, or if you are paranoid like me, you can use the provided script to build RPMs against the exact kernel update you have installed.

Get the tarball directly from Mellanox. Extract, and make new RPMs:
# tar xf MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64.tgz
# cd MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64
# ./mlnx_add_kernel_support.sh -m .
...
# cp /tmp/MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64-ext.tgz .
# tar xf MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64-ext.tgz
# cd MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64-ext
# ./mlnxofedinstall
# reboot

Strictly speaking, the reboot is unnecessary: you can stop and restart a couple of services and the new OFED will load.

Next, for Lustre. Get the SRPM from Intel (who bought Whamcloud). You will notice that it is for kernel 2.6.32-358.18.1. Not mentioned is the fact that by default, it uses the generic OFED that Red Hat rolls into its distribution. To use the Mellanox OFED, a slightly different installation method must be used.

# rpm -ivh lustre-client-2.5.0-2.6.32_358.18.1.el6.x86_64.src.rpm
# cd ~/rpmbuild/SOURCES
# cp lustre-2.5.0.tar.gz ~/tmp
# cd ~/tmp
# tar xf lustre-2.5.0.tar.gz
# cd lustre-2.5.0
# ./configure --disable-server --with-o2ib=/usr/src/ofa_kernel/default
# make rpms
# cd ~/rpmbuild/RPMS/x86_64
# yum install lustre-client-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm \
lustre-client-modules-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm \
lustre-client-tests-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm \
lustre-iokit-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm
To make the lustre module load at boot, I have a kludge: in /etc/init.d/netfs, right after the line
STRING=$"Checking network-attached filesystems"
add
modprobe lustre
Reboot, and then check:
# lsmod | grep lustre
lustre                921744  0
lov                   516461  1 lustre
mdc                   199005  1 lustre
ptlrpc               1295397  6 mgc,lustre,lov,osc,mdc,fid
obdclass             1128062  41 mgc,lustre,lov,osc,mdc,fid,ptlrpc
lnet                  343705  4 lustre,ko2iblnd,ptlrpc,obdclass
lvfs                   16582  8 mgc,lustre,lov,osc,mdc,fid,ptlrpc,obdclass
libcfs                491320  11 mgc,lustre,lov,osc,mdc,fid,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
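
If you want to check this across many nodes, the test can be scripted. A minimal sketch follows; the helper name is made up, and the list of expected modules is an assumption based on the listing above.

```python
# Sketch: check that expected modules appear in lsmod-style output.
# The helper name and the list of expected modules are assumptions
# based on the listing above.

def missing_modules(lsmod_text, expected):
    """Return the expected module names not present in lsmod output.

    Only the first field of each line (the module name) is examined.
    """
    loaded = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields:
            loaded.add(fields[0])
    return [name for name in expected if name not in loaded]


sample = """\
lustre                921744  0
lov                   516461  1 lustre
lnet                  343705  4 lustre,ko2iblnd,ptlrpc,obdclass
"""
print(missing_modules(sample, ["lustre", "lnet", "osc"]))
# ['osc']
```

The same function works on the contents of /proc/modules, since each line there also begins with the module name.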