Showing posts with label infiniband. Show all posts
Showing posts with label infiniband. Show all posts


Mellanox Infiniband network cards on Linux

Sometimes, when one updates the firmware for Mellanox Infiniband cards, the MAC/hardware address gets changed. This usually happens if the IB card is OEM, i.e. made by Mellanox but stamped with a different company's name.

When the MAC gets changed, the network interface will not come up. The fix is to update the HWADDR field in /etc/sysconfig/network-scripts/ifcfg-ib0 and /etc/sysconfig/network-scripts/ifcfg-ib1. Use "ip link list" to display the new MAC.


RHEL 6.4 kernel 2.6.32-358.23.2, Mellanox OFED 2.1-1.0.6, and Lustre client 2.5.0

I am planning some upgrades for the cluster that I manage. As part of the updates, it would be good to have MVAPICH2 with GDR (GPU-Direct RDMA -- yes, that's an acronym of an acronym). MVAPICH2-GDR, which is provided only as binary RPMs, only supports Mellanox OFED 2.1.

Now, our cluster runs RHEL6.4, but with most non-kernel and non-glibc packages updated to whatever is in RHEL6.5. The plan is to update everything to whatever is in RHEL6.6, except for the kernel, leaving that at 2.6.32-358.23.2 which is the last RHEL6.4 kernel update. The reason for staying with that version of the kernel is because of Lustre.

We have a Terascala Lustre filesystem appliance. The latest release of TeraOS uses Lustre 2.5.0. Upgrading the server is pretty straightforward, according to the Terascala engineers. Updating the client is a bit trickier. Currently, the Lustre support matrix says that Lustre 2.5.0 is supported only on RHEL6.4.

The plan of attack is this:

  1. Update a base node with all RHEL packages, leaving the kernel at 2.6.32-358.23.2
  2. Upgrade Mellanox OFED from 1.9 to 2.1
  3. Build lustre-client-2.5.0 and upgrade the Lustre client packages

Updating the base node is straightforward. Just use "yum update", after commenting out the exclusions in /etc/yum.conf. If you had updated the <tt>redhat-release-server-6server<tt> package, which defines which RHEL release you have, you can downgrade it. (See RHEL Knowledgebase, subscription required.) First, install the last (as of 2014-12-15) RHEL6.4 kernel, and then do the downgrade:
# yum install kernel-2.6.32-358.23.2.el6
# reboot
# yum downgrade redhat-release-server-6Server

Check with "cat /etc/redhat-release".

Next, install Mellanox OFED 2.1-1.0.6. You can install it directly using the provided installation script, or if you are paranoid like me, you can use the provided script to build RPMs against the exact kernel update you have installed.

Get the tarball directly from Mellanox. Extract, and make new RPMs:
# tar xf MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64.tgz
# cd MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64
# ./ -m .
# cp /tmp/MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64-ext.tgz .
# tar xf MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64-ext.tgz
# cd MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64-ext
# ./mlnxofedinstall
# reboot

Strictly speaking, the reboot is unnecessary: you can stop and restart a couple of services and the new OFED will load.

Next, for Lustre. Get the SRPM from Intel (who bought WhamCloud). You will notice that it is for kernel 2.6.32-358.18.1. Not mentioned is the fact that by default, it uses the generic OFED that RedHat rolls into its distribution. To use the Mellanox OFED, a slightly different installation method must be used.

# rpm -Ivh lustre-client-2.5.0-2.6.32_358.18.1.el6.x86_64.src.rpm
# cd ~/rpmbuild/SOURCES
# cp lustre-2.5.0.tar.gz ~/tmp
# cd ~/tmp
# tar xf lustre-2.5.0.tar.gz
# cd lustre-2.5.0
# ./configure --disable-server --with-o2ib=/usr/src/ofa_kernel/default
# make rpms
# cd ~/rpmbuild/RPMS/x86_64
# yum install lustre-client-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm \
lustre-client-modules-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm \
lustre-client-tests-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm \
To make the lustre module load at boot, I have a kludge: to /etc/init.d/netfs right after the line
STRING=$"Checking network-atttached filesystems"
modprobe lustre
Reboot, and then check:
# lsmod | grep lustre
lustre                921744  0
lov                   516461  1 lustre
mdc                   199005  1 lustre
ptlrpc               1295397  6 mgc,lustre,lov,osc,mdc,fid
obdclass             1128062  41 mgc,lustre,lov,osc,mdc,fid,ptlrpc
lnet                  343705  4 lustre,ko2iblnd,ptlrpc,obdclass
lvfs                   16582  8 mgc,lustre,lov,osc,mdc,fid,ptlrpc,obdclass
libcfs                491320  11 mgc,lustre,lov,osc,mdc,fid,ko2iblnd,ptlrpc,obdclass,lnet,lvfs