HPC Cluster


Since 2013, the LSS has maintained a 36-CPU high-performance cluster. The following sections give general information and hints on using the machine. For problem reports, questions, suggestions etc. please mail to cs10-support@fau.de.

Hardware

The cluster consists of 8 compute nodes, 1 file server (11 TB), and 1 visualization node. All compute nodes share the following specifications:

  • 4 x Intel(R) Xeon(R) CPU E7-4830, 2.13 GHz - 2.4GHz (max. turbo) (8 cores + SMT), SSE 4.1/4.2, 24 MB shared cache
  • 256 GB RAM
  • 2 x 300 GB SAS internal disks
  • NVIDIA GeForce GTX 680
  • QDR Infiniband network

The visualization node contains an additional NVIDIA Quadro K5000 graphics card.

Access

In order to log in to the front end i10hpc.informatik.uni-erlangen.de you need a valid account at the LSS. Access to the cluster is granted via SSH key authentication.
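A minimal login from your workstation might look like this (the key path is only an illustration; use the key registered with your LSS account):

ssh -i ~/.ssh/id_rsa <login>@i10hpc.informatik.uni-erlangen.de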

Environment

Software

Because multiple combinations of compilers and libraries are available, the Environment Modules package is provided for easy switching between them. You can find a full description of the modules project on the sourceforge modules homepage.

To get a list of available modules type: module avail

To load a module, simply issue the command: module load <name_of_module>

To unload the module use: module unload <name_of_module>

To see what a module is doing when it gets loaded you can use the command: module show <name_of_module>

A list of all currently loaded modules can be obtained with: module list
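A typical session might look like this (the module name is taken from the job script examples below; the actual list of available modules may differ):

module avail                      # list all available modules
module load openmpi/1.10.2-gnu    # load an MPI environment
module show openmpi/1.10.2-gnu    # inspect what the module sets up
module list                       # show everything currently loaded
module unload openmpi/1.10.2-gnu  # unload it again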

Filesystems

When logging in you will find yourself in your LSS home directory. In addition, all cluster nodes have a shared filesystem mounted on /scratch containing directories /scratch/<login>. Running over Infiniband, this filesystem provides higher bandwidth and more space than the other shares at the LSS, so simulations should store their data there. There is also limited local disk space mounted on /local on each node, which is not shared between the nodes.
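For example, a run directory for simulation output could be set up like this (the directory name my_simulation is only illustrative):

cd /scratch/<login>
mkdir -p my_simulation && cd my_simulation   # simulation output goes here, visible on all nodes
# /local offers node-local space for temporary files, but it is not shared between nodes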

Home directory

Please note that your home directory on the compute nodes is /scratch/<IdM-account> and not your normal home. Please adapt your job scripts accordingly.

Usage

Interactive work

The front end or login node is i10hpc.informatik.uni-erlangen.de. Here you can compile your code, make short test runs, and submit jobs to the queueing system. For long-running jobs, please use the queueing system. In case of abuse, jobs will be terminated without warning.

Batch-System

All jobs with moderate or high computation times have to be submitted to the batch system. The batch system can handle serial and parallel jobs. Jobs can be submitted only from the login node i10hpc. For that purpose a job script is necessary, in which all requirements of the job are specified.

This script is submitted to the queueing system by: qsub <name_of_script>

The identifier of the job is returned by the queueing system. To see all queued jobs, use the command qstat -a. Queued jobs can also be deleted by the user who submitted them; in this case the leading number of the job identifier has to be used: qdel <job_id>
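A typical submit/inspect/delete sequence might look like this (the job id 12345 is made up for illustration):

qsub job.sh    # prints the job identifier, e.g. 12345.i10hpc
qstat -a       # list all queued and running jobs
qdel 12345     # remove the job using the leading number of the identifier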

By using the command qsub -W depend=afterany:<wait for job_id> <name_of_script> you can make your submitted job script wait for the job with id <wait for job_id> to finish. There are different queues configured:

Queue    Default walltime   Max. walltime   Max. number of nodes   Type of nodes   Remark
devel    10 min             1 h             4                      compute nodes   only 1 job per user can run at a time; 2 nodes are reserved for devel Mon-Fri from 8:00 to 20:00
normal   1 h                24 h            5                      compute nodes   no interactive jobs allowed
big      1 h                48 h            9                      compute nodes   only runs at night and on weekends; no interactive jobs allowed

If you do not specify a queue your job will be automatically routed to the most appropriate queue.

In the following you can find examples of job scripts for different applications.

Serial Job
#!/bin/bash
#PBS -l nodes=1:ppn=32
#PBS -l walltime=02:00:00
#PBS -q normal
#PBS -M <login>@fau.de -m abe
#PBS -N test

export OMP_NUM_THREADS=1

cd /scratch/<login>/example
./test
This is a job script for starting a serial job. After 2 hours of wall clock time the job will be terminated by the queueing system; the default is 1 hour. Notification by mail at start, finish, or abort of the job will be sent to the address <login>@fau.de. All essential environment variables have to be set before the executable is started.

 

Parallel Job
#!/bin/bash
#PBS -l nodes=2:ppn=32
#PBS -l walltime=08:00:00
#PBS -q normal
#PBS -M <login>@fau.de -m abe
#PBS -N test

. /usr/share/modules/init/bash
module load openmpi/1.10.2-gnu

cd /scratch/<login>/example
mpirun -np 64 ./test
The number of nodes is set to 2 (with 32 cores each). After 8 hours of wall clock time the job will be terminated by the queueing system. Do not forget to source the module init script in order to be able to use the module command in batch scripts.
Honouring NUMA / Pinning / Hybrid jobs
 
Each of the four processors in a node has its own memory bus, so placing data adjacent to the cores processing them is a prerequisite for good performance. Consequently, Linux tries to allocate RAM on the NUMA domain of the core that caused the respective page fault. But as threads and processes can migrate, this beneficial placement is not guaranteed to last.
There are various ways to prevent threads from roaming. Code compiled with Intel compilers can be instructed to do so through the KMP_AFFINITY environment variable. Alternatives that do not need compiler support are likwid, numactl, and taskset (all of which are available on the cluster). See the respective man pages for details.
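For example (core and socket numbers are purely illustrative; check the man pages for the exact syntax of the versions installed on the cluster):

export KMP_AFFINITY=scatter                   # Intel-compiled OpenMP code: spread threads across cores
numactl --cpunodebind=0 --membind=0 ./test    # run on the cores and memory of NUMA domain 0
taskset -c 0-7 ./test                         # restrict the process to cores 0-7
likwid-pin -c S0:0-7 ./test                   # pin threads to the first 8 cores of socket 0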
OpenMPI provides some pinning support via command-line arguments to mpirun, e.g. the --bind-to-node and --bind-to-socket options.
Hybrid jobs (OpenMP + MPI) usually run best with one MPI process per socket (NUMA domain). OpenMPI can start the right number of processes and pin them to sockets if it knows the cluster's topology, which must be passed as command-line options to mpirun (see the example below). Setting the number of OpenMP worker threads explicitly is still a good idea, even though most OpenMP runtimes will use as many worker threads as there are logical processors in their cpuset, in this case the socket.
mpirun -np 64 --bind-to-socket ./pure-mpi-program # pinning processes to a socket is usually sufficient
OMP_NUM_THREADS=8 mpirun --npersocket 1 \
    -mca orte_num_sockets 4 -mca orte_num_cores 8 \
    ./hybrid-program
# Running a hybrid job with one process per socket and one worker thread per core.
# --npersocket implies --bind-to-socket.
# Some prefer to "export OMP_NUM_THREADS=8" separately.
Activating virtualgl mode (X-server) on the reserved nodes, useful for paraview
#!/bin/bash
#PBS -l nodes=2:ppn=32:virtualgl
#PBS -l walltime=01:00:00
#PBS -q normal
#PBS -M <login>@fau.de -m abe
#PBS -N paraview


. /usr/share/modules/init/bash
module load paraview/5.0.0

mpirun pvserver
This job script will start an X-Server on initialization (and kill it at exit) so that you can use the GPUs for OpenGL rendering.
Starting paraview in reverse-connection mode
#!/bin/bash
#PBS -l nodes=1:ppn=32:virtualgl
#PBS -l walltime=01:00:00
#PBS -q express
#PBS -M <login>@fau.de -m abe
#PBS -N paraview


. /usr/share/modules/init/bash
module load paraview/5.0.0

mpirun pvserver -rc -ch=<client_host>
This job script will start the paraview server and try to connect to the client on host <client_host> (e.g. i10staffXX.informatik.uni-erlangen.de, a.k.a. your workstation).
Setting up paraview to automatically start pvserver on the cluster and wait for a connection
1. (pv_remote_step1.png) Open the server connection dialog.
2. (pv_remote_step2.png) Add a new server.
3. (pv_remote_step3.png) Give the server a name (e.g. "i10hpc") and choose "Client / Server (reverse connection)" as the connection type.
4. (pv_remote_step4.png) Change the startup type to "Command" and enter a command that submits your paraview server job script (from above) into the cluster queue; see the example command below. The client will then immediately wait for a connection attempt from the server. Click "Save" to store the configuration. For this to work your SSH key has to be loaded in the authentication agent. On Windows you should use PuTTY's "plink.exe" instead of "ssh".
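The startup command could, for example, be an ssh call that submits the reverse-connection job script from the previous section (the script path and name are placeholders):

ssh <login>@i10hpc.informatik.uni-erlangen.de qsub /scratch/<login>/paraview_reverse.sh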