Introduction to Erwin


This page describes erwin, the cluster of Athlon PCs purchased from Aspen Systems, and the basics of using it.


Overview

Erwin is a cluster of PC-like machines running the Scyld Beowulf version of Linux. We currently have one two-processor master node and nine two-processor slave nodes. All nodes have identical computational hardware: two Athlon 1800 processors with 3.5 GByte of memory in total. Deep in BLAS3, each processor can sustain more than 2 GFlop/s, approximately three times Isaac's computational throughput.

In theory we have a lot of compute power, but Erwin is not a parallel machine in the traditional (Isaac, Carter) sense. The nodes are linked by a slow interconnection network (100 Mbit/s ethernet), so only loosely coupled parallel jobs will run well. Serial (single-processor) jobs will run very well, so it is best to think of Erwin as an easily utilized cluster of workstations.

Key points

Logging in

Erwin will only permit ssh access from inside NREL. There is no access from outside of NREL due to the NREL firewall. You cannot log in to the slave nodes.

Passwords

The password is the same as for Isaac and friends, due to the "network" (NIS) password system. To change your password you should run "yppasswd" which will change your password on all UNIX machines.

Files

Erwin NFS mounts your usual homespace from either Isaac or u80csi3, and therefore inherits all of your disk space limits. Files are shared over 100 Mbit/s ethernet, so large file transfers will be slow.

Note however, that you cannot run from your usual homespace - these files are not visible to the slave nodes, where you need to run. Erwin gives you a temporary workspace, $ETMP, which is probably /erwin/cat/your_user_name. This is a large local disk on Erwin, visible to the slave nodes. You can store up to ~10GByte, but note that your files are not backed up!
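
Copying a run directory from your homespace onto $ETMP might look like the following sketch. The function name stage_run and the directory names are hypothetical; only $ETMP comes from the setup described above, and the sketch assumes it is already set in your environment.

```shell
# Sketch: copy a run directory from the NFS homespace into $ETMP so that
# the slave nodes can see it. stage_run is a hypothetical helper name.
stage_run() {
    local src="$1"
    # Fail early if the workspace variable is missing.
    if [ -z "${ETMP:-}" ]; then
        echo "ETMP is not set" >&2
        return 1
    fi
    # Fail early if the source is not a directory.
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    local dest
    dest="$ETMP/$(basename "$src")"
    mkdir -p "$dest" || return 1
    # Copy the directory contents, preserving timestamps and permissions.
    cp -Rp "$src"/. "$dest"/ || return 1
    # Print the new location so the caller can cd into it.
    echo "$dest"
}

# Example (hypothetical directory name):
#   cd "$(stage_run "$HOME/my_run_directory")"
```

Since $ETMP is not backed up, results worth keeping should be copied back to your homespace once a run has finished.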

How busy is the system?

A special version of top is installed. For the first few seconds it shows the programs running on the master node, and then it updates to show everything running on the cluster.

Top running on Erwin

Here Paul is running a great many escan jobs. Note that they are all running on slave nodes - the size (SIZE) and resident size (RSS) are listed as zero.

For a graphical view of what is happening on the system, run "beostatus". This shows the processor, memory, disk and network usage of each node. In the snapshot below, there are 2 CPUs free on the slaves - node 2 CPU 1 and node 8 CPU 0. Note that the master node is listed as "-1" - we can see some network traffic and disk I/O but no CPU activity. (This is how it should be.)

Beostatus running on Erwin

Which nodes is which job running on?

ps xf | bpstat -P

is a highly useful command which gives a tree overview of the jobs running on the system: which process is the parent or sibling of which, and which node each one is running on.
This is particularly useful when you want to make sure that one of your MPI jobs has arrived at the correct destination!

Running serial (single processor) programs

To run on the slave nodes, use the following incantation, remembering that your programs and data files must be on $ETMP.
cd $ETMP/my_run_directory
To run on the least busy node:
mpprun -np 1 -nolocal ./my_program_name
To run on a specific node, give its number (for example 5) as the first argument:
bpsh node_number ./my_program_name
Note that you can background, foreground, renice, kill, ps (etc) programs as per any normal UNIX system, even if your programs are running on the slave nodes. You can redirect standard input and output as you would any conventional program. This is one of the benefits of Scyld Beowulf over conventional cluster systems.
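
The two ways of launching a serial job can be wrapped in a small helper, sketched below. The name run_serial and the DRY_RUN switch are hypothetical; mpprun and bpsh are used exactly as in the commands above. With DRY_RUN=1 (the default in this sketch) the command is only printed, so the sketch can be tried on any machine.

```shell
# Sketch: launch a serial job on a slave node.
# run_serial and DRY_RUN are hypothetical names, not part of Scyld.
DRY_RUN="${DRY_RUN:-1}"

run_serial() {
    local node="$1" prog="$2"
    local cmd
    if [ "$node" = "any" ]; then
        # Let the system choose the node.
        cmd="mpprun -np 1 -nolocal $prog"
    else
        # Run on the node number given by the caller.
        cmd="bpsh $node $prog"
    fi
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        # Word splitting of $cmd is intentional here.
        $cmd
    fi
}

run_serial any ./my_program_name
run_serial 5 ./my_program_name
```

On Erwin itself, with DRY_RUN=0, input and output can be redirected and the job put in the background in the usual way, for example: run_serial 5 ./my_program_name < input.dat > output.log 2>&1 &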

Short running programs, such as an analysis script or gnuplot, can be run directly on the master node per any normal UNIX system.

How do I run in parallel?

We have successfully compiled MPI (mpich version 1.2.3_Scyld from a source rpm provided by Scyld, for the curious) for use with the Portland Group's compilers. Linking MPI programs using mpif90 and/or mpif77 should work.

Our current setup no longer allows linking MPI programs with gcc, g77 and the like. If this is a problem, please contact Volker.

The magic words to run MPI-enabled code are

mpirun -np [number of desired processors] [your_job_name]

However, this will likely distribute your calculation onto CPUs that are scattered across different nodeboards, generating much network traffic and slow output. It should help throughput to keep your job restricted to CPUs on as few nodeboards as possible. Use, e.g.

mpirun -np 2 -beowulf_job_map 5:5 [your_job_name]
mpirun -np 4 -beowulf_job_map 5:5:6:6 [your_job_name]
etc.

to shepherd your run onto the two CPUs on nodeboard 5 or the four CPUs on nodeboards 5 and 6, respectively.
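
Writing the job map by hand gets tedious for larger runs. The sketch below builds the map string from a list of nodeboard numbers, using both CPUs on each. The helper names job_map and mpirun_cmd and the program name ./my_job are hypothetical; the two-CPUs-per-node figure comes from the hardware described in the overview. The command is printed rather than executed.

```shell
# Sketch: build a -beowulf_job_map string that places one process on
# every CPU of each listed nodeboard. Helper names are hypothetical.
CPUS_PER_NODE=2

job_map() {
    local map="" node i
    for node in "$@"; do
        i=0
        while [ "$i" -lt "$CPUS_PER_NODE" ]; do
            # Append the node number, with ':' as separator after the first entry.
            map="${map:+$map:}$node"
            i=$((i + 1))
        done
    done
    echo "$map"
}

mpirun_cmd() {
    local prog="$1"
    shift
    # One process per CPU on each of the remaining arguments (the nodes).
    local np=$(( $# * CPUS_PER_NODE ))
    echo "mpirun -np $np -beowulf_job_map $(job_map "$@") $prog"
}

mpirun_cmd ./my_job 5 6
# prints: mpirun -np 4 -beowulf_job_map 5:5:6:6 ./my_job
```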

Feedback on performance with different settings would be greatly appreciated!

Editing

vi and emacs are installed.

Printing

You can print PostScript files to our HP 5000 using "lp".

Documentation

In addition to the system man pages, the compiler documentation is here (local copy, remote up to date copy). We have one (1) printed copy of the Scyld documentation, in case you wish to try something fancy. For everything else, try google.com or ask your local user expert.

Compiling

Erwin has f77 (pgf77, g77), f90 (pgf90), C (pgcc, gcc), and C++ (pgCC, g++) compilers installed. If everything has worked (your shell has sourced /etc/profile) then they will be on your PATH. If not, try
declare -x "PGI=/usr/local/pgi"
declare -x "PATH=$PGI/linux86/bin:$PATH"
declare -x "MANPATH=$MANPATH:$PGI/man"
declare -x "LM_LICENSE_FILE=$PGI/license.dat"
declare -x "LD_LIBRARY_PATH=$PGI/linux86/lib"
The Portland Group compilers also come with a debugger (pgdbg) and a profiler (pgprof). Use them.

The compilers accept quite conventional compile options (-g, -c, -O2, etc.). You might wish to try "-O2 -tp athlonxp -byteswapio". This should give a good level of optimisation, and also enables (byteswapio) reading and writing of binary files in the same format as Isaac and the SUNs. "-fast" has been found to cause problems with some codes. There are many compile options for optimisation - experiment and let us know what you find. Note that unnecessarily high levels of optimisation can slow down code. Be sure to benchmark your code, and preferably profile it.
Erwin also has two versions of gcc (the Gnu Compiler Collection) installed. The system compiler which you will get by default is gcc 2.96 (old). However you may wish to try the up-to-date version gcc 3.1.1. The compiler binaries can be found in /usr/local/gcc-3.1.1/bin (gcc for C, g++ for C++, g77 for f77, but not Fortran 90). You must include /usr/local/gcc-3.1.1/lib in your library path to compile dynamically linked binaries:
export LD_LIBRARY_PATH=/usr/local/gcc-3.1.1/lib/ (for bash syntax)
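
Note that the export above replaces whatever LD_LIBRARY_PATH already held, for example the Portland Group entry set earlier. A minimal sketch of prepending instead is given below; prepend_libpath is a hypothetical helper name.

```shell
# Sketch: prepend a directory to LD_LIBRARY_PATH without discarding what
# is already there, and without adding the same directory twice.
# prepend_libpath is a hypothetical helper name.
prepend_libpath() {
    local dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*)
            # Already present, nothing to do.
            ;;
        *)
            # Put the new directory first; add ':' only if a value exists.
            LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_libpath /usr/local/gcc-3.1.1/lib
echo "$LD_LIBRARY_PATH"
```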

You may simply wish to compile a static binary without any extra calls, but this can result in large executables and is not recommended.
Documentation on gcc can be found by typing "info gcc", for g77 by typing "info g77", or at the GNU web page (follow the links for g77). Contact Gabriel for further questions.

Libraries

BLAS, LAPACK and FFTW are found in /usr/local/lib. Use "-L/usr/local/lib -llapack -lf77blas -latlas -lfftw" to use them all.
The BLAS routines are in libatlas.a and need the Fortran wrapper libf77blas.a to be called from Fortran code. To call the BLAS or LAPACK libraries from C or C++ you might need to include libg2c.a in your link line. I had to recompile BLAS and LAPACK using g77 to be able to use them from C++ (g++)... You might be able to use the system LAPACK and BLAS if you compile with the Portland Group C and C++ compilers (I heard they are buggy, though). If you are having trouble with this particular issue, contact Gabriel.
Ensure that you do not use the BLAS or LAPACK in /usr/lib. These libraries are slow (unoptimised), while those in /usr/local/lib are bullet-like in comparison.
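
Putting the suggested compile options and the library flags together, a full compile-and-link line can be assembled as in the sketch below. The helper name build_cmd and the source file name are hypothetical; the options and libraries are the ones quoted on this page. The command is printed rather than executed.

```shell
# Sketch: assemble a pgf90 compile-and-link line from the options and
# libraries mentioned on this page. build_cmd and the file name are
# hypothetical.
FFLAGS="-O2 -tp athlonxp -byteswapio"
LIBS="-L/usr/local/lib -llapack -lf77blas -latlas -lfftw"

build_cmd() {
    local src="$1"
    # Strip the extension to get the executable name:
    # my_program.f90 becomes my_program.
    local exe="${src%.*}"
    echo "pgf90 $FFLAGS -o $exe $src $LIBS"
}

build_cmd my_program.f90
```

The libraries are placed after the source file, and LAPACK before the BLAS libraries it depends on, because static libraries are searched in the order in which they appear on the link line.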

Applications

Some useful applications are installed: TeX, gnuplot, xmgrace, gv etc. are present. Please make suggestions if you would like anything else installed.

What is broken?

Mail is broken. Mail was never intended to be configured, but mail you send is currently bounced back to your NREL address because of a bad sender address ("-1", the master node, instead of erwin.nrel.gov). Perhaps we will fix this.

Why erwin?

You should know this
Last update: 20th November 2002
gabriel_bester@nrel.gov