This document describes the steps which I took to install a 32-bit version of the UKMO Portable Unified Model version 4.5 on a Beowulf cluster at Rutherford Laboratory and to get a HadAM3 MPP test job running. It is written as a set of instructions so that you can replicate the installation on another, similar cluster. The document is quite long because of the amount of explanatory text, but the places where action is required are set out step by step below.
If the Unified Model has already been installed according to these instructions, you can skip to the section on running jobs.
NB the system in question ("tolkien") has the following characteristics (see also this information about clusters from Compusys):
You can install the UM in an arbitrary directory tree. However, these instructions assume that you have created a username called um, and that you are running the installation process as the um user, in which case the installation will default to using $UM_HOME=~um and $UMDIR=~um/um for the directory paths. If you choose not to do this, further tweaks will be needed, as described in the note about setvars below.
Create a um username on the system, setting the login shell to a Bourne-shell derivative (ksh or bash). Log in as that user for the installation instructions which follow.
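For example, on a typical Linux system the account might be created as follows. This is only a sketch: the user-management tool, shell path and password policy will vary from site to site.

  # create the um account with a home directory and bash as the login shell
  useradd -m -s /bin/bash um
  # set a password for the new account
  passwd um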
Having logged in as user um, create a file called $HOME/compiler.setup containing any commands which are locally required in order to run the Portland Compiler and access the MPI libraries (in Bourne-shell syntax). The following lines are examples:

  export PGI=/usr/local/pgi MPI_HOME=/usr/local/mpich-gm-pgroup121-7
  export PATH=$PGI/linux86/bin:$MPI_HOME/bin:$PATH
  export LM_LICENSE_FILE=7496@host.name.goes.here

Then type ". $HOME/compiler.setup" to load in the settings for this session. Having done so, you ought to have the commands pgf90, mpif90 and mpicc in your $PATH; check by typing "which pgf90 mpif90 mpicc".
Extract the main tarfile from the first CD:
  mount /mnt/cdrom
  cd
  tar xvf /mnt/cdrom/um_system.tar
Now unpack the model by typing:
  um/vn4.5/scripts/Install/unpackmodel

Give the default answers to all the questions which are asked by "unpackmodel", except the following (which are encountered in the order listed):

  7 (linux)
  pgf90
  -r4 -i4
  -O2 -Munroll -Mnoframe -Mvect=sse
  mpif90
  true
Now type:
  . $HOME/setvars
  echo '. $HOME/setvars' >> $HOME/.profile

This will ensure that the variables set in setvars are set both in this session and in subsequent sessions.
Type the following commands, which will add lines to $HOME/setvars:

  echo >> $HOME/setvars
  echo '. $UM_HOME/compiler.setup' >> $HOME/setvars

This will ensure that UM compilation jobs can find the compiler commands.
Now build the GCOM library, which is needed in order to run MPP jobs, as follows:
Type:
  cd $UMDIR
  tar xvf /mnt/cdrom/gcom.tar
  cd gcom/rel_1m1s5x5/build/
Edit the Makefile; there are a number of changes to be made, so here is a Makefile which you can drop in place. Alternatively, here are the differences between that and the original Makefile. (NB: download the Makefile rather than using copy and paste from your browser, as the difference between tabs and spaces is crucial.)
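If you want to review what has changed before building, one simple approach is to keep a copy of the original and compare it with the edited version. This is only a suggestion; the name Makefile.orig is just a convenient name for the backup copy.

  # keep a copy of the original Makefile for reference
  cp Makefile Makefile.orig
  # ... now drop in the replacement Makefile, or edit it by hand ...
  # review the differences, paying attention to tabs versus spaces
  diff -u Makefile.orig Makefile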
Now type "make".
Now edit $UMDIR/vn4.5/source/compile_vars and change two of the lines in the "load options" section as follows (in order to add link paths for GCOM):

  @load LCOM_PATH=-L. -L$(UMDIR)/gcom/rel_1m1s5x5
  @load LCOM_LIBS=-lgcom1m1s5x5_mpi
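As a quick check that the edit has taken effect, you can simply search the file for the lines you have just changed (purely a sanity check):

  # the two @load lines above should be shown, with the GCOM path and library present
  grep LCOM $UMDIR/vn4.5/source/compile_vars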
Now build the small utility executables, and add links for some further utilities in the "bin" directory:

  cd $UMDIR/vn4.5/scripts/Install
  ./configure_execs
  cd $UMDIR/bin
  for i in bcreconf convpp cumf fieldop makebc mergeum pptoanc pumf; do
    ln -s ../vn4.5/utils/$i .
  done
Then configure all the code sections:

  cd $UMDIR/vn4.5/scripts/Install
  ./configure_all_sects
The data files which you have unpacked in the model installation are 64-bit, with big-endian byte ordering. You want 32-bit files with little-endian byte ordering. The second CD contains some 32-bit data files, but they have big-endian byte ordering, so the aim is to use them in conjunction with the "bigend" program to generate what we need.
Start by removing the old 64-bit files:
  cd $UMDIR/vn4.5
  rm -fr ancil
  cd ../PUM_Input/vn4.5
  rm -fr ancil dumps lbcs
Now extract the 32-bit files from the CD:
  umount /mnt/cdrom
  eject
  (load the 2nd CD)
  mount /mnt/cdrom
  cd $UM_HOME
  tar xvf /mnt/cdrom/data32.tar
Now run "bigend" on the 32-bit files and move them to the correct paths:
  cd $UM_HOME
  find data32 -type f -not -name basin.index -print -exec sh -c "bigend -32 {} {}.tmp; mv -f {}.tmp {}" \;
  mv data32/um/PUM_Input/vn4.5/ancil32 um/PUM_Input/vn4.5/ancil
  mv data32/um/PUM_Input/vn4.5/dumps32 um/PUM_Input/vn4.5/dumps
  mv data32/um/PUM_Input/vn4.5/lbcs32 um/PUM_Input/vn4.5/lbcs
  mv data32/um/vn4.5/ancil32 um/vn4.5/ancil
  rmdir -p data32/um/PUM_Input/vn4.5 data32/um/vn4.5
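If you want to reassure yourself that the conversion and moves have worked, a quick look at the target directories (run from $UM_HOME) is enough; they should now contain the 32-bit, little-endian files:

  ls um/vn4.5/ancil
  ls um/PUM_Input/vn4.5/ancil um/PUM_Input/vn4.5/dumps um/PUM_Input/vn4.5/lbcs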
You should ensure that a script with the pathname ~um/setvars_4.5 exists on the system, and that its contents reflect the paths used in the installation which you want to use. If you installed the UM under the home directory of user um, then this will already be the case; if not, you may need to create a um username and/or add a symbolic link to the setvars_4.5 script in the installation directory.
The reason for this requirement is that the SUBMIT scripts which are generated by the user interface (when modified as described below) will contain this as the default path.
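For example, if the installation actually lives elsewhere, a symbolic link along the following lines would satisfy the requirement. The installation path shown here is hypothetical; substitute the real location of your setvars_4.5.

  # hypothetical example: point ~um/setvars_4.5 at the real installation
  ln -s /data/models/um/setvars_4.5 ~um/setvars_4.5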
If it is not possible for you to create this path, alternatives are:

  - edit the umui2.0/vn4.5/processing/submit script, changing the "STARTUPFILE" lines near the top (and do likewise with umui2.0/vn4.5.1/processing/submit), or
  - edit the "~/umui_jobs/$JOBID/SUBMIT" files on a per-job basis, between doing "process" and "submit" in the user interface.

You may also wish to edit the setvars_4.5 script to include the following lines before the "exports" section. This will give the users more flexibility to set their own paths for $TMPDIR etc.

  # now load in the user's setvars to allow
  # the user to override any of the above variables
  if [ -f $HOME/setvars -a "$HOME" != "$UM_HOME" ]
  then
    . $HOME/setvars
  fi
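With that change in place, an individual user could then override the defaults from their own $HOME/setvars. The values below are purely illustrative:

  # example per-user overrides in $HOME/setvars (illustrative values)
  export TMPDIR=/scratch/$LOGNAME/umtmp
  mkdir -p $TMPDIR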
The xargs program is used as part of the compilation process when you run the UM. Unfortunately the standard version of xargs enforces an artificially small limit on the space occupied by environment variables, which stops it working.

You should put into the $UMDIR/bin directory a version of xargs which has the environment size limit removed; for this you can either:

  - use the ready-compiled xargs executable provided (remember to do "chmod +x xargs"), or
  - download the findutils-4.1.6 source and the argmax patch, and build it as follows:

      tar xvfz findutils-4.1.6.tar.gz
      patch -p0 < argmax.patch
      cd findutils-4.1.6
      ./configure
      make
      cp xargs/xargs $UMDIR/bin/
The script modifications which will be used for running jobs with the mpirun command make use of a version of the env command which has an added feature: the ability to read environment variables from a file.

You should put the modified env program into the $UMDIR/bin directory; for this you can either:

  - use the ready-compiled env executable provided (remember to do "chmod +x env"), or
  - download the sh-utils-2.0 source and the envfile patch, and build it as follows:

      tar xvfz sh-utils-2.0.tar.gz
      patch -p0 < envfile.patch
      cd sh-utils-2.0
      ./configure
      make
      cp src/env $UMDIR/bin/
The installation described in this page is for a system on which the PBS queuing system is in use. This machine has a command called qsub for submitting jobs. The UM knows to make use of the qsub command, but it assumes usage as on the Cray. The PBS version of qsub differs from the Cray version in two important respects: the option switches are different, and the file being submitted should contain "#PBS" lines rather than "#QSUB" lines. Unfortunately, there are several different places in the UM from which qsub is called; so rather than change the UM, a "wrapper" script is used to act as an interface between the UM and the PBS qsub.
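For reference, the directive lines that PBS expects at the top of a submitted script look something like the following. The queue name, resource request and job name here are placeholders only, and will depend on how your PBS server is configured.

  #PBS -q workq
  #PBS -l nodes=4
  #PBS -N um_run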
Copy this qsub script (click to download) to $UMDIR/bin, and it will act as a wrapper for the PBS version (/usr/local/bin/qsub), making it look like the Cray qsub. (Remember to do "chmod +x qsub".)
The above wrapper script also has one other effect: if it detects that the submitted job is a model compilation job rather than a model execution job, it falls back to using at rather than qsub to submit the job. This is so that compilation jobs run on the master node of the Beowulf rather than on one of the slave nodes. If you do not want this behaviour, find the line of the script which says "$use_at_for_compile_jobs=1;" and change the 1 to 0.
As a final step in the installation of the UM, it is recommended that you recursively set sensible file permissions on the entire UM installation, to avoid problems later. The command "chmod -R og-w+rX $UM_HOME" should do the trick (although you may need to modify it if you have to restrict access to certain groups of users in order to comply with the usage agreement).
In order to submit jobs on the Beowulf, you will need to make some changes to the UM User Interface (UMUI) on the machine on which it is run (not necessarily the Beowulf). These changes are distributed by Jeff Cole for use with the CSAR T3E machine (turing), and may therefore already have been applied at many UGAMP sites.
If the changes have not already been applied, you will need to change to the parent directory of the umui2.0 directory and unpack this umui2.0_changes.tar tar file.

(NB this is the same tar file as distributed by Jeff Cole, except that it does not contain backup copies of the original files. If you want to keep backups, you can extract the tar file using "tar xvf umui2.0_changes.tar --suffix=.orig".)
Before you can run UM jobs, you will need to create (or add to) these two files in the home directory of your user account on the Beowulf:
.rhosts: this should contain lines of the form

  hostname username

and one of these lines should match the account on which you run the UMUI. (Ensure that neither the .rhosts file nor your home directory is group- or world-writable, else your .rhosts will not be honoured by the system.) An illustrative example is given below, after the description of .profile.
.profile: this should contain the line

  . /path/to/UM_HOME/setvars_4.5

(with /path/to/UM_HOME substituted appropriately, of course; note the space between the "." and the pathname). This will be necessary in order for job re-submission to work properly (including the submission of the model run after compilation). Note that this .profile is used even if your login shell is csh or tcsh.
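As an illustration of the .rhosts setup mentioned above (the hostname and username are placeholders for the machine and account from which you run the UMUI), the file on the Beowulf might contain the single line:

  umuihost.example.ac.uk jbloggs

and the required permissions can be tightened with:

  chmod go-w $HOME $HOME/.rhosts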
Here is a description of the changes which you will need to make to the example MPP atmosphere-only job which is supplied with the UMUI, in order to produce something which will run on the Beowulf. In the UMUI you should find a job with ID=xaaab, owner=frav, description="Climate 96x73x19 - MPP generic", version="4.5.1". (NB you may need to turn off the experiment owner filter on the search in order to find the experiment.)
Start by copying the MPP job xaaab into one of your experiments (or, if job xaaab doesn't exist, create a new job at version 4.5.1, open it in read-write mode, upload this basis file and save the job).
The following instructions detail the changes to be made to the example test job in order to produce a runnable job. (In case it helps, this basis file is an example of the job configuration after those changes have been applied.)
Now make the following changes. In the case of the mods (and the script mod and compile option override), you will need to click on the filenames to download them, choose a path on the Beowulf to save them to, and enter into the UMUI the path which you have chosen. (You may wish to use environment variables to define the directories; see under "Sub-Model Independent".) The same files are also available in umui_input.tar on the first CD.
Script mod: mpirun-chgmp121-7

NB this script mod makes use of the following perl script, which you also need to download: mkgmconf. Remember to make it executable after downloading it. The script mod assumes a path of /usr/local/sbin/mkgmconf; if you install it elsewhere, edit the script mod to point to the path you have chosen.
Set $DATAM and $DATAW to where you want the output directories.
Compile option override: stzonm1a_no_optim
Definitely add the following Fortran modifications:
$MODS/general/lux_open.mod
lux_32bit.mod
linuxf_mpp.mod
relax_pstartest.mod
fixmeanctl.mod
linux_ocean.mod
The following mods are not needed to get the HadAM3 test job running, but may be needed / useful for other jobs:
coupledfix.mod
env_f_new.mod, plus the C mod env_c.mod to go with it
atmstep_flush.mod
fixstdia.mod (NB if you are also using the mod ars1f405, drop ars1f405 and use ars1f405-fixstdia.mod instead)

All's ready! Save, process, submit.
This will launch a compilation job on the master node (using "at": it can be inspected with atq and killed with atrm), followed by a run job on the slave nodes (using "qsub": it can be inspected with qstat and killed with qdel).
Please do let me have any comments on this document.
Alan Iwi <A.M.Iwi@rl.ac.uk>
Last edited: 14 November 2001