Building and running the UM on the Beowulf

Introduction

This document describes the steps I took to install a 32-bit version of the UKMO Portable Unified Model version 4.5 on a Beowulf cluster at Rutherford Laboratory and to get a HadAM3 MPP test job running. It is written as a set of instructions so that you can replicate the installation on another similar cluster. The document is quite long because it includes a certain amount of explanatory text, but the symbol --> is used to draw your attention to places where the required actions are described.

If the Unified Model has already been installed according to these instructions, you can skip to the section on running jobs.

NB the system in question ("tolkien") has the following characteristics (see also this information about clusters from Compusys):

Installation files

-->

You should obtain from the Met. Office the distribution CDs for Portable Unified Model Version 4.5 (email portable_um@metoffice.com).

Because of copyright, the installation files are not downloadable from this web page. However, you can verify that you are using exactly the same versions of the installation files as the ones on which these instructions are based, by comparing against the following checksums (obtained with the cksum command):

CD 1
	3905733269 348160 gcom.tar
	1474628344 543324160 um_system.tar
	4009872946 30720 umui_input.tar
	4211383340 6666240 umui_package.tar
	2958471921 30720 umui_templates.tar
	3220693870 27074560 um_visual.tar
	
CD 2
	2163338337 300257280 data32.tar
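
The comparison against the checksums listed above can be scripted. The following is a minimal sketch, not part of the UM distribution: verify_cksums is a hypothetical helper name, and it assumes you have saved the "checksum size filename" lines for one CD into a text file of your own (called cd1.cksum in the usage note).

```shell
# Hypothetical helper: compare cksum output for each file named in a list
# against the expected checksum and size given on the same line of the list.
verify_cksums() {
    dir=$1      # directory holding the tar files, e.g. /mnt/cdrom
    list=$2     # text file of "checksum size filename" lines
    status=0
    while read -r sum size name; do
        [ -n "$name" ] || continue          # skip blank or short lines
        if [ ! -f "$dir/$name" ]; then
            echo "MISSING   $name"
            status=1
            continue
        fi
        # Reading from standard input makes cksum print "checksum size" only.
        actual=$(cksum < "$dir/$name")
        if [ "$actual" = "$sum $size" ]; then
            echo "OK        $name"
        else
            echo "MISMATCH  $name"
            status=1
        fi
    done < "$list"
    return $status
}
```

With the first CD mounted, "verify_cksums /mnt/cdrom cd1.cksum" would print one line per file and return a non-zero status if any file is missing or differs.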

Installation procedure

A note about mods

Before describing the installation steps, it is worth noting the following.

The model code and scripts will ultimately require certain modifications (or "mods") in order for the model to run successfully on the Beowulf cluster. Although the UM installation procedure allows for mods to be applied at installation time, the aim here is to do a "vanilla" installation without applying such mods. Certain mods will subsequently have to be applied to individual jobs.

The reason for this choice is two-fold: first, it gives the user more control over what mods to apply; second, if the user's job applies further mods to the same source files (or "decks"), then any mods applied at installation time can end up being ignored.

Where to install the UM

You can install the UM in an arbitrary directory tree. However, these instructions assume that you have created a username called um, and that you are running the installation process as the um user, in which case the installation will default to using $UM_HOME=~um and $UMDIR=~um/um for the directory paths. If you choose not to do this, further tweaks will be needed as described in the note about setvars below.

-->

Create a um username on the system, setting the login shell to a Bourne-shell derivative (ksh / bash). Log in as that user for the installation instructions which follow.

Compiler setup

-->

Having logged in as user um, create a file called $HOME/compiler.setup containing any commands which are locally required in order to run the Portland Compiler and access the MPI libraries (in Bourne-shell syntax). The following lines are examples:

        export PGI=/usr/local/pgi
        MPI_HOME=/usr/local/mpich-gm-pgroup121-7
        export PATH=$PGI/linux86/bin:$MPI_HOME/bin:$PATH
        export LM_LICENSE_FILE=7496@host.name.goes.here

Then type ". $HOME/compiler.setup" to load in the settings for this session. Having done so, you ought to have the commands pgf90, mpif90 and mpicc in your $PATH; check by typing "which pgf90 mpif90 mpicc".
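
If you prefer a check that reports each command separately, something like the following sketch could be used. The function name check_tools is an illustrative assumption; it relies only on the standard "command -v" shell builtin.

```shell
# Hypothetical helper: report which of the named commands are on $PATH.
# Returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}
```

For example, "check_tools pgf90 mpif90 mpicc" lists the three commands needed here; a "missing" line suggests that compiler.setup needs correcting.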

Unpacking the model

-->

Extract the main tarfile from the first CD:

        mount /mnt/cdrom
        cd
        tar xvf /mnt/cdrom/um_system.tar

-->

Now unpack the model by typing:

        um/vn4.5/scripts/Install/unpackmodel

Give the default answers to all the questions which are asked by "unpackmodel", except the following (which are encountered in the order listed):

Changes to scripts

-->

Now type:

        . $HOME/setvars
        echo '. $HOME/setvars' >> $HOME/.profile

This will ensure that the variables set in setvars are set both in this session and in subsequent sessions.

-->

Type the following commands, which will add lines to $HOME/setvars:

        echo >> $HOME/setvars
        echo '. $UM_HOME/compiler.setup' >> $HOME/setvars

This will ensure that UM compilation jobs can find the compiler commands.
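
Note that the echo commands in this section append a line every time they are run, so repeating the steps leaves duplicate entries in .profile and setvars. If that matters to you, a small helper along the following lines avoids it. This is only a sketch; append_once is a hypothetical name and is not part of the UM scripts.

```shell
# Hypothetical helper: append a line to a file only if the file does not
# already contain exactly that line.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches whole lines only, -F treats the text as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, append_once '. $HOME/setvars' "$HOME/.profile" has the same effect as the echo command the first time it is run, and does nothing on later runs.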

Building the GCOM library

Now build the GCOM library, which is needed in order to run MPP jobs, as follows:

-->

Type:

        cd $UMDIR
        tar xvf /mnt/cdrom/gcom.tar
        cd gcom/rel_1m1s5x5/build/

-->

Edit the Makefile; there are a number of changes to be made, so here is a Makefile which you can drop in place. Alternatively, here are the differences between that and the original makefile. (NB: download the Makefile rather than using copy and paste from your browser, as the difference between tabs and spaces is crucial.)

-->

Now type: "make"

-->

Now edit $UMDIR/vn4.5/source/compile_vars and change two of the lines in the "load options" section as follows (in order to add link paths for GCOM):

        @load    LCOM_PATH=-L. -L$(UMDIR)/gcom/rel_1m1s5x5
        @load    LCOM_LIBS=-lgcom1m1s5x5_mpi
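
With these settings the linker will look in the given directory for a library named after the -l option, so it is worth confirming that "make" actually produced one. The sketch below assumes the conventional naming (lib<name>.a or lib<name>.so); have_lib is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of lib<name>.a or lib<name>.so if it
# exists in the given directory; return non-zero if neither is present.
have_lib() {
    dir=$1
    name=$2
    for ext in a so; do
        if [ -f "$dir/lib$name.$ext" ]; then
            echo "$dir/lib$name.$ext"
            return 0
        fi
    done
    echo "lib$name not found in $dir" >&2
    return 1
}
```

For the paths above, the call would be: have_lib "$UMDIR/gcom/rel_1m1s5x5" gcom1m1s5x5_mpi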

Building the small executables

--> Type the following commands, which will build the small executables, and create links to them in the "bin" directory.
        cd $UMDIR/vn4.5/scripts/Install
        ./configure_execs 

        cd $UMDIR/bin
        for i in bcreconf convpp cumf fieldop makebc mergeum pptoanc pumf; do
          ln -s ../vn4.5/utils/$i .
        done
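
If any of the small executables failed to build, the corresponding link will be left dangling. A quick check might look like the following sketch, where check_links is a hypothetical helper name.

```shell
# Hypothetical helper: for each name, check that dir/name resolves to an
# executable file ("test -x" follows symbolic links).
check_links() {
    dir=$1
    shift
    bad=0
    for name in "$@"; do
        if [ -x "$dir/$name" ]; then
            echo "ok:      $name"
        else
            echo "broken:  $name"
            bad=1
        fi
    done
    return $bad
}
```

For example: check_links "$UMDIR/bin" bcreconf convpp cumf fieldop makebc mergeum pptoanc pumf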

Compiling the model

--> Type the following:
        cd $UMDIR/vn4.5/scripts/Install
        ./configure_all_sects

Preparing the data files

The data files which you have unpacked in the model installation are 64-bit, with big-endian byte ordering. You want 32-bit, with little-endian byte ordering. The second CD contains some 32-bit data files, but they use big-endian byte ordering. So the aim is to use them in conjunction with the "bigend" program to generate what we need.
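
If you are adapting these instructions to different hardware, you can confirm the byte ordering of the host before converting anything. The following sketch uses only printf and od; host_byte_order is a hypothetical helper name. It writes the four bytes 01 00 00 00 and reads them back as one 32-bit integer, which comes out as 1 only on a little-endian machine.

```shell
# Hypothetical helper: print "little" or "big" according to how the host
# interprets the byte sequence 01 00 00 00 as a 4-byte integer.
host_byte_order() {
    word=$(printf '\001\000\000\000' | od -An -td4 | tr -d ' \t\n')
    if [ "$word" = "1" ]; then
        echo little
    else
        echo big
    fi
}
```

On a little-endian host, such as the one assumed by these instructions, this should print "little".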

-->

Start by removing the old 64-bit files:

        cd $UMDIR/vn4.5
        rm -fr ancil
        cd ../PUM_Input/vn4.5
        rm -fr ancil dumps lbcs

-->

Now extract the 32-bit files from the CD:

        umount /mnt/cdrom
        eject
        (load 2nd CD)
        mount /mnt/cdrom
        
        cd $UM_HOME
        tar xvf /mnt/cdrom/data32.tar
 

-->

Now run "bigend" on the 32-bit files and move them to the correct paths:

        cd $UM_HOME
        find data32 -type f -not -name basin.index -print -exec sh -c 'bigend -32 "$1" "$1.tmp"; mv -f "$1.tmp" "$1"' sh {} \;
        mv data32/um/PUM_Input/vn4.5/ancil32 um/PUM_Input/vn4.5/ancil
        mv data32/um/PUM_Input/vn4.5/dumps32 um/PUM_Input/vn4.5/dumps
        mv data32/um/PUM_Input/vn4.5/lbcs32 um/PUM_Input/vn4.5/lbcs
        mv data32/um/vn4.5/ancil32 um/vn4.5/ancil
        rmdir -p data32/um/PUM_Input/vn4.5 data32/um/vn4.5 
 

Post-installation tweaks

setvars

-->

You should ensure that a script with the pathname ~um/setvars_4.5 exists on the system, and that its contents reflect the paths used in the installation which you want to use. If you installed the UM under the home directory of user um, then this will already be the case. But if not, then you may need to create a um username and/or add a symbolic link to the setvars_4.5 script in the installation directory.

The reason for this requirement is that the SUBMIT scripts which are generated by the user interface (when modified as described below) will contain this as the default path.

If it is not possible for you to create this path, alternatives are:

-->

You may also wish to edit the setvars_4.5 script to include the following lines before the "exports" section. This will give the users more flexibility to set their own paths for $TMPDIR etc.

       # now load in the user's setvars to allow
       # the user to override any of the above variables
       if [ -f $HOME/setvars -a "$HOME" != "$UM_HOME" ]
       then
         . $HOME/setvars
       fi

xargs

The xargs program is used as part of the compilation process when you run the UM. Unfortunately the standard version of xargs enforces an artificially small limit on the space occupied by environment variables, which stops it working.

-->

You should put into the $UMDIR/bin directory a version of xargs which has the environment size limit removed; for this you can either:

env

The script modifications which will be used for running jobs with the mpirun command make use of a version of the env command which has an added feature of being able to read environment variables from a file.

-->

You should put into the $UMDIR/bin directory the modified env program; for this you can either:

qsub

The installation described in this page is for a system on which the PBS queuing system is in use. This machine has a command called qsub for submitting jobs. The UM knows to make use of the qsub command, but it assumes usage as on the Cray. The PBS version of qsub differs from the Cray version in two important respects: the option switches are different, and the file being submitted should contain "#PBS" lines rather than "#QSUB" lines. Unfortunately, there are several different places in the UM from which qsub is called; so rather than change the UM, a "wrapper" script is used to act as an interface between the UM and the PBS qsub.
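
To illustrate the second of these differences, the sketch below rewrites the directive prefix in a job file. It is not the wrapper script used in these instructions: rewrite_directives is a hypothetical name, and the sketch deliberately leaves the option letters untouched, whereas a real wrapper must also translate those.

```shell
# Hypothetical helper: copy a job script to standard output, changing the
# "#QSUB" prefix at the start of directive lines to "#PBS".
# The options following the prefix are NOT translated here.
rewrite_directives() {
    sed 's/^#QSUB/#PBS/' "$1"
}
```

Run on a file whose first line is "#QSUB -q test", this prints "#PBS -q test" and leaves all other lines unchanged.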

-->

Copy this qsub script (click to download) to $UMDIR/bin, and it will act as a wrapper for the PBS version (/usr/local/bin/qsub), making it look like Cray qsub. (Remember to do "chmod +x qsub".)

The above wrapper script also has one other effect: if it detects that the submitted job is a model compilation job rather than a model execution job, it will fall back to using at rather than qsub to submit the job. This is so that compilation jobs will run on the master node of the Beowulf rather than one of the slave nodes. If you do not want this behaviour, find the line of the script which says "$use_at_for_compile_jobs=1;" and change the 1 to 0.

File permissions

-->

As a final step in installing the UM, you are recommended to set sensible file permissions recursively on the entire UM installation, to avoid problems later. The command "chmod -R og-w+rX $UM_HOME" should do the trick (although you may need to modify it if you need to restrict access to certain groups of users in order to comply with the usage agreement).
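
Before and after changing the permissions, you may want to see which files are writable by group or others. The following is a sketch only; list_writable is a hypothetical helper name, and symbolic links are skipped because their own permission bits are not meaningful.

```shell
# Hypothetical helper: list everything under a directory, apart from
# symbolic links, that is writable by group (020) or by others (002).
list_writable() {
    find "$1" ! -type l \( -perm -020 -o -perm -002 \) -print
}
```

For example, "list_writable $UM_HOME" should print nothing once the chmod command has been run.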

UMUI changes

In order to submit jobs on the Beowulf, you will need to make some changes to the UM User Interface (UMUI) on the machine on which it is run (not necessarily the Beowulf). These changes are distributed by Jeff Cole for use with the CSAR T3E machine (turing), and may therefore already have been applied at many UGAMP sites.

-->

If the changes have not already been applied, you will need to change to the parent directory of the umui2.0 directory, and unpack this umui2.0_changes.tar tar file.

(NB this is the same tar file as distributed by Jeff Cole, except that it does not contain backup copies of the original files. If you want to keep backups, you can extract the tar file using "tar xvf umui2.0-changes.tar --suffix=.orig")

Running jobs

Home-directory dotfiles

--> Before you can run UM jobs, you will need to create (or add to) these two files in the home directory of your user account on the Beowulf:

Job edits

Here is a description of the changes which you will need to make to the example MPP atmosphere-only job which is supplied with the UMUI, in order to produce something which will run on the Beowulf. In the UMUI you should find a job with ID=xaaab, owner=frav, description="Climate 96x73x19 - MPP generic", version="4.5.1". (NB you may need to turn off the experiment owner filter on the search in order to find the experiment.)

-->

Start by copying the MPP job xaaab into one of your experiments (or, if job xaaab doesn't exist, create a new job at version 4.5.1, open it in read-write mode, upload this basis file and save the job).

The following instructions will detail a number of changes that are made to the example test job in order to have a runnable job. (In case it helps, this basis file is an example of the job configuration after applying those changes).

-->

Now make the following changes. In the case of the mods (and script mod and compile option override), you will need to click on the filenames to download the mods; then choose a path on the Beowulf to save them to, and enter into the UMUI the path which you have chosen. (You may wish to use environment variables to define the directories; see under "Sub-Model Independent".)

Job submission

-->

All's ready! Save, process, submit.

This will launch a compilation job on the master node (using "at": can be inspected with atq and killed with atrm), followed by a run job on the slave nodes (using "qsub": can be inspected with qstat and killed with qdel).

Feedback

Please do let me have any comments on this document.
Alan Iwi <A.M.Iwi@rl.ac.uk>
Last edited: 14 November 2001