User registration will be performed by the e-Science centre.
To request an account, please contact the e-Science centre (contact information below), with the following information:
Please note that your account will be created with a home directory which is readable by others, although you can change this if you wish.
A list of names associated with the COAPEC programme has been supplied to the e-Science centre by Helen Snaith. Provided that you are already listed, an account will be set up for you. If you are not listed, the e-Science centre will not accept your application directly; please instead contact Helen Snaith, who will apply on your behalf if appropriate.
Lewis is a Beowulf Cluster consisting of 17 dual-processor 2.4GHz Pentium 4 Xeon machines, of which one is for interactive logins and control and 16 are for batch processing (hereafter, "master node" and "slave nodes"). The slave nodes (but not the master node) are connected with Myrinet 2000 networking, giving fast inter-process communication for parallel programs. The slave nodes each have 512MB memory; the master node has 1GB memory.
The following system software is provided: RedHat Linux with the usual bundled software, Intel Fortran compiler version 7, MPICH library for parallel code, OpenPBS queuing system with Maui job scheduler, Totalview debugger. (More specific modelling-related software is described under "Using the Unified Model" below.)
For security reasons, all logins to lewis are via ssh.
The hostname is "lewis.esc.rl.ac.uk" (the machine is hosted at the e-Science Centre at Rutherford Lab; see also why "Lewis"?).
To use SSH:
Note that ssh connections described above will connect to the master node. If you get "permission denied" messages trying to connect, then your host probably needs to be authorised; see under "obtaining an account" above.
You should not normally log into the slave nodes interactively because that may adversely affect performance of other people's jobs; if for exceptional reasons you need to log into them, then from the master node you can type "rsh node2" through to "rsh node17".
Please note that there is no FTP server running on lewis. The recommended way to copy files from or to lewis is via the secure copy ("scp") utility, which should be distributed with your SSH client.
Lewis does have FTP clients ftp and ncftp installed, although beware that using them will cause the passwords of your other accounts to be transmitted in cleartext.
Your files on lewis are stored in the /home area (with no distinction between "home" and "data" areas). The entire filestore is backed up regularly, and is subject to quotas.
Quotas are applied on a group basis, with each group corresponding to a COAPEC member institution. The filespace quotas for each group are as follows:
| Group | Filespace quota |
|---|---|
| Birmingham | 60GB |
| Liverpool | 60GB |
| Oxford | 120GB |
| RAL | 60GB |
| Reading | 300GB |
| Sheffield | 60GB |
| SOC | 300GB |
| UCL | 60GB |
| UEA | 180GB |
The remaining unallocated space (~500GB) may be used to provide shared-access data, or alternatively may be used to increase quotas in future.
To show your group's quota and usage, use the command "quota -g". Note that your group cannot create files in excess of this quota, even for a short time; this should ensure that provided your group remains within quota it cannot be inconvenienced by other groups filling the disk.
In the event that your group cannot manage usage within its quota informally, it is also possible to impose filestore quotas on certain users. To request this, your Principal Investigator should contact the e-Science Centre. Please note that user quotas would not replace the group quotas, but merely limit how much of the group allocation can be used by those users.
Jobs may also write to the temporary area /tmp. Please note that every node sees a different /tmp directory, which is local to the given node, and that these are regularly purged of old files.
The following information may be useful, although skip this section for a quick introduction, because you can submit Unified Model jobs as described below without needing to know about the queuing system.
Lewis is running the OpenPBS queuing software with the MAUI scheduler. These have been configured to allow a number of job execution queues, with the following characteristics:
Jobs are submitted to queues "normal", "dev" or "compile". "normal" and "dev" are for batch jobs, which run on the slave nodes of the cluster. "compile" jobs are single-processor, and run on the master node.
Jobs in the "normal" and "dev" queues are routed through to a number of execution queues depending on the number of processors. These execution queues are configured with differing time limits. For example the queue "normal8" allows up to 8 processors and has a time limit of 18 hours, whereas the queue "normal16" allows up to 16 processors and has a time limit of 9 hours.
Jobs submitted to the "dev" queue have higher priority than "normal" jobs, but a shorter execution time limit. The "dev" queue is intended for development work, so please use "normal" for normal work where turnaround time is not critical. In addition to "dev" jobs having a higher priority, four nodes (eight processors) of the cluster are reserved exclusively for "dev" jobs from 9am to 6pm Monday to Friday, although they will run "normal" jobs overnight and at weekends.
Apart from the distinction between "dev" and "normal" queues, the highest priority job is the one which has been waiting longest. There is, however, one exception to this: a "fair share" factor. This is configured on a per-group basis; each group has a permitted CPU allocation. If the recent (few days') CPU usage has been above this level, then the group's jobs will be deprioritised, but will still run if the CPU is otherwise idle. The fair share usage level for each group will be set proportional to the group filesystem quotas shown above (although the scaling factor may be revised depending on levels of machine usage). As with filestore quotas, user-specific limits could also be defined if required.
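As a rough illustration of "proportional to the group filesystem quotas", the following sketch turns the quota table into fractional shares. It assumes the shares are simply normalised to sum to 1; the function name is hypothetical and the scaling factor actually used by the scheduler may differ.

```python
# Group filestore quotas in GB, copied from the quota table above.
QUOTAS_GB = {
    "Birmingham": 60, "Liverpool": 60, "Oxford": 120, "RAL": 60,
    "Reading": 300, "Sheffield": 60, "SOC": 300, "UCL": 60, "UEA": 180,
}

def fair_share_fractions(quotas):
    """Return each group's CPU share, proportional to its quota.

    Assumption: shares are normalised to sum to 1. The real scheduler
    configuration may apply a different scaling factor.
    """
    total = sum(quotas.values())
    return {group: gb / total for group, gb in quotas.items()}

shares = fair_share_fractions(QUOTAS_GB)
for group, share in sorted(shares.items()):
    print(f"{group:12s} {share:6.1%}")
```

Under this assumption the allocated quotas total 1200GB, so a group with a 300GB quota would have a quarter of the share.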
Resource requirements for UM jobs are set within the user interface. For other jobs, they can be specified as command-line options to the "qsub" command, or with control directives within the job script (see the OpenPBS documentation, including the "qsub" manual page).
When specifying a time limit for your job, it is to your advantage not to overstate the requirements. The scheduler has an intelligent "backfill" algorithm: if, for example, the highest priority job is a 16-processor job which it calculates can be started in 4 hours' time, but meanwhile there are 8 processors idle, it will run your 8-processor job if it requests no more than 4 hours' CPU.
However, beware that if you exceed the requested time limit your job will be terminated.
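The backfill decision in the example above can be sketched as a simple test. This is a deliberately simplified model, not the Maui algorithm itself: it assumes a job fits if it needs no more processors than are idle and finishes before the highest priority job is due to start. The function and parameter names are hypothetical.

```python
def can_backfill(idle_procs, hours_until_reserved_start,
                 job_procs, job_hours_requested):
    """Return True if a lower-priority job fits into the idle gap.

    Simplified assumption: the job must fit within the idle processors
    and finish before the reserved start of the highest priority job.
    """
    return (job_procs <= idle_procs
            and job_hours_requested <= hours_until_reserved_start)

# The example from the text: 8 processors idle, 16-processor job due in 4 hours.
print(can_backfill(8, 4.0, job_procs=8, job_hours_requested=4.0))  # True
print(can_backfill(8, 4.0, job_procs=8, job_hours_requested=6.0))  # False
```

The second call shows why an overstated time limit can delay a job that would otherwise have started straight away.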
One approach to specifying time limits is empirically: submit a test job with a generous time limit, and base future job requirements on the actual run time, adding say a 10% safety margin. (See also the benchmark timings for the Unified Model below.)
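The empirical approach amounts to a small calculation, sketched below. The helper name is hypothetical, and the HH:MM:SS output is an assumption about how you will enter the limit; use whatever form the UMUI or your job script expects.

```python
def time_limit_with_margin(measured_seconds, margin_percent=10):
    """Return measured run time plus a safety margin, as HH:MM:SS.

    Integer arithmetic is used so that the result is rounded up to a
    whole second without floating-point surprises.
    """
    total = (measured_seconds * (100 + margin_percent) + 99) // 100
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# A test job that took 3 hours suggests requesting 3 hours 18 minutes.
print(time_limit_with_margin(3 * 3600))  # 03:18:00
```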
Memory requirements are not such an issue; because the cluster is largely a distributed memory environment, the scheduler is not configured to penalise jobs which ask for generous memory allocation (although as there is 512MB physical memory per dual-processor node, it is not sensible to use more than about 200MB per processor).
| PBS Command | Description |
|---|---|
| qsub jobscript | Submit a job script. (Not normally typed manually in UM submission.) |
| qdel jobid | Stop a queued or running job |
| qstat | Show the job queue |
| qstat -n | Show the job queue, reporting the nodes on which jobs are running |
| qstat -Qf [queuename] | Show the limits for all queues or a specified queue |
| qhold | Hold a queued job to prevent execution |
| qrls | Release a previously held job for potential execution |
| pbsnodes -l | List any nodes which are down (null output means all nodes functioning) |

| Maui Command | Description |
|---|---|
| showq | Show the job queue, with job start time and duration information |
| showstart | Show the estimated start time of a queued job |
| showres [-n] | Show the reservations made by the scheduler (with optional node-specific detail) |
| diagnose -p | Show the priorities of queued jobs |
| diagnose -f | Show fair share usage information |
The following information relates to use of the Met Office Unified Model specifically on lewis. For general information about the model, please refer to the UM Users' guide. If you do not have this already, a copy resides under file:///usr/local/um/umdoc_system/index.htm on lewis; web browsers mozilla, netscape, lynx are installed for the purpose of viewing locally stored HTML files.
There are two installations of the portable UM version 4.5 on lewis: one at 64-bit and one at 32-bit. The base directories ($UM_HOME) for these installations are ~um64 (/usr/local/um64) and ~um32 (/usr/local/um32) respectively.
In addition, there is an installation of the UM User Interface (UMUI), located at ~umui (/usr/local/umui).
You should make sure that you have signed the usage agreement with the Met Office for the Portable Unified Model (even though there are currently no technical measures on lewis to enforce this). One way to do this is via the British Atmosphere Data Centre; if you register for access to the PUM Software, they will process the usage agreement on behalf of the Met Office, even though you do not need to download the software from the BADC because it is already installed.
Before you first run the UM, you need to create some files in your home directory:
You need to have a $HOME/setvars file. This should be a copy of (or link to) ~um64/setvars or ~um32/setvars, depending whether you want to run the 64- or 32-bit installation. If you may wish to run 64- and 32-bit integrations at the same time, please see these instructions.
$HOME/.profile should contain the line
. $HOME/setvars
You need this file even if your login shell is tcsh. (Note the "." at the start of the line.)
You may need to create directories for output files. The example jobs specify directories $HOME/datam/$RUNID (for model time-stamped files) and $HOME/dataw/$RUNID (for other output files), so if you retain this setup you should create the parent directories $HOME/datam and $HOME/dataw.
$HOME/.rhosts should contain the lines
lewis.esc.rl.ac.uk your_username
master your_username
Ensure that permissions on .rhosts are restricted, e.g. type: chmod 600 ~/.rhosts
You can run the UM User Interface (UMUI) either on lewis (on the master node), or on a machine in your home institution.
To run the UMUI on lewis, simply type umui. This has the advantage of being self-contained, and also ease of submitting jobs (you just press the "Submit" button). Also if many people run the UMUI on lewis, then they can all easily inspect one another's jobs. (The UMUI installation on lewis also incorporates these changes which allow automated running of "hand-edit" scripts.)
However, you may prefer to run the UMUI on another machine. In that case:
You may need to apply some changes to the UMUI: change to the parent directory of the umui2.0 directory, and apply this patch (type "patch -p0 < umui2.0_changes.patch"), or if you do not have the GNU patch program then extract this tar file instead. (NB these are the same changes used for running on the CSAR T3E, "turing", so may already have been applied in some institutions.)
To view the jobs, whose run IDs are referred to in this document, transfer the basis files from lewis, and "upload" them into a job in your UM distribution. On lewis, the basis files are located under ~umui/umui2.0/DBSE/, e.g. ~umui/umui2.0/DBSE/xabc/d for job xabcd -- or run the UMUI on lewis and click "download".
To submit jobs, you cannot simply click "Submit". Instead, after clicking "process", run this umsubmit-ssh script, typing: "umsubmit-ssh -h lewis.esc.rl.ac.uk -u username runid". You will have to type your password a few times (unless you have set up ssh authorization keys, which is outside the scope of this document).
The .rhosts file mentioned above is redundant in this case -- it is only used when submitting jobs from the UMUI running on lewis.
Assuming you have done the setup described above, you are ready to take one of the sample jobs found in experiment xaar in the UMUI on lewis, edit your username in the "General details" section (as a minimum; make any other changes required), then save, process and submit the job.
(Remember that where a job is described as 64 or 32 bit, it is still necessary to ensure that your setvars file points to the installation at the corresponding precision.)
Remember also that you can track the execution of the jobs with the "qstat" command.
The output from the runs is initially stored in the following locations:
However, the sample HadCM3 jobs shown above have post-processing options turned on -- see documentation. (The main purpose of this is to permit restart files to be written frequently, as required temporarily for climate meaning, but without leaving a great number of these permanently stored.) The result is that some output files and dump files which are to be retained are moved from $DATAM to the directory $HOME/um_archive/run_id/ (subdirectories ppfiles and dumps).
At the end of the run, the job output ("leave") file is copied to $HOME/umui_out
If you wish to receive email when your UM job starts and/or finishes, check the relevant options in "Submodel-independent" -> "Output management".
As a convenience, you can create a file called $HOME/.um_mail which will override any email address set in the job. If the file exists and contains an email address, then that address will be used if mail is sent, regardless of the address configured in the UMUI. If the file exists and is blank, then no mail will be sent from UM jobs, regardless of any options set in the UMUI.
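The override rules can be summarised in a short sketch. This only restates the behaviour described above; it is not the code the UM scripts use, and the function and parameter names are hypothetical.

```python
import os
import tempfile

def resolve_mail_address(mail_enabled, umui_address, um_mail_path):
    """Return the address that job mail would go to, or None for no mail."""
    if not mail_enabled:
        return None                  # mail options not checked in the UMUI
    if not os.path.exists(um_mail_path):
        return umui_address          # no override file: use the UMUI address
    with open(um_mail_path) as f:
        override = f.read().strip()
    return override or None          # a blank file suppresses all mail

# Demonstration with a temporary stand-in for $HOME/.um_mail.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".um_mail")
    print(resolve_mail_address(True, "job@example.org", path))  # job@example.org
    with open(path, "w") as f:
        f.write("me@example.org\n")
    print(resolve_mail_address(True, "job@example.org", path))  # me@example.org
    with open(path, "w") as f:
        f.write("")
    print(resolve_mail_address(True, "job@example.org", path))  # None
```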
A 100-year integration of HadCM3 has already been performed on lewis at 64-bit precision. It is intended for use as a control integration for perturbation experiments performed on lewis, and also for other statistical studies.
Time-average model output fields have been stored on the BADC, with different averaging periods (monthly, seasonal, annual, decadal). These files contain the same fields as for the COAPEC 100 year HadCM3 integration on the Cray, but with the addition of MEAD diagnostics in the ocean fields, and are in NetCDF format. Monthly restart dumps (some in compressed format) are also stored on the BADC.
The data has been added as a subdirectory of the COAPEC data directory on the BADC, so for access to this dataset, follow that link and register for that dataset on the BADC if you have not already done so. The subdirectory is called "100yr_beowulf", and it contains a README file with further information about the files.
If doing a perturbation run, take job xaara in the UMUI on lewis as a starting point. (The actual run was performed as xaaqa; however some special values were used for some of the resource specifications, which have been removed in xaara. Also xaara specifies 8 processors instead of 16, and the run length is changed to 1 year so that a 100-year run is not submitted inadvertently.)
Although the integration was started directly from initial conditions obtained from running HadCM3 on a Cray, the model does not appear to drift from the start, at least as regards the global mean surface temperature: see plot. Other fields will be added here as they are evaluated.
For those who do not have access to lewis, but want to inspect the job, xaaqa is also available for download:
Converting a job which is set up to run on another machine so that it runs on the cluster requires a rather variable amount of effort: at best it can be a relatively easy task, although you may encounter complications.
Here are some pointers to get you started. Feel free to request assistance from Alan Iwi, who will help on a best-efforts basis, but please first make an attempt yourself with the following information.
First of all, a pointer to help you identify where to make some of the changes described below: in the UM User Interface, as installed on Lewis and elsewhere, there are a number of standard experiments (atmosphere = xaaa, ocean = xaab, limited area = xaac). These experiments contain jobs for non-MPP, MPP generic and MPP T3E architectures, and it is instructive to examine the differences between these jobs (under "Job->Difference" in the UMUI).
You may also want to examine the list of code modifications used in the sample jobs (in experiment xaar in the UMUI on lewis), and also examine the differences between the standard HadAM3 jobs in xaaa with the HadAM3 jobs in xaar, and merge some of those differences with your job. Generally the mods concerned have comment lines describing what they do.
Assuming that you want to run a parallel job, you must include the script modification $UM_HOME/local/vn4.5/script_mods/mpirun-mpich. The mpirun command will not be invoked correctly without it. Also ensure that the target machine is selected as "Distributed memory parallel", with "Generic type" compiler.
(It is possible to run single-processor jobs on lewis, by selecting the target machine as "single node of a parallel machine". However, please do this for development purposes only, because it is not the main use envisaged for the cluster.)
You should ensure that all files (mods, ancillary files, etc) which are referenced in the UMUI exist in the correct paths on lewis (except for any user STASHmaster files which should exist on the machine on which you run the UMUI, possibly a different machine).
Any binary data files should be in little-endian byte ordering. (This typically applies to restart dumps and ancillary files, but other control data files are ASCII -- examine the file if in doubt.) This means that they will need byte-swapping if original files are in Cray T3E native byte ordering. To do this, do "$UM_HOME/um/bin/bigend -64 input_file output_file" or "swap8 < input_file > output_file". (For a 32-bit ancillary file use analogously "bigend -32" or "swap4".)
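To illustrate what byte-swapping means here, the sketch below reverses the byte order within each fixed-size word of a buffer. It is not the implementation of bigend, swap8 or swap4 (whose internals are not described in this document); it only shows the general idea, and the function name is hypothetical.

```python
import struct

def swap_words(data, word_size=8):
    """Reverse the byte order within each fixed-size word of `data`."""
    if len(data) % word_size != 0:
        raise ValueError("data length is not a multiple of the word size")
    return b"".join(
        data[i:i + word_size][::-1]
        for i in range(0, len(data), word_size)
    )

# A 64-bit real written big-endian reads back correctly as little-endian
# once each 8-byte word has been swapped.
big = struct.pack(">d", 1.5)
little = swap_words(big, 8)
print(struct.unpack("<d", little)[0])  # 1.5
```

For 32-bit data the same idea applies with a word size of 4.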
You must turn off packing of all fields written to dumps and PP files. The reason for this is that the packing and unpacking routines use T3E-specific library calls. If your model is configured to write a packed field, then a call will be executed to this routine but the routine will be missing. This will result in your model crashing immediately with an "Address error".
If your input files have packing, then you will need to produce an unpacked version for the same reason, before transferring them to the cluster. To do this you should use the ieee utility which is included when the UM is installed on a Cray T3E.
Ensure that your username details are set correctly, and that the job submission resources are set according to the queue information described above (i.e. the job queue should be either "normal" or "dev", and the job time limit should be consistent with the queue limits as shown by "qstat -Qf") -- NB compilation jobs (or the "compile" part of a compile and run job) are automatically directed to the "compile" queue, without your needing to specify this.
Some of the code sections chosen under "Submodel Independent" -> "Submodel Independent Section Options" and "Atmosphere" -> "Scientific parameters and sections" -> section by section choices involve the choice of including non-MPP, generic MPP or T3E-specific MPP code. Ensure that generic MPP is selected.
Finally, a note that if you are porting a job from the Cray T3E, beware that the portable version of the nupdate utility (which is used to preprocess the source code and apply any modifications) is less functional than the Cray version, and in particular it does not handle modifications to lines of code which themselves came from other modsets rather than from the source decks. If you are in doubt what code is actually being compiled, then examine the preprocessed source code, which the model stores in a tar.gz file in the $DATAW directory for the job.
The following graphs show the speed benchmarks for HadAM3 and HadCM3, with and without writing output diagnostics (STASH). (Preview bitmaps only - click on images for scalable Encapsulated Postscript versions.)
In the above units "Model years per total CPU days", total CPU days is the wallclock time multiplied by the number of processors. So for example a 4-processor job given as 0.8 model years per total CPU days would take about 3.1 days for a 10-year run.
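The conversion from the benchmark units to an elapsed time can be written out as follows. This is a minimal sketch of the arithmetic in the example above; the function name is hypothetical.

```python
def wallclock_days(model_years, years_per_total_cpu_day, processors):
    """Estimate elapsed (wallclock) days for a run from a benchmark rate."""
    total_cpu_days = model_years / years_per_total_cpu_day
    return total_cpu_days / processors

# 10 model years at 0.8 model years per total CPU day on 4 processors.
print(round(wallclock_days(10, 0.8, 4), 3))  # 3.125
```

Here the 10-year run needs 12.5 total CPU days, which spread over 4 processors gives the figure of about 3.1 days quoted above.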
Although the speed of the integration increases with the number of processors (scaling curves better than the "gradient for no further speedup" shown), it becomes decreasingly efficient (scaling curves worse than "gradient for perfect scaling").
This means that unless part of the cluster is otherwise idle, it is better to run jobs on fewer processors rather than more, but with more jobs running concurrently (e.g. different ensemble members or different people's jobs).
The one exception is the anomalous 1x2 configuration for HadCM3 at 64-bit. The cause is insufficient memory (see page on memory requirement, particularly last paragraph). This could be rectified by buying more memory, if there is a demand -- backed up with funds! -- but for the moment please avoid this configuration!
Except in some heavily network-latency limited configurations, there is about a factor of 2 speedup by running at 32-bit instead of 64-bit. Initial model validation tests (see document in PDF or .ps.gz format) suggest that this may be appropriate for the atmosphere model but that caution should be exercised in using 32-bit with the ocean model.
The presence or absence of STASH makes a significant difference. In the case of the HadAM3 benchmarks, the set of diagnostics is the extremely large set in the standard HadAM3 job which is distributed with the portable model. In the case of the HadCM3 benchmarks, the set of diagnostics is the set of diagnostics in the COAPEC control integration.
Clearly you will want to write output from your model! However, there may be efficiency implications in writing large numbers of diagnostics when using many processors.
In separate tests, it has been found that the slowdown is largely due to network latency during sampling of diagnostics; therefore if you write time-averaged diagnostics, could the time average be calculated from diagnostics sampled every six hours instead of every timestep? This is configurable under "Edit Time Profile" in the STASH windows of the UMUI.
A number of software utilities are available on lewis. Please keep usage of the machine which is not directly associated with running models to sensible levels; if your postprocessing is resource-hungry then please consider transferring the output files to a machine in your home institution.
Any reasonable request for additional free software will be considered. If you are considering installing a package in your home directory, please first contact Alan Iwi, who may decide instead to install it in a shared location for wider benefit.
To run the Intel Fortran 90 compiler (version 7), first load the appropriate setup script, depending on your shell:
(Note that when compiling/running UM jobs, this is already taken care of in setvars.)
Then the compiler is invoked with ifc.
Note that you also have to load the setup script before running programs compiled with this compiler (or at least set the environment variable LD_LIBRARY_PATH as done by the setup script).
For documentation on the Intel Fortran Compiler, see
The PDF readers acroread and xpdf are installed.
A number of Unified Model file utilities are installed accompanying the UM distribution. You will find convsh and xconv in /usr/local/bin.
Additionally, there are ancillary file utilities which are specific to the 64- or 32-bit distributions (pumf, etc). These are in $UM_HOME/um/vn4.5/utils; use the correct value of $UM_HOME for the precision of your ancillary files.
See also these additional UM file utilities (written by Alan Iwi) -- some of these are installed on the system.
NetCDF libraries (version 3.5.0) are installed, including programming interface for Fortran, C, C++ (see in /usr/local/lib), and with bundled utilities ncdump and ncgen.
In addition, the NetCDF Operators (NCO) package (version 2.7.2) is installed. This includes various utilities for operating on sets of NetCDF files, e.g. concatenating, averaging, differencing.
Ferret (version 5.51) is installed. This will enable visualisation of NetCDF files (e.g. converted from Unified Model output).
Note that to run Ferret, it is first necessary to type source /usr/local/bin/ferret_paths. There is not an equivalent script for other shells, so if your login shell is bash or ksh then first invoke tcsh manually.
Some documentation of use of Ferret from the command line is shown on the Tools for COAPEC web page.
Additionally, Ferret can be invoked with the Graphical User Interface, by typing "ferret -gui". Typically a data file is opened by doing: "File" -> "Open data set" -> "Search" -> .... A variable for plotting is then selected with the Data "Select" button, and a variety of plots (e.g. zonal mean plots) can then be produced.
When the GUI is used, a file called ferret.jnl is created in the current directory. This contains the commands which can be typed in command-line mode in order to perform the same operations which were performed using the GUI.
Climate Data Analysis Tools (CDAT) is installed -- see documentation
This will enable visualisation of NetCDF files (e.g. converted from Unified Model output). CDAT does not currently open UM output files directly; however, this functionality is being developed, and will be made available on lewis when it exists.
There is additionally a GUI, which works in a similar way to the Ferret GUI. Additionally it has some useful features, such as including a cos-latitude weighting when averaging fields in the meridional direction. To run the GUI, type vcdat.
A number of floating licences for IDL are owned by the research section of the BADC, and it has been decided to make them available to users on lewis on an informal basis. You are encouraged to log out of IDL sessions promptly after use, because this arrangement may be reviewed if the number of simultaneous users of these licences on lewis is enough to inconvenience the BADC research group. If this arises, it may be possible to fund dedicated IDL licences for lewis, but no promises.
To run IDL, first load the appropriate setup script, depending on your shell:
Then type idl, or for documentation type idlhelp.
The Totalview debugger is installed on the system. However, this has not yet been tested and documented. (Once it is done, instructions will be added here. There is currently no timescale for this, but if you have a particular requirement for the debugger, then please contact Alan Iwi, who may be willing to bring forward this activity.)
The following disk areas have been set up for model-related files which were not supplied with the standard model installation. To facilitate collaboration, they are world-writable so you can place model-related files which will be of benefit to others.
The following areas are common to both 32- and 64-bit installations:
The following areas exist separately for each of the 32- and 64-bit installations:
The guidelines for putting files there are:
In the standard HadAM3 and HadCM3 jobs described above, environment variables have been set which point to many of these directories.
<A.M.Iwi@rl.ac.uk>