Note on MPI buffering used by the portable UM

Why buffering is needed.

In the MPP implementation of the UM, inter-process communication is handled by calls to the general communication library (GCOM), which in turn calls the appropriate routines in whichever underlying communication library is actually in use (MPI / Cray SHMEM / PVM / etc).

Specifically, the UM code for the most part calls GC_RSEND in the GCOM library to send a real array, and GC_RRECV to receive a real array. The way the code is written, all processes call GC_RSEND, and then all processes call GC_RRECV. This creates an implicit dependence on the data being buffered somewhere between the sending and receiving stages (i.e. the code does not implement simultaneous pairs of send and receive, which would not require buffering).
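The dependence on buffering can be illustrated with a small simulation. This is a minimal sketch, not GCOM or MPI code: Python threads stand in for processes, a bounded queue stands in for whatever buffer space the communication library provides, and the names run_exchange and inboxes are hypothetical. It assumes a send blocks once the receiver's buffer is full, which is a simplification of how a real library behaves.

```python
import queue
import threading


def run_exchange(n_reals, buffer_reals, timeout=0.5):
    """Two simulated processes each send n_reals to the other, then receive.

    Returns True if both complete, False if a send blocks (deadlock).
    buffer_reals must be > 0 (queue.Queue treats 0 as "unbounded").
    """
    # One bounded inbox per process stands in for the library's buffer space.
    inboxes = [queue.Queue(maxsize=buffer_reals) for _ in range(2)]
    ok = [False, False]

    def process(rank):
        other = 1 - rank
        try:
            # Send phase: blocks once the peer's buffer is full.
            for i in range(n_reals):
                inboxes[other].put(float(i), timeout=timeout)
            # Receive phase: only reached if the whole send was buffered.
            for _ in range(n_reals):
                inboxes[rank].get(timeout=timeout)
            ok[rank] = True
        except (queue.Full, queue.Empty):
            # A real run would hang here; the timeout lets the sketch report it.
            ok[rank] = False

    threads = [threading.Thread(target=process, args=(r,)) for r in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(ok)


if __name__ == "__main__":
    print("message fits in buffer:", run_exchange(100, 1000))
    print("message exceeds buffer:", run_exchange(100, 50))
```

When the message fits in the buffer, both sends return and both receives then succeed. When it does not, each simulated process is stuck in its send waiting for the other to start receiving, which is the send-then-receive deadlock described above.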

Where does the buffering come from, and how much is provided?

How much buffering is needed?

In general, this depends on the resolution of the model and the MPP configuration in use. Here are a few numbers showing the maximum message size actually passed in a few configurations (found by adding a "print" statement to GCOM):

Model                Configuration          Number of reals   Comment
-------------------  ---------------------  ----------------  -------
HadAM3               2x2                    1824              = 96 x 19
  96x73x19           2x4                    1824
                     4x2                    1558              = 19 x 82

HadCM3L:             2x2 atmos, 1x4 ocean   7840              = 98 x 20 x 4. NB this occurs in the ocean steps.
  96x73x19 atmos,    4x2 atmos, 1x8 ocean   7840
  98x73x20 ocean     2x4 atmos, 1x8 ocean   7840

As you can see, the 7840 reals in the HadCM3L ocean would require about 31kB at 32-bit or about 61kB at 64-bit. This exceeds the 16kB provided by the particular MPICH implementation mentioned above. The result was that an ocean-only or coupled integration with GCOM 1m1s5x5 and this MPI library was found to hang in a process deadlock. The limit in GCOM 2.8 (625kB at 32-bit, 1250kB at 64-bit) exceeds the HadCM3L requirements by a factor of about 20.
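The arithmetic behind these figures can be checked with a short script. This is a sketch only: the function name buffer_kib is hypothetical, and it assumes that the sizes quoted in this note are in units of 1024 bytes, which is consistent with the quoted values but is not stated explicitly.

```python
KIB = 1024  # assumption: the note's size figures are in 1024-byte units


def buffer_kib(n_reals, bits_per_real):
    """Buffer space, in units of 1024 bytes, to hold one message of n_reals reals."""
    return n_reals * (bits_per_real // 8) / KIB


ocean_reals = 7840                       # largest HadCM3L message (98 x 20 x 4)
mpich_limit_kib = 16                     # limit quoted in this note for one MPICH build
gcom28_limit_kib = {32: 625, 64: 1250}   # GCOM 2.8 limits quoted in this note

if __name__ == "__main__":
    for bits in (32, 64):
        need = buffer_kib(ocean_reals, bits)
        print(f"{bits}-bit reals: need {need:.1f}, "
              f"fits in 16: {need <= mpich_limit_kib}, "
              f"GCOM 2.8 headroom: {gcom28_limit_kib[bits] / need:.1f}x")
```

At both precisions the requirement is well above 16 and roughly a twentieth of the GCOM 2.8 limit.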

The buffering requirement appears to be proportional to the product of horizontal and vertical resolution (i.e. the number of interface points between different processors' regions) - and does not decrease with the number of processors, because the ocean always has only one processor in the zonal direction.
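A rough way to see why the ocean figure does not shrink with more processors is sketched below. The function name is hypothetical, and the formula is only an extrapolation from the single table entry "98 x 20 x 4". The note does not say what the factor of 4 counts, so it is treated here as an opaque multiplier.

```python
def ocean_max_message_reals(nx, nz, nproc_ns, factor=4):
    """Largest ocean message, in reals, for a 1 x nproc_ns decomposition.

    nproc_ns is accepted only to make the point that it does not appear in
    the result: with one processor in the zonal direction, each process
    spans the full row of nx points.
    factor=4 is copied from the table entry "98 x 20 x 4"; its meaning is
    not stated in the note, so this is an assumption.
    """
    return nx * nz * factor


if __name__ == "__main__":
    for nproc_ns in (4, 8):
        print(f"1x{nproc_ns} ocean:", ocean_max_message_reals(98, 20, nproc_ns))
```

Both decompositions give the same message size, matching the 7840 reals observed for the 1x4 and 1x8 ocean configurations.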

Conclusions

Use a recent version of GCOM with MPI buffering enabled; there is more than adequate buffering for climate studies, and the buffer size is easily increased if required.

If you do have to use the older GCOM, this mod may help deal with the large message which occurs in the ocean model.


Last edited: 30 November 2001
Alan Iwi <A.M.Iwi@rl.ac.uk>