*************************************************************************
*                                                                       *
*                               MPICH2-MX                               *
*                                                                       *
*           MPICH2 over Myrinet Express (ch_mx) documentation           *
*                                                                       *
*                   Copyright (C) 2007 Myricom, Inc.                    *
*                         Author: Myricom, Inc.                         *
*                                                                       *
*************************************************************************

README of MPICH2-MX

MPICH2-MX provides support for Myricom's Myrinet Express (MX)
communication layer. MPICH2-MX may be used with either MX-10G or MX-2G.
See MX's README for supported NICs.

Table of Contents:

I.   Installation
       1. Configuring and compiling
       2. Runtime tunables
          2.1 Registration cache
          2.2 Error handling
          2.3 Send cancellation
          2.4 Recv mode
          2.5 Unexpected queue length
II.  MPICH2-MX Performance
III. Caveats
       1. Multiple NICs are not supported
       2. Polling versus blocking mode
       3. Checksums
       4. Send cancellation
IV.  License
V.   Support

===============
I. Installation
===============

MPICH2-MX requires Myricom's MX version 1.2.1 or higher. See MX's README
for the supported list of platforms.

1. Configuring and compiling

MX has been fully integrated into the MPICH2 build process. To build
MPICH2-MX, you will need to do the following:

  $ export MX=/opt/mx
  $ export LD_LIBRARY_PATH=$MX/lib
  $ ./configure --with-device=ch_mx --with-mx=/opt/mx

replacing /opt/mx with the actual path to MX. Then run:

  $ make
  $ make install

If you want to build shared libraries, list the available configure
options with:

  $ ./configure --help

or read the MPICH2 manual.

2. Runtime tunables

You can change some MPICH2-MX behaviors by setting environment
variables. Some of these affect MX directly and others only impact
MPICH2.

2.1 Registration cache

MX has an internal memory registration cache (regcache) that can improve
repeated communication of large messages. By default, MX will try to use
the regcache. Previously, the regcache was not the default and was
enabled with MX_RCACHE=1. In applications that override memory functions
such as malloc(), the MX regcache will not work. You can disable the
regcache with:

  $ export MX_RCACHE=0

2.2 Error handling

By default, MX will abort if an error occurs.
This is useful for catching errors, but it is unnecessary if the upper
layers of software expect errors and can handle them correctly. MPICH2,
in general, can tolerate some errors. The ch_mx device can handle some
errors and aborts on others. You can safely disable aborting on MX
errors by setting:

  $ export MX_ERRORS_ARE_FATAL=0

This setting is necessary to pass the errhan tests in the MPICH2 test
suite.

2.3 Send cancellation

In MPI, it is optional whether an implementation will cancel a send. By
default, MPICH2-MX will not cancel sends. You can enable this feature by
setting:

  $ export MX_ENABLE_CANCEL_SEND=1

This setting is necessary to pass the *scancel tests among the pt2pt
tests in the MPICH2 test suite. Enabling it also switches error handling
to return errors rather than abort.

2.4 Recv mode

By default, MPICH2-MX uses polling for blocking receives. You can change
this behavior to a blocking mode or a mixed mode (some polling, then
blocking) by setting:

  $ export MX_RECV_POLLING=N

where N is -1, 0, or a positive integer. The value -1 indicates polling,
the value 0 indicates blocking, and a positive integer value will poll
that many times before blocking. Changing the behavior to blocking will
lower CPU usage but increase latency. You will need to test various
values to determine which is best for your application.

2.5 Unexpected queue length

By default, MX will buffer up to 4 MB of unexpected messages before it
starts dropping them (the sender will automatically try to retransmit).
You can alter this amount by setting:

  $ export MX_UNEX_Q_LENGTH=N

where N is the number of bytes to buffer.

=========================
II. MPICH2-MX Performance
=========================

On MX-2G systems, MPICH2-MX should easily saturate the link and use
minimal CPU. On MX-10G systems, MPICH2-MX can saturate the link and use
moderate CPU resources.
MX-10G relies on PCI-Express, which is relatively new, and performance
varies considerably by processor, motherboard, and PCI-E chipset. Refer
to Myricom's website for the latest DMA read/write performance results
by motherboard. The DMA results place an upper bound on MPICH2-MX
performance.

============
III. Caveats
============

1. Multiple NICs are not supported

Currently, MPICH2-MX will only use the first NIC found (board 0). Future
releases will support selecting boards and possibly using multiple
boards concurrently.

2. Polling versus blocking mode

Currently, MPICH2-MX will poll during blocking calls. This leads to the
lowest latency but increases processor load. Future releases will
support polling, blocking, and a mixed mode (poll for a while, then
block).

3. Checksums

This release of MPICH2-MX does not support checksumming of messages to
aid in debugging. We will add this feature in a future release.

4. Send cancellation

If the sender sends two identical messages (same receiver, same MPI tag,
same communicator), the receiver has not posted a recv for either
message, and the sender wants to cancel the second message, MPICH2-MX
will instead cancel the first message (it will be matched first). This
will lead to undefined behavior.

===========
IV. License
===========

In addition to the standard MPICH2 license found in the COPYRIGHT file,
Myricom adds the following for ch_mx:

    MPICH2-MX is copyright (C) 2007 of Myricom, Inc.

    MPICH2-MX is free software; you can redistribute it and/or modify it
    under the terms of version 2.1 of the GNU Lesser General Public
    License as published by the Free Software Foundation.

    MPICH2-MX is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details.
    You should have received a copy of the GNU Lesser General Public
    License along with MPICH2; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

==========
V. Support
==========

If you have questions about MPICH2-MX, please contact help@myri.com.

/* -*- mode: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
 * vim:expandtab:shiftwidth=8:tabstop=8:
 */
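
As a recap of the runtime tunables in section I.2, the following is a
minimal sketch of a shell fragment that sets all five variables in one
place before a job is launched. The function name set_mx_tunables and
every value shown are illustrative assumptions, not recommended
settings; in particular, the polling count of 1000 and the 8 MB queue
length (8388608 bytes) are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2-MX): export the tunables
# described in section I.2. All values are examples only.
set_mx_tunables() {
    export MX_RCACHE=0               # 2.1: disable the registration cache
    export MX_ERRORS_ARE_FATAL=0     # 2.2: do not abort on MX errors
    export MX_ENABLE_CANCEL_SEND=1   # 2.3: enable send cancellation
    export MX_RECV_POLLING=1000      # 2.4: poll 1000 times, then block
    export MX_UNEX_Q_LENGTH=8388608  # 2.5: buffer 8 MB of unexpected messages
}

set_mx_tunables

# Print the MX_* settings now in the environment so they can be
# checked before the job is started.
env | grep '^MX_' | sort
```

Sourcing a fragment like this from a job script keeps the settings
together. Whether exported variables reach remote MPI processes depends
on how the job is launched, which this README does not cover.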