Page i: InfiniPath User Guide Version 2.0 (document IB6054601-00 D)
Page C-26 (C – Troubleshooting: InfiniPath MPI Troubleshooting): There is no route to any host: $ mpirun -np 2 -m ~/tmp/q mpi_latency 100 100 ssh:
Page C-27: $ mpirun -np 2 -m ~/tmp/q -q 60 mpi_latency 1000000 1000000 MPIRUN: MPI progress
Page C-28: C.8.13 MPI Stats. Using the -print-stats option to mpirun will result in a listing t
Page C-29 (C – Troubleshooting: Useful Programs and Files for Debugging): C.9 Useful Programs and Files for Debugging. The most useful programs and f
Page C-30: C.9.3 Summary of Useful Programs and Files. Useful programs and files are s
Page C-31: C.9.4 boardversion. It may be useful to keep track of the current version o
Page C-32: C.9.5 ibstatus. This program displays basic information on the status of In
Page C-33: C.9.8 ipath_checkout. ipath_checkout is a bash script used to verify that t
Page C-34: --workdir=DIR Use DIR to hold intermediate files created while running te
Page C-35: 00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 1236070407 C.9
Page 1-1: Section 1 Introduction. This chapter describes the objectives, intended audience, and organization of the InfiniPath User Guide. T
Page C-36: C.9.13 lsmod. If you need to find which InfiniPath and OpenFabrics modules
Page C-37: The following table shows the possible contents of the file, with brief
Page C-38: C.9.17 strings. The command strings can also be used. Its format is as foll
Page D-1: Appendix D Recommended Reading. Reference material for further reading is provided here. D.1 References for MPI. The MPI Standard specifica
Page D-2 (D – Recommended Reading: Rocks): D.6 Clusters. Gropp, William, Ewing Lusk, and Thomas Sterling, Beowulf Cluster Computing with Linux, Secon
Page E-1: Appendix E Glossary. A glossary is provided below for technical terms used in the documentation. bandwidth The rate at which data can be
Page E-2 (E – Glossary): GID For Global Identifier. Used for routing between different InfiniBand subnets. GUID For Globally Unique Identifier fo
Page E-3: LID For Local Identifier. Assigned by the Subnet Manager (SM) to each visible node within a single InfiniBand fabric. I
Page E-4: MTRR For Memory Type Range Registers. Used by the InfiniPath driver
Page E-5: SDP For Sockets Direct Protocol. An InfiniBand-specific upper layer protocol. It defines a standard wire protocol to su
Page 1-2 (1 – Introduction: Interoperability): Appendix E Glossary of technical terms. Index. In addition, the InfiniPath Install Guide contains i
Page Index-1: Index. A: ACPI, enabling C-9. B: Batch queuing for MPI jobs B-1–B-4; Benchmarking: MPI bandwidth A-2–A-3, MPI latency measurement A-1–A-2, MPI l
Page Index-2 (InfiniPath User Guide Version 2.0 Beta2): configuration of on SUSE and SLES 10 2-8–2-11; layered Ethernet driver 2-6; ipathbug_helper C
Page 1-3 (1 – Introduction: What’s New in this Release): NOTE: OpenFabrics was known as OpenIB until March 2006. All relevant references to OpenIB
Page 1-4 (1 – Introduction: Supported Distributions and Kernels): Support for multiple versions of MPI has been added. You can use a different ver
Page 1-5 (1 – Introduction: Software Components): 1.8 Software Components. The software provided with the InfiniPath Interconnect product consists of
Page 1-6 (1 – Introduction: Documentation and Technical Support): NOTE: 32 bit OpenFabrics programs using the verb interfaces are not supported in
Page 1-7: Readme file. The Troubleshooting Appendix for installation, InfiniPath and OpenF
Page 2-1: Section 2 InfiniPath Cluster Administration. This chapter describes what the cluster administrator needs to know about the Infini
Page ii: Information furnished in this manual is believed to be accurate and reliable. However, QLogic
Page 2-2 (2 – InfiniPath Cluster Administration: Memory Footprint): MPI include files are in /usr/include. MPI programming examples and source for
Page 2-3: on system configuration. OpenFabrics support is under development and has not
Page 2-4 (2 – InfiniPath Cluster Administration: Configuration and Startup): This breaks down to a memory footprint of 331 MB per node, as follows:
Page 2-5: You can check and adjust these BIOS settings using the BIOS Setup Uti
Page 2-6: and unmounted when the infinipath script is invoked with the "st
Page 2-7: You must create a network device configuration file for the layered E
Page 2-8: If you are using DHCP (dynamic host configuration protocol), add the
Page 2-9: Step 3 is applicable only to SLES 10; it is required because SLES 10
Page 2-10: Check each of the lines starting with SUBSYSTEM=, to find the highe
Page 2-11: 6. To verify that the configuration files are correct, you will norm
Page iii: Added info about using MPI over uDAPL. Need to load modules rdma_cm and rdma_ucm. 3.7 Added sect
Page 2-12: To verify the configuration, type: # ifconfig ib0 The output from this
Page 2-13 (2 – InfiniPath Cluster Administration: Starting and Stopping the InfiniPath Software): and you can stop it again like this: # /etc/init.
Page 2-14: To disable the driver on the next system boot, u
Page 2-15 (2 – InfiniPath Cluster Administration: Configuring ssh and sshd Using shosts.equiv): If there is output, you should look at the output
Page 2-16: This next example assumes the following: Both the
Page 2-17 (2 – InfiniPath Cluster Administration: Performance and Management Tips): 0zwxSL7GP1nEyFk9wAxCrXv3xPKxQaezQKs+KL95FouJvJ4qrSxxHdd1NYNR0D
Page 2-18: nodes. Since these are presumed to be specialized computing a
Page 2-19: For SUSE 9.3 and 10.0 run this command as root: # /sbin/chkconf
Page 2-20: 2.10.6 Hyper-Threading. If using Intel processors that support Hy
Page 2-21: 00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 123
Page iv: © 2006, 2007 QLogic Corporation. All rights reserved worldwide. © PathScale 2004, 2005, 2006. A
Page 2-22 (2 – InfiniPath Cluster Administration: Customer Acceptance Utility): $Id: kernel.org InfiniPath Release 2.0 $ $Date: 2006-09-15-04:16 $ /
Page 2-23: 3. Gather and analyze system configuration from nodes. 4. Gather an
Page 3-1: Section 3 Using InfiniPath MPI. This chapter provides information on using InfiniPath MPI. Examples are provided for compiling a
Page 3-2 (3 – Using InfiniPath MPI: Getting Started with MPI): These examples assume that: Your cluster administrator has properly installed Inf
Page 3-3: Here ./cpi designates the executable of the example program in the working director
Page 3-4 (3 – Using InfiniPath MPI: Configuring MPI Programs for InfiniPath MPI): and run it with: $ mpirun -np 2 -m mpihosts ./pi3f90 The C++ progr
Page 3-5 (3 – Using InfiniPath MPI: InfiniPath MPI Details): You may need to instead pass arguments to configure directly, in a fashion similar to
Page 3-6: The process is shown in the following steps: 1. Create a key pair. Use the default fil
Page 3-7: 3.5.2 Compiling and Linking. These scripts invoke the compiler and linker for programs i
Page v: Table of Contents. Section 1 Introduction. 1.1 Who Should Read this Guide
Page 3-8: line options. See the PathScale compiler documentation and the man pages for pathcc a
Page 3-9: To use the Intel compiler for Fortran90/Fortran95 programs, use: $ mpif90 -f90=ifort .
Page 3-10: The current workaround for this is to compile on a supported and compatible distribu
Page 3-11: program-name will generally be the pathname to the executable MPI program. If the MP
Page 3-12: programs will be started on that host before using the next entry in the mpihosts fi
Page 3-13: LD_LIBRARY_PATH, and other environment variables for the node programs through the u
Page 3-14: 3.5.9 Multiprocessor Nodes. Another command line option, -ppn, instructs mpirun to assi
Page 3-15: -verbose Print diagnostic messages from mpirun itself. Can be useful in troubleshoot
Page 3-16: -nonmpi Run a non-MPI program. Required if the node program makes no MPI calls. Defa
Page 3-17 (3 – Using InfiniPath MPI: MPD): -statsfile file-prefix Specifies alternate file to receive the output from the -print-stats option. Defa
Page vi: 2.10 Performance and Management Tips
Page 3-18 (3 – Using InfiniPath MPI: File I/O in MPI): 3.8.1 MPD Description. The Multi-Purpose Daemon (MPD) was developed by Argonne National Labora
Page 3-19 (3 – Using InfiniPath MPI: InfiniPath MPI and Hybrid MPI/OpenMP Applications): accessed via some network file system, typically NFS. Par
Page 3-20 (3 – Using InfiniPath MPI: Debugging MPI Programs): may be desirable to run multiple MPI processes and multiple OpenMP threads per node.
Page 3-21 (3 – Using InfiniPath MPI: InfiniPath MPI Limitations): Symbolic debugging is easier than machine language debugging. To enable symbolic
Page 3-22: No ports available on /dev/ipath NOTE: If port sharing is enabled, this limit is
Page A-1: Appendix A Benchmark Programs. Several MPI performance measurement programs are installed from the mpi-benchmark RPM. This Appendix des
Page A-2 (A – Benchmark Programs: Benchmark 2: Measuring MPI Bandwidth Between Two Nodes): This benchmark always involves just two node programs.
Page A-3 (A – Benchmark Programs: Benchmark 3: Messaging Rate Microbenchmarks): MPI_Isend function, while the receiving node consumes them as quic
Page A-4: benchmark (as shown in the example above). It has been enhanced wi
Page A-5 (A – Benchmark Programs: Benchmark 4: Measuring MPI Latency in Host Rings): A.4 Benchmark 4: Measuring MPI Latency in Host Rings. The progra
Page vii: 3.11 Debugging MPI Programs
Page B-1: Appendix B Integration with a Batch Queuing System. Most cluster systems use some kind of batch queuing system as an orderly way to pro
Page B-2 (B – Integration with a Batch Queuing System: A Batch Queuing Script): require that his node program be the only application running on e
Page B-3: by mpirun. Each line consists of a node name, a colon, and the numb
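The line format mentioned on page B-3 (a node name, a colon, and a number) can be split into its two fields in plain shell. This is only a minimal sketch: parse_host_line is a hypothetical helper, not part of the InfiniPath software, and because the page text is cut off before it says what the number counts, the sketch treats it as an unspecified non-negative integer.

```shell
#!/bin/sh
# parse_host_line: hypothetical helper (not an InfiniPath tool).
# Splits one "name:number" line and prints "name number".
# Assumption: the trailing field is a plain non-negative integer;
# its meaning is not stated in the truncated source text.
parse_host_line() {
    line=$1
    case $line in
        *:*) ;;                               # a colon is required
        *) echo "no colon in: $line" >&2; return 1 ;;
    esac
    name=${line%%:*}                          # text before the first colon
    count=${line#*:}                          # text after the first colon
    if [ -z "$name" ]; then
        echo "empty node name in: $line" >&2
        return 1
    fi
    case $count in
        ''|*[!0-9]*) echo "non-numeric field in: $line" >&2; return 1 ;;
    esac
    printf '%s %s\n' "$name" "$count"
}
```

For example, `parse_host_line node01:4` prints `node01 4`, while a line without a colon or with a non-numeric second field is rejected with a message on standard error.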
Page B-4 (B – Integration with a Batch Queuing System: Lock Enough Memory on Nodes When Using SLURM): The following command will terminate all pro
Page C-1: Appendix C Troubleshooting. This Appendix describes some of the existing provisions for diagnosing and fixing problems. The sections ar
Page C-2 (C – Troubleshooting: BIOS Settings): states of the LEDs. The green LED will normally illuminate first. The normal state is Green On, Amb
Page C-3: C.2.1 MTRR Mapping and Write Combining. MTRR (Memory Type Range Registers) is used by the InfiniPath d
Page C-4: C.2.3 Incorrect MTRR Mapping Causes Unexpected Low Bandwidth. This same MTRR Mapping setting as descri
Page C-5 (C – Troubleshooting: Software Installation Issues): C.3 Software Installation Issues. This section covers issues related to software instal
Page viii: C.4.5 OpenFabrics Load Errors If ib_ipath Driver Load Fails, C-10. C.4.6
Page C-6: In older distributions, such as RHEL4, the 32-bit glibc will be contained in the lib
Page C-7 (C – Troubleshooting: Kernel and Initialization Issues): 8. Reload all modules by using this command (as root): # /etc/init.d/infinipath s
Page C-8: C.4.1 Kernel Needs CONFIG_PCI_MSI=y. If the InfiniPath driver is being compiled on
Page C-9: NOTE: This problem has been fixed in the 2.6.17 kernel.org kernel. C.4.3 Driver Lo
Page C-10: A zero count in all CPU columns means that no interrupts have been delivered to
Page C-11: C.4.6 InfiniPath ib_ipath Initialization Failure. There may be cases where ib_ipat
Page C-12 (C – Troubleshooting: System Administration Troubleshooting): C.5 OpenFabrics Issues. This section covers items related to OpenFabrics, inc
Page C-13: C.6.1 Broken Intermediate Link. Sometimes message traffic passes through the fabric
Page C-14: $ mpirun -v MPIRUN:Infinipath Release2.0 : Built on Wed Nov 19 17:28:58 PDT 2006 b
Page C-15: On a SLES 10 system, you would need: compat-libstdc++ (for FC3) compat-libstdc+
Page ix: C.9.11 ipath_pkt_test
Page C-16: For these examples in Section C.8.5 below, we assume that these new locations are
Page C-17: The above compiler command ensures that the program will run using this path on a
Page C-18: Examples are given below. In the following command, the HP-MPI version of mpirun i
Page C-19: The following two commands will both work properly: QLogic mpirun and executable u
Page C-20: ^ pathf95-389 pathf90: ERROR BORDERS, File = communicate.F, Lin
Page C-21: integer count, datatype, root, comm, ierror ! Call the Fortran 77 style impl
Page C-22: If this file is not present or the node has not been rebooted after the infinipat
Page C-23: Found unknown timer type type; unknown frame type type; recv done: available_tids now
Page C-24: The following message indicates that a node program may not be processing incomin
Page C-25: These messages appear in the mpirun output. Most are followed by an abort, and po