Introduction :

  The ior_survey script can be used to test the performance of Lustre
file systems. It uses IOR (Interleaved Or Random), a program for testing the
performance of parallel file systems using various interfaces and access
patterns.  IOR uses MPI for process synchronization.

General Description:

  ior_mpiio is a parallel file system test developed by the SIOP (Scalable
I/O Project) at LLNL. This parallel program performs parallel writes and
reads to/from a file using MPI-IO and reports the throughput rates.

  MPI is used for process synchronization.  Under the control of compile-time
defined constants (and, to a lesser extent, environment variables), I/O is done
via MPI-IO. The data are written and read using independent parallel transfers
of equal-sized blocks of contiguous bytes that cover the file with no gaps and
that do not overlap each other. The test consists of creating a new file,
writing data to it, then reading the data back.

  The data written are C integers. If the program runs successfully to
completion, it returns 0. If a problem is detected with any I/O routine, the
program exits with a value of IO_ERR.

  If a non-I/O problem is detected, the program exits with a value of
INTERNAL_ERR (this can be caused by a bug in the test program, or a problem in
MPI, or by inconsistencies in the environment variable settings).

Requirements :
	To run the ior_survey script the following items are required.

1: IOR

  The IOR test can be obtained from
  ftp://ftp.llnl.gov/pub/siop/ior/

2: pdsh
	The tarball can be obtained from
   http://sourceforge.net/project/showfiles.php?group_id=33530&package_id=183641

3: pdsh-rcmd-ssh module
	The rpm for this can be found at
   http://sourceforge.net/project/showfiles.php?group_id=33530&package_id=183641

4: lam/mpi
	The tarball can be obtained from
   http://www.lam-mpi.org/7.1/download.php

5: Run the script as a non-root user who has super-user privileges through
   sudo (see the sudoers entry below).

6: The user should be able to log in without a password on all the nodes on
   which the test is going to be run.
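
   Requirement 6 can be checked before a run. The following is a minimal
   sketch, assuming ssh is the login method and that a file with one node
   per line exists (such as the hostfile described under "Running the
   ior_survey script"). The function name check_login and the SSH override
   variable are hypothetical and are not part of ior_survey.

```shell
# Hypothetical helper: report whether each host listed in a file accepts a
# password-less ssh login. Set SSH to substitute another command (dry run).
check_login() {
    hostfile=$1
    failed=0
    while read -r host; do
        [ -n "$host" ] || continue    # skip blank lines
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # -n keeps ssh from consuming the rest of the hostfile on stdin.
        if ${SSH:-ssh} -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}
```

   Usage: "check_login hostfile". The function returns non-zero if any node
   refused the login.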



To make an entry into the sudoers file :

1: Become super user (root)

2: type visudo

3: make an entry as
  username   ALL=(ALL) NOPASSWD: ALL
  where username is the name of the user who will run the script.
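
  Once the entry is saved, it can be verified from the user's own account.
  A minimal sketch, assuming a sudo version that supports the -n
  (non-interactive) option; the function name check_sudo and the SUDO
  override variable are hypothetical.

```shell
# Hypothetical helper: confirm that sudo works without asking for a password.
# "sudo -n" fails immediately instead of prompting when a password is needed.
check_sudo() {
    if ${SUDO:-sudo} -n true 2>/dev/null; then
        echo "password-less sudo: ok"
    else
        echo "password-less sudo: NOT configured"
        return 1
    fi
}
```

  Usage: run "check_sudo" as the user named in the sudoers entry.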
 

Building IOR :

  Type 'gmake mpiio' from the IOR/ directory.  In
  IOR/src/C, the file Makefile.config currently has settings for AIX, Linux,
  OSF1 (TRU64), and IRIX64 to model on.  Note that MPI must be present for
  building/running IOR, and that MPI I/O must be available for MPI I/O, HDF5,
  and Parallel netCDF builds.  As well, HDF5 and Parallel netCDF libraries are
  necessary for those builds.  All IOR builds include the POSIX interface.

  Copy the IOR binary file in IOR/src/C/ to /usr/local/sbin/ using

	sudo cp IOR/src/C/IOR /usr/local/sbin/



Installing pdsh and pdsh-rcmd-ssh module :

1: Download the pdsh tarball

2: untar it using tar -xzvf (if tar.gz) or tar -xjvf (if tar.bz2)

3: go to the pdsh directory and type ./bootstrap

4: configure it using the following command

	./configure --with-ssh

5: Build it using "make"

6: Install it using "sudo make install"

7: Download the pdsh-rcmd-ssh rpm

8: Install the rpm using "rpm -ivh pdsh-rcmd-ssh*"


Installing lam/mpi :

1: Download the lam tarball

2: untar it using tar -xzvf (if tar.gz) or tar -xjvf (if tar.bz2)

3: go to the lam directory and type ./configure

4: Build it using "make"

5: Install it using "sudo make install"

	lam, IOR, and pdsh should be installed on all the nodes on which the
	test is going to be run.
	
Note: Please make sure that the same version of lam is installed on all the
nodes on which the test is going to be run.
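
  Once pdsh is working, it can be used to confirm that the tools are present
  on every node. The following is a sketch only: it assumes node names of
  the form rhea1, rhea2 (as in the note under "Running the ior_survey
  script"), that pdsh was built with ssh support, and that IOR was copied
  to /usr/local/sbin as described above. The function names and the PDSH
  override variable are hypothetical. It checks presence only, not the lam
  version.

```shell
# Hypothetical helper: build a pdsh host list such as rhea[1-2].
pdsh_targets() {
    printf '%s[%s]\n' "$1" "$2"
}

# Hypothetical helper: look for lamboot, pdsh and the IOR binary on each node.
check_installed() {
    ${PDSH:-pdsh} -R ssh -w "$(pdsh_targets "$1" "$2")" \
        'for c in lamboot pdsh; do command -v "$c" || echo "missing: $c"; done;
         ls /usr/local/sbin/IOR'
}
```

  Usage: "check_installed rhea 1-2" prints one line per tool, prefixed by
  the node name.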



Running the ior_survey script :

1: Lustre should be mounted at /mnt/lustre. Do 
	"touch /mnt/lustre/ior_survey_testfile"

2: On the node from which the script is going to be executed, make a
   hostfile containing the IP addresses of all the nodes.

3: Start lam using "lamboot -v -d hostfile". This will start lamd on all the
   nodes.

4: Run the ior_survey script using "./ior_survey"
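
   The steps above can be collected into a small wrapper. This is a sketch
   under the assumptions already stated in the steps (Lustre mounted at
   /mnt/lustre, ior_survey in the current directory). The function names
   and the IP addresses in the usage line are hypothetical.

```shell
# Hypothetical helper: write one IP address per line into a hostfile.
write_hostfile() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

# Hypothetical helper: steps 1, 3 and 4 in order, stopping at the first failure.
run_survey() {
    hostfile=$1
    touch /mnt/lustre/ior_survey_testfile || return 1   # step 1
    lamboot -v -d "$hostfile" || return 1                # step 3: start lamd
    ./ior_survey                                         # step 4
}
```

   Usage: "write_hostfile hostfile 192.168.0.1 192.168.0.2" followed by
   "run_survey hostfile".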

Note:
   The node names of the clients should be of the form rhea1, rhea2, rhea3,
   and so on. The name of the cluster (the first part of the node name) should
   be set in the cluster name field of the ior_survey script.
   e.g.  cluster=rhea //name of the cluster

   The client node numbers (the numeral at the end of each node name) should
   be set in the client field.
   e.g. client=(1)   //to run the test on one node only, rhea1.
	client=(1-2) //to run the test on two nodes, rhea1 and rhea2.
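
   To make the naming rule concrete, the sketch below shows how a cluster
   name and client numbers combine into node names. It only illustrates the
   convention described in this note; it is an assumption about the
   mapping, not the code ior_survey itself uses, and the function name
   expand_clients is hypothetical.

```shell
# Hypothetical helper: expand a cluster name plus client specs such as
# "1" or "1-2" into full node names, one per line.
expand_clients() {
    cluster=$1
    shift
    for spec in "$@"; do
        case $spec in
            *-*)
                i=${spec%-*}        # number before the dash
                last=${spec#*-}     # number after the dash
                while [ "$i" -le "$last" ]; do
                    echo "$cluster$i"
                    i=$((i + 1))
                done
                ;;
            *)
                echo "$cluster$spec"
                ;;
        esac
    done
}
```

   Usage: "expand_clients rhea 1-2" prints rhea1 and rhea2 on separate
   lines.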

	Please note that the hostfile should contain the IP addresses of only
   those nodes on which the Lustre file system is mounted, i.e. the clients.

	The details of the test can be found on the node from where the
   test was run as /tmp/ior_survey_run_date@start_time_nodename.detail

	The output of IOR looks like
	
host1: access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   iter
host1: ------    ---------  ---------- ---------  --------   --------   --------   ----
host1: write     1.58       2097152    1024.00    0.000873   1299.37    0.000132   0
host1:
host1: Max Write: 1.58 MiB/sec (1.65 MB/sec)
 
	where,
		host1 : node on which the test is run
		access: the test which is run (write, rewrite, read, reread)
		bw    : bandwidth, in MiB/s
		block : total size to be written, in KiB
		xfer  : size of each transfer, in KiB (here 1024 KiB = 1 MiB)
		open  : time taken for open, in seconds
		wr/rd : time taken for write/read, in seconds
		close : time taken for close, in seconds
		iter  : iteration number
		Max Write : maximum write bandwidth obtained
		
Note : MB is defined as 1,000,000 bytes and MiB is 1,048,576 bytes.
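
	The numbers in the sample output can be cross-checked by hand. The
   sketch below assumes a single task and ignores the (very small) open
   and close times, so that bandwidth is roughly block size divided by
   the wr/rd time. It then converts MiB/s to MB/s using the definitions
   in the note above. The function name ior_bw is hypothetical.

```shell
# Hypothetical helper: given block size in KiB and wr/rd time in seconds,
# print the bandwidth in MiB/sec and MB/sec.
ior_bw() {
    LC_ALL=C awk -v kib="$1" -v sec="$2" 'BEGIN {
        mib = kib / 1024 / sec                 # KiB -> MiB, then per second
        mb  = mib * 1048576 / 1000000          # MiB/s -> MB/s
        printf "%.2f MiB/sec (%.2f MB/sec)\n", mib, mb
    }'
}
```

	Usage: "ior_bw 2097152 1299.37" takes the block and wr/rd values
   from the sample write line and reproduces the figures on the
   Max Write line.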

	The summary of the test can be found on the node from where the
   test was run as /tmp/ior_survey_run_date@start_time_nodename.summary
   It contains the tests run and the status of those tests.


Instructions for graphing IOR results

   The plot-ior.pl script will plot the results from the .detail file
   generated by ior_survey. It will create a data file for writes as
   /tmp/ior_survey_run_date@start_time_nodename.detail.dat1, a data file for
   reads as /tmp/ior_survey_run_date@start_time_nodename.detail.dat2, and a
   gnuplot file as /tmp/ior_survey_run_date@start_time_nodename.detail.scr.

   $ perl iokit-parse-ior /tmp/ior_survey_run_date@start_time_nodename.detail
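
   For a quick look at the numbers without gnuplot, the bandwidth column
   can be pulled out of the .detail file directly. This is a sketch that
   assumes the .detail file contains lines in the IOR output format shown
   earlier. It is not how the plotting script works internally, and the
   function name ior_bandwidths is hypothetical.

```shell
# Hypothetical helper: print "host access bandwidth" for every result line.
# Reads the named files, or standard input when no file is given.
ior_bandwidths() {
    awk '$2 == "write" || $2 == "rewrite" || $2 == "read" || $2 == "reread" {
        print $1, $2, $3        # host, access type, bw(MiB/s)
    }' "$@"
}
```

   Usage: "ior_bandwidths /tmp/ior_survey_run_date@start_time_nodename.detail"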