Timing behaviour of PVM
- About
- Exercise I
- Exercise II
- Table I-A. Return trip times. Homogeneous Architecture A (SGI O2)
- Table I-B. Return trip times. Homogeneous Architecture A (ALPHA)
- Table I-C. Return trip times. Homogeneous Architecture A (SGI MP)
- Table I-D. Return trip times. Heterogeneous Architecture SGI O2 - ALPHA
- Table II. Return trip times - Short message
- Table III. Data Packing times SGI O2
- My observations
- Exercise III
- Questions
- Answers
About
This is the result of programming assignment #2.
Exercise I
Creating a hostfile is simple. Just add the hosts you would like to include in your virtual machine to a file, one entry per line. For example:
sci-002.bgsu.edu
sci-003.bgsu.edu
sci-004.bgsu.edu
alpha.bgsu.edu
Then start PVM with pvm <filename>.
See the logfile of the exercise session. Note that sci-003 failed to start for some reason.
Exercise II
This exercise involves a lot of tedious work. The first thing I did was write some scripts to support it. First, I renamed timing.c to timing.c.in.
- tim_mod.pl
- This script reads timing.c.in, modifies the test parameters and writes timing.c.
- runtest.sh
- This script runs a full test on a machine.
- eval_log.pl
- Most of the output of runtest.sh is useless; this script creates a summary of the useful facts.
Usage is simple:
- Compile timing_slave on all hosts (only once!).
- Log into the master host and set up PVM (adding hosts, etc.).
- Run runtest.sh <slave_host> > <logfile>.
- Run eval_log.pl <logfile> to get a summary.
- Halt PVM.
Table I-A. Return trip times. Homogeneous Architecture A (SGI O2)
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
2 users logged in
Table I-B. Return trip times. Homogeneous Architecture A (ALPHA)
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
53 users logged in!!
Table I-C. Return trip times. Homogeneous Architecture A (SGI MP)
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
6 users logged in
Table I-D. Return trip times. Heterogeneous Architecture SGI O2 - ALPHA
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
Table II. Return trip times - Short message
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
Data contained in Table I logfiles.
- Time
- Average over all tests that report this time, with the first sample included.
- with first sample
- Result of the first test, with the first sample included.
- without first sample
- Result of the first test, with the first sample excluded.
Table III. Data Packing times SGI O2
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
Data contained in Table I logfiles.
My observations
First, look at Table I-B, Master/Slave on same host, 10000-byte message size. The Raw data test result seemed strange (value in parentheses), so I redid the test. The InPlace test failed because the process was terminated. Running the program under gdb gives:
Program received signal SIGTERM, Terminated.
main (argc=1, argv=0x11ffffae8) at timing.c:118
118         if (pvm_recv (-1, -1) < 0) {
If you take a look at one of the logfiles, you will see that the send time for the first message is much higher than the others. As mentioned in class, this is because the master starts sending before the slave is fully spawned. It is worth mentioning because it has a real impact on the test results. Take, for example, the SGI O2, homogeneous architecture, same host, short-message round-trip time: the resulting average is 1782 uSec with the first sample and 993 uSec without it!
Getting realistic results is very difficult because other users are working on the system, in particular on the Alpha. This causes a large spread in the resulting time values.
Exercise III
Changes
I changed psum.c and spsum.c, and wrote the wrappers uni_psum.c and uni_spsum.c to fit the new source in the compile scheme.
Changes in psum.c
The following code spawns the slaves as required by the assignment, depending on nproc. SGI6 is the architecture of the SGI O2s, ALPHAMP is the Alpha, and SGIMP64 is Sigma.
switch (nproc) {
case 10:
    numt = pvm_spawn(SLAVENAME, (char**)0, 0, "", nproc, tids);
    break;
case 6:
    numt += pvm_spawn(SLAVENAME, (char**)0, PvmTaskArch, "SGIMP64", 2, tids+4);
    /* fall through */
case 4:
    numt += pvm_spawn(SLAVENAME, (char**)0, PvmTaskArch, "ALPHAMP", 2, tids+2);
    /* fall through */
case 2:
    numt += pvm_spawn(SLAVENAME, (char**)0, PvmTaskArch, "SGI6", 2, tids);
    break;
default:
    printf("Requested number of slaves not implemented\n");
}
This sends each slave its portion of the data to sum up.
#ifdef UNICAST
    printf("Using unicast send\n");
    for (i = 0; i < nproc; i++) {
        low  = i * ((n / nproc) + 1);
        high = low + ((n / nproc) + 1);
        if (high > n) {
            high = n;
        }
        send_len = high - low;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&i, 1, 1);
        pvm_pkint(&send_len, 1, 1);
        pvm_pkint(data + low, send_len, 1);
        pvm_send(tids[i], 0);
    }
#else
    ... broadcast code ...
#endif
Note that the chunk-size calculation differs from the original source. The original would miss some numbers if n/nproc is not an integer.
Changes in spsum.c
Receive the data and sum it up.
#ifdef UNICAST
    printf("Using unicast send\n");
    pvm_upkint(&me, 1, 1);
    pvm_upkint(&n, 1, 1);
    pvm_upkint(data, n, 1);
    /* calculate sum */
    result = 0;
    for (i = 0; i < n; i++) {
        result += data[i];
    }
#else
    ... broadcast code ...
#endif
Table IV. Timing values for psum, spsum application.
number of slaves | Multicast  | Individual sends
2                | 15184 uSec | 16536 uSec
4                | 31450 uSec | 29488 uSec
6                | 35478 uSec | 28167 uSec
Questions
- What is the purpose of the hostfile?
- What conclusions can you draw from the data you gathered in exercise 2?
- What conclusions can you draw from the data you gathered in exercise 3?
- What issues do you think need to be taken into consideration in analyzing the above data sets?
Answers
- Q1
-
The purpose of the hostfile is to configure a virtual machine. In its simplest form it just lists the member hosts of the PVM; those are added on startup. Options to configure the hosts, e.g. the startup or working directory, may be supplied.
It may also be used to specify options for hosts without adding them; those options are used if the machine is added later.
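As an illustration (not part of the assignment), a hostfile using a few of these options might look like the sketch below. The host names reuse those from Exercise I, but the option values (working directory, login name, pvmd path) are made up; wd=, lo= and dx= are standard PVM hostfile options, and a leading & records options for a host without adding it at startup.

```
# plain entries are added when PVM starts
sci-002.bgsu.edu
sci-004.bgsu.edu  wd=/tmp/pvmwork
# '&' records options only; they take effect if the
# host is added later, e.g. from the PVM console
&alpha.bgsu.edu  lo=someuser dx=/usr/local/pvm3/lib/pvmd
```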
- Q2
-
Interpreting the data is very difficult because of the huge spread. In general, for large data sets Raw or even InPlace encoding should be used if possible. For small amounts of data it makes almost no difference.
Starting processes is slow, so spawn them as early as possible and give them enough work.
- Q3
-
Using more slaves only slows the computation down, because most of the work the system does is spawning processes and sending messages, both very costly operations!
The result of Table IV suggests that there is not much difference between a multicast and several unicasts. This is not surprising, because the amount of data to transmit stays almost the same.
- Q4
-
These tests were run on a network in everyday use, i.e. with network traffic and busy computers, so the results are distorted.
At first I trusted the short-message times; after the topic came up in class I took a second look at the numbers (see Table II). My conclusion: make sure you know what you are measuring.