Timing behaviour of PVM
- About
- Exercise I
- Exercise II
- Table I-A. Return trip times. Homogeneous Architecture A (SGI O2)
- Table I-B. Return trip times. Homogeneous Architecture A (ALPHA)
- Table I-C. Return trip times. Homogeneous Architecture A (SGI MP)
- Table I-D. Return trip times. Heterogeneous Architecture SGI O2 - ALPHA
- Table II. Return trip times - Short message
- Table III. Data Packing times SGI O2
- My observations
- Exercise III
- Questions
- Answers
About
This is the result of programming assignment #2.
Exercise I
Creating a hostfile is simple. Just add the hosts you would like to include in your virtual machine to a file, one entry per line. For example:
sci-002.bgsu.edu
sci-003.bgsu.edu
sci-004.bgsu.edu
alpha.bgsu.edu
Then start PVM with pvm <filename>.
See the logfile of the exercise session. Note that sci-003 failed to start for some reason.
Exercise II
This exercise involves a lot of tedious work. The first thing I did was write some scripts to support it. First, I renamed timing.c to timing.c.in.
- tim_mod.pl
- This script reads timing.c.in, modifies the test parameters and writes timing.c.
- runtest.sh
- This script runs a full test on a machine.
- eval_log.pl
- Most of the output of runtest.sh is useless; this script creates a summary of the useful facts.
Usage is simple:
- Compile timing_slave on all hosts (only once!).
- Log into the master host and set up PVM (adding hosts, etc.).
- Run runtest.sh <slave_host> > <logfile>.
- Run eval_log.pl <logfile> to get a summary.
- Halt PVM.
Table I-A. Return trip times. Homogeneous Architecture A (SGI O2)
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
2 users logged in
Table I-B. Return trip times. Homogeneous Architecture A (ALPHA)
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
53 users logged in!!
Table I-C. Return trip times. Homogeneous Architecture A (SGI MP)
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
6 users logged in
Table I-D. Return trip times. Heterogeneous Architecture SGI O2 - ALPHA
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
Table II. Return trip times - Short message
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
Data contained in Table I logfiles.
- Time
- Average over all tests that report this time, with the first sample included.
- with first sample
- Result of the first test, with the first sample included.
- without first sample
- Result of the first test, with the first sample excluded.
Table III. Data Packing times SGI O2
Master/Slave on same host: (table values not preserved)
Master/Slave on different hosts: (table values not preserved)
Data contained in Table I logfiles.
My observations
First, look at Table I-B, Master/Slave on same host, 10000-byte message size. The Raw data test result seemed strange (value in parentheses), so I redid the test. The InPlace test failed because the process was terminated. Running the program under gdb gives:
Program received signal SIGTERM, Terminated.
main (argc=1, argv=0x11ffffae8) at timing.c:118
118         if (pvm_recv (-1, -1) < 0) {
If you take a look at one of the logfiles, you will see that the send time for the first message is much higher than the others. As mentioned in class, this is because the master starts sending before the slave is fully spawned. It is worth mentioning because it has a real impact on the test results. Take, for example, the SGI O2, homogeneous architecture, same host, short-message round-trip time: the resulting average is 1782 uSec with the first sample and 993 uSec without it!
Getting realistic results is very difficult because other users are working on the system, in particular on the Alpha. This causes a large spread in the resulting time values.
Exercise III
Changes
I changed psum.c and spsum.c, and wrote the wrappers uni_psum.c and uni_spsum.c to fit the new source in the compile scheme.
Changes in psum.c
The following code spawns the slaves as required by the assignment, depending on nproc. SGI6 is the architecture of the SGI O2s, ALPHAMP is the Alpha, and SGIMP64 is Sigma.
switch (nproc) {
case 10:
    numt = pvm_spawn(SLAVENAME, (char**)0, 0, "", nproc, tids);
    break;
case 6:
    numt += pvm_spawn(SLAVENAME, (char**)0, PvmTaskArch, "SGIMP64", 2, tids+4);
    /* fall through */
case 4:
    numt += pvm_spawn(SLAVENAME, (char**)0, PvmTaskArch, "ALPHAMP", 2, tids+2);
    /* fall through */
case 2:
    numt += pvm_spawn(SLAVENAME, (char**)0, PvmTaskArch, "SGI6", 2, tids);
    break;
default:
    printf("Requested number of slaves not implemented\n");
}
This sends each slave its portion of the data to sum up.
#ifdef UNICAST
    printf("Using unicast send\n");
    for (i = 0; i < nproc; i++) {
        low  = i * ((n / nproc) + 1);
        high = low + ((n / nproc) + 1);
        if (high > n) {
            high = n;
        }
        send_len = high - low;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&i, 1, 1);
        pvm_pkint(&send_len, 1, 1);
        pvm_pkint(data + low, send_len, 1);
        pvm_send(tids[i], 0);
    }
#else
    ... broadcast code ...
#endif
Note that the chunk-size calculation differs from the original source. The original would miss some numbers if n/nproc is not an integer.
Changes in spsum.c
Receive the data and sum it up.
#ifdef UNICAST
    printf("Using unicast send\n");
    pvm_upkint(&me, 1, 1);
    pvm_upkint(&n, 1, 1);
    pvm_upkint(data, n, 1);
    /* calculate sum */
    result = 0;
    for (i = 0; i < n; i++) {
        result += data[i];
    }
#else
    ... broadcast code ...
#endif
Table IV. Timing values for psum, spsum application.
number of slaves | Multicast  | Individual sends
2                | 15184 uSec | 16536 uSec
4                | 31450 uSec | 29488 uSec
6                | 35478 uSec | 28167 uSec
Questions
- What is the purpose of the hostfile?
- What conclusions can you draw from the data you gathered in exercise 2?
- What conclusions can you draw from the data you gathered in exercise 3?
- What issues do you think need to be taken into consideration in analyzing the above data sets?
Answers
- Q1
-
The purpose of the hostfile is to configure a virtual machine. In its simplest form it just lists the member hosts of the PVM; those are added on startup. Options to configure the hosts, e.g. the startup or working directory, may be supplied.
It may also be used to specify options for hosts without adding them; those options are used if the machine is added later.
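As an illustration (not part of the assignment), a hostfile using a few of these options might look like the sketch below. The host names reuse those from Exercise I, but the option values (working directory, login name, pvmd path) are made up; wd=, lo= and dx= are standard PVM hostfile options, and a leading & records options for a host without adding it at startup.

```
# plain entries are added when PVM starts
sci-002.bgsu.edu
sci-004.bgsu.edu  wd=/tmp/pvmwork
# '&' records options only; they take effect if the
# host is added later, e.g. from the PVM console
&alpha.bgsu.edu  lo=someuser dx=/usr/local/pvm3/lib/pvmd
```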
- Q2
-
Interpreting the data is very difficult because of the huge spread. In general, for large data sets Raw or even InPlace encoding should be used if possible. For small amounts of data it makes almost no difference.
Starting processes is slow, so spawn them as early as possible and give them enough work.
- Q3
-
Using more slaves only slows the computation down, because most of the work the system does is spawning processes and sending messages, both very costly operations!
The result of Table IV suggests that there is not much difference between a multicast and several unicasts. This is not surprising, because the amount of data to transmit stays almost the same.
- Q4
-
These tests were run on a network in everyday use, i.e. with network traffic and busy computers, so the results are distorted.
At first I trusted the short-message times; after the topic came up in class I took a second look at the numbers (see Table II). My conclusion: make sure you know what you are measuring.