Output from the Tests in ARCHTEST

Copyright (C) 1994-2009 Multiprocessor Diagnostics. All rights reserved.

Output from Each Test

ARCHTEST consists of 30 distinct tests of the logical behavior of a SMMP on shared data. (In addition, T10 consists of four tests of the performance effect of cache hits.)

There are two aspects to the analysis of test results: detection and presentation. The file ANALYSIS explains how relaxed behavior is detected. This file describes how observed behavior is presented.

The format of the output is generally the same for each test. What follows describes the parts of the output for a typical test; differences for particular tests are described later.

Description of the test.

The description consists of:

The name of the test.

A few lines of a logical description of the test, written without any initialization or control instructions.

The conditions sought in the data in the arrays at the end of the test. These are the conditions which indicate relaxed behavior on the part of the machine.

The suffix to be appended to data which is generated externally to ARCHTEST. For more information, see the file HOWTORUN.

Timing and Synchronization of the Threads

The starting time of the test, its ending time, and its duration are printed out. Optionally, so is information which shows the threads passing a hand-coded barrier routine at the beginning of the test and another at the end of the test. The function of the beginning barrier is to synchronize the start of the threads as closely as possible. The function of the ending barrier is to show that each thread has completed (useful sometimes for debugging).

Description of the Data in the Arrays

It sometimes happens that one thread runs to completion before another thread has even started. In that case the test cannot detect any instances of relaxed behavior. A less extreme case is one in which a thread is briefly interrupted and then resumes; no relaxed behavior can be detected during the interruption. Suppose a test reveals very few incidents of relaxed behavior. The question that arises is: is this result due to the behavior of the machine, to an unfortunate timing of the threads, or partly to each? In the last case one may want to estimate what percent of the time the threads were simultaneously active. Four sets of data are printed out to try to answer this question.

Twenty uniformly spaced entries of the arrays are printed out. (The number of entries can be changed at run time; see the file HOWTORUN.)

The values in the arrays are plotted in a DIMxDIM grid. (DIM is initialized to 80; it can be changed at run time; see the file HOWTORUN.) The abscissa plots the arrays as DIM-1 equal segments. The ordinate plots the values of the endpoints of the segments from zero to the maximum value. All four arrays are plotted simultaneously. Values for array u are plotted with the value 1; for v, 2; for x, 4; and for y, 8. Where one value might overlay another, the logical OR of the values is plotted.
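The grid plot can be illustrated with a short sketch. This is not the ARCHTEST source; it is a minimal Python illustration under stated assumptions. The function names plot_grid and render are hypothetical, the rule for scaling values onto rows is a guess, and printing each cell as one hexadecimal digit is an assumption (the text says only that the logical OR of the marks is plotted).

```python
def plot_grid(arrays, dim=80):
    """Build a dim x dim grid of OR-ed marks.

    arrays maps a mark (1 for u, 2 for v, 4 for x, 8 for y) to a list of
    nonnegative integers; all lists are assumed to have the same length.
    """
    n = len(next(iter(arrays.values())))
    vmax = max(max(a) for a in arrays.values()) or 1  # avoid dividing by zero
    grid = [[0] * dim for _ in range(dim)]
    for mark, a in arrays.items():
        for col in range(dim):
            idx = col * (n - 1) // (dim - 1)   # endpoint of one of dim-1 segments
            row = a[idx] * (dim - 1) // vmax   # assumed scaling of 0..vmax to 0..dim-1
            grid[row][col] |= mark             # overlay by logical OR
    return grid


def render(grid):
    """Return printable text: largest values on top, one hex digit per cell."""
    return "\n".join(
        "".join(format(cell, "x") if cell else " " for cell in row)
        for row in reversed(grid)
    )


if __name__ == "__main__":
    ramp = list(range(10))
    print(render(plot_grid({1: ramp, 2: ramp, 4: [9] * 10}, dim=10)))
```

In the small usage example, arrays u and v coincide, so their shared points print as 3; where the constant array x also passes through the same cell, the cell prints as 7.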

A histogram is printed showing the deltas in value of consecutive array entries. A zero value indicates that a thread read the same value of a shared operand on two consecutive loop iterations, possibly because the thread which wrote into the shared operand had stopped.

A second histogram is printed showing the number of strings of a given length, where the entries of the string all have the same value. Each such string indicates a time when the writing thread may have stopped executing.
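The two histograms described above can be sketched as follows. This is an illustrative Python sketch, not the ARCHTEST code; the function names are hypothetical, and it assumes that strings of length one are counted along with longer ones.

```python
from collections import Counter
from itertools import groupby


def delta_histogram(values):
    """Count the differences between consecutive array entries."""
    return Counter(b - a for a, b in zip(values, values[1:]))


def run_length_histogram(values):
    """Count maximal strings of equal consecutive entries, keyed by length."""
    return Counter(len(list(group)) for _, group in groupby(values))


if __name__ == "__main__":
    sample = [0, 1, 1, 1, 2, 4, 4]
    print(delta_histogram(sample))       # a delta of 0 means a repeated value
    print(run_length_histogram(sample))  # long strings suggest a stalled writer
```

A large count at delta zero, or any long string of equal values, is the signature of a period in which the writing thread may not have been running.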

A Hidden Histogram

ARCHTEST checks to see that the (subset of) values in an array which were all originally read from the same shared operand are nondecreasing. A pair of decreasing values would indicate a relaxation of the rules of UPO or URR. If a relaxation is found, a histogram is printed (see the next section for the format and content). Since no machine has been found to indulge in this relaxation, this histogram is never seen (except when debugging).
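The nondecreasing check can be sketched in a few lines of Python. The function name decreasing_pairs is hypothetical, and the caller is assumed to have already selected the subset of entries that were read from a single shared operand.

```python
def decreasing_pairs(values):
    """Return (index, earlier, later) for each consecutive pair that decreases."""
    return [
        (i, values[i], values[i + 1])
        for i in range(len(values) - 1)
        if values[i + 1] < values[i]
    ]


if __name__ == "__main__":
    # Values read from one shared operand; the drop from 3 to 2 is flagged.
    print(decreasing_pairs([0, 1, 1, 3, 2, 5]))
```

An empty result corresponds to the normal case in which the hidden histogram is never printed.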

Results of the Test

At the end of a test the data in the arrays is analyzed in search of a condition indicating relaxed behavior of the machine. Two types of information are printed out: the particular and the general.

The analysis routine gathers the twenty most extreme incidents of relaxed behavior which occurred on the test and prints, most extreme first, a description of each incident. Then portions of the arrays are printed, specifically, those portions of the arrays containing the most relaxed events observed. This can be used to verify that the analysis of ARCHTEST was correct.

The general type of information is embodied in a histogram. For each entry in the arrays a value d is calculated. A d value represents the difference between an observed value of an operand and the smallest value the operand could have had without relaxed behavior being detected. Negative values of d indicate relaxed behavior. Nonnegative values indicate strong behavior.

If the goal of a design is to achieve strong behavior, then zero values represent ideal behavior, and positive values represent less than ideal behavior.

The d values for each test are plotted in a histogram. A made-up example for Test T700 is shown below.

     -- Begin excerpt from output from ARCHTEST --

  Relaxed (negative) versus strong (nonnegative) behavior seen in this test.
      0 -80        0 -60        0 -40        0 -20   150722   0        1  20
      0 -79        0 -59        0 -39        0 -19   322634   1        1  21
      0 -78        0 -58        0 -38        0 -18   190119   2        1  22
      0 -77        0 -57        0 -37        0 -17    84814   3        1  23
      0 -76        0 -56        0 -36        0 -16    33320   4        1  24
      0 -75        0 -55        0 -35        0 -15    11591   5        1  25
      0 -74        0 -54        0 -34        0 -14     3873   6        1  26
      0 -73        0 -53        0 -33        0 -13     1249   7        1  27
      0 -72        0 -52        0 -32        0 -12      402   8        1  28
      0 -71        0 -51        0 -31        0 -11       67   9        1  29
      0 -70        0 -50        0 -30        0 -10       13  10        1  30
      0 -69        0 -49        0 -29        0  -9        3  11        1  31
      0 -68        0 -48        0 -28        0  -8        1  12        1  32
      0 -67        0 -47        0 -27        0  -7        1  13        1  33
      0 -66        0 -46        0 -26        0  -6        1  14        1  34
      0 -65        0 -45        0 -25        3  -5        1  15        1  35
      0 -64        0 -44        0 -24        6  -4        1  16        1  36
      0 -63        0 -43        0 -23       35  -3        1  17        1  37
      0 -62        0 -42        0 -22      119  -2        1  18        1  38
      0 -61        0 -41        0 -21      930  -1        1  19        1  39

            ------ Relaxed ------    |         ------ Strong ------
380000 |                             .
360000 |                             .
340000 |                             .
320000 |                             .o
300000 |                             .o
280000 |                             .o
260000 |                             .o
240000 |                             .o
220000 |                             .o
200000 |                             .o
180000 |                             .oo
160000 |                             .oo
140000 |                             ooo
120000 |                             ooo
100000 |                             ooo
 80000 |                             oooo
 60000 |                             oooo
 40000 |                             oooo
 20000 |                             ooooo
     0 |_________._________.____ooooo.ooooooooooooooooooooooooooooooooooooooo
     -30       -20       -10         0        10        20        30        40

  Test  =     700.         Total   =  799943.      Min thru -81 =       0.
  Type  =     CC1.         Minimum =      -5.      -80 thru  -1 =    1093.
  Behav =  Strong.         Maximum =      54.        0 thru  39 =  798835.
  histend                  K       =  200000.       40 thru Max =      15.

     -- End excerpt from output from ARCHTEST --
Test T700 looks for write operations being performed nonatomically. On this run the arrays were 200000 entries long. There were two arrays. Each was examined twice, resulting in almost 800000 data points. (Some points were lost due to boundary problems at the ends of the arrays.) The maximum d value was 54; the minimum was -5. Virtually all points fell in the range 0 through 39. About one eighth of one percent (1093 of 799943) fell into negative territory, signalling that these write operations were not performed atomically. (For more detail, see the example in the description of Test T700 in the file ANALYSIS.)
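The summary lines under the plot can be reproduced from a table of d values and counts. The following Python sketch is an illustration only; the names summarize and percent_relaxed are hypothetical, and the range limits -80 and 39 are taken from the excerpt above rather than from the ARCHTEST source.

```python
def summarize(hist, low=-80, high=39):
    """Summarize a histogram given as a dict mapping d value to count."""
    observed = [d for d, n in hist.items() if n > 0]
    return {
        "total": sum(hist.values()),
        "minimum": min(observed),
        "maximum": max(observed),
        "below": sum(n for d, n in hist.items() if d < low),
        "relaxed": sum(n for d, n in hist.items() if low <= d < 0),
        "strong": sum(n for d, n in hist.items() if 0 <= d <= high),
        "above": sum(n for d, n in hist.items() if d > high),
    }


def percent_relaxed(hist):
    """Percentage of data points whose d value is negative."""
    negative = sum(n for d, n in hist.items() if d < 0)
    return 100.0 * negative / sum(hist.values())


if __name__ == "__main__":
    # The negative counts are those of the made-up T700 example;
    # the nonnegative part is abbreviated to two entries.
    hist = {-5: 3, -4: 6, -3: 35, -2: 119, -1: 930, 0: 150722, 54: 1}
    print(summarize(hist))
    print(round(percent_relaxed(hist), 3))
```

Applied to the full example, the same arithmetic gives 1093 negative points out of 799943, which is the "one eighth of one percent" quoted above.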

Multiple Architectures

Tests T4, T7, T8, T11, and T12 test for multiple architectures. More specifically, the data from each test can be analyzed in multiple arrays. In some cases it can be deduced that a machine relaxes A1 and A2, or A3 and A4. (The Ai are architectures; the 'or' is inclusive.) In these cases two histograms are presented: one for the A1 and A2 case, the other for the A3 and A4 case.

Saving Output Arrays

All of the tests permit the output data in the arrays to be saved in a file for later study. Files can be saved either at the beginning of an analysis routine or at the end of the analysis routine. The latter case is convenient for saving data whose histogram shows unusual behavior. The former case is useful for capturing data on which a newly modified and undebugged analysis routine blows up. For the mechanics of dumping the arrays, see the file HOWTORUN.

Instruction Timing Information

The timing information from ARCHTEST can be used to compare the relative performance of two machines in operating on shared data. Since performance on shared data does not correlate perfectly with overall performance, ARCHTEST also prints out the time to perform 1,000,000 of each of the following operations:
     Integer load
     Integer add            Floating add
     Integer subtract       Floating subtract
     Integer multiply       Floating multiply
     Integer divide         Floating divide

T6PLOT

The output from T6 can be used to obtain a view of the instruction by instruction interaction between processors.

       P1       P2         P3         P4         P5        P6
     A = 0;  U[0] = A;  V[0] = A;  X[0] = A;  Y[0] = A;  B = 0;
     A = 1;  U[1] = B;  V[1] = B;  X[1] = B;  Y[1] = B;  B = 1;
     A = 2;  U[2] = A;  V[2] = A;  X[2] = A;  Y[2] = A;  B = 2;
     A = 3;  U[3] = B;  V[3] = B;  X[3] = B;  Y[3] = B;  B = 3;
     A = 4;  U[4] = A;  V[4] = A;  X[4] = A;  Y[4] = A;  B = 4;
     A = 5;  U[5] = B;  V[5] = B;  X[5] = B;  Y[5] = B;  B = 5;
     A = 6;  U[6] = A;  V[6] = A;  X[6] = A;  Y[6] = A;  B = 6;
     A = 7;  U[7] = B;  V[7] = B;  X[7] = B;  Y[7] = B;  B = 7;

View P1 and P6 as clocks, generating time signals A and B. Even though P1 and P6 can be interrupted, in practice they often produce signals (that is, increments in the values of A and B) quite regularly over long periods.

View each pair of values of A and B captured by P2-P5 as representing a point on a plane with coordinates A and B. Thus, the points traversed by P2 are:

     (U[0],U[1]), (U[2],U[3]), (U[4],U[5]), (U[6],U[7]), ....

Let the points traversed by P2 (respectively, P3, P4, P5) be represented by '1' (respectively, '2', '4', '8'). Where two processes traverse the same point, record the logical OR of their path values. Then four distinct paths can be viewed on a plot.
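The construction of the paths can be sketched in Python. This is an illustration of the idea and not the T6PLOT source; the function names are hypothetical. It relies only on the layout shown above, in which even-numbered entries of each array hold samples of A and odd-numbered entries hold samples of B.

```python
def path_points(arr):
    """Pair each sample of A (even index) with the following sample of B."""
    return list(zip(arr[0::2], arr[1::2]))


def overlay(paths):
    """Combine paths into one plot; paths maps a mark (1, 2, 4, 8) to an array."""
    plot = {}
    for mark, arr in paths.items():
        for point in path_points(arr):
            plot[point] = plot.get(point, 0) | mark  # logical OR where paths meet
    return plot


if __name__ == "__main__":
    u = [0, 0, 1, 1, 2, 2]  # P2: (0,0), (1,1), (2,2)
    v = [0, 0, 1, 2, 2, 2]  # P3: (0,0), (1,2), (2,2)
    print(overlay({1: u, 2: v}))
```

Points visited by both P2 and P3 carry the value 3, which is how synchronized execution shows up in the plots.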

These ideas are implemented by T6PLOT. After running Test T6, ARCHTEST asks the user if the output should be saved in a file. If so, the arrays are dumped into a file, which can then be used as input to T6PLOT.

T6PLOT initially presents a view starting at (0,0). Strings of the characters listed below can be entered to navigate around the plot. Each view shows the coordinates of the view (starting in the upper left corner) and the level of magnification. Level 1 is the highest magnification. Possible levels are the powers of two: 1, 2, 4, 8, and so on. At level x every x by x byte block is condensed into one character in the plot. The initial view is at level 32.

     f: move one half screen to the right.
     s: move one half screen to the left.
     e: move one half screen upwards.
     c: move one half screen downwards.
     u: demagnify the view by a factor of two.
     d: magnify the view by a factor of two.
     x: end T6PLOT.
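The effect of the navigation characters and of the magnification level can be sketched as follows. This Python sketch is an illustration, not the T6PLOT source, and it rests on several assumptions: a view of 80 columns by 24 rows (inferred from the coordinate ranges in the example headers), y coordinates that increase downward from the upper left corner, views that are not allowed to move below coordinate zero, and blocks that are condensed by taking the logical OR of their marks. The names condense and navigate are hypothetical.

```python
COLS, ROWS = 80, 24  # assumed view size, inferred from the example headers


def condense(plot, x0, y0, level):
    """Condense each level-by-level block of the plot into one view cell.

    plot maps a point (a, b) to its mark; marks in a block are OR-ed (assumed).
    """
    view = [[0] * COLS for _ in range(ROWS)]
    for (a, b), mark in plot.items():
        col, row = (a - x0) // level, (b - y0) // level
        if 0 <= col < COLS and 0 <= row < ROWS:
            view[row][col] |= mark
    return view


def navigate(x0, y0, level, keys):
    """Apply a string of navigation characters; return the new view state."""
    for key in keys:
        half_w, half_h = COLS * level // 2, ROWS * level // 2
        if key == "f":
            x0 += half_w
        elif key == "s":
            x0 = max(0, x0 - half_w)
        elif key == "e":
            y0 = max(0, y0 - half_h)   # assumes y increases downward
        elif key == "c":
            y0 += half_h
        elif key == "u":
            level *= 2                 # demagnify
        elif key == "d":
            level = max(1, level // 2) # magnify; level 1 is the limit
        elif key == "x":
            break
    return x0, y0, level


if __name__ == "__main__":
    print(navigate(0, 0, 32, "ddd"))  # three magnifications from the initial view
```

Under these assumptions, entering ddd at the initial view moves from level 32 to level 4, the level used in the first example plot.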
Here is an example of a recent run on a 2-way machine.

0____.____1____.____2____.____3____.____4____.____5____.____6____.____7____.___
 (x,y): from (2240,2496) to (2560,2592)   level = 4   x - exit
...................1..........................................................
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
..............................................................................
.    .    .    .    .    2    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
..............................................................................
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .1   .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .2   .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
..............................................................................
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    . 12 .  3 . 3  3  3 . 231.  3 .3  3.  23. 3
..............................................................................
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .

At level 4 the plot shows that P6 has temporarily stopped executing. Further, P2 and P3 appear to be largely synchronized in their execution (shown by the large number of 3s). This seems improbable; surely this behavior will disappear under higher magnification. But viewing (a portion of) the above screen at level 1 the behavior is seen to be real.

0____.____1____.____2____.____3____.____4____.____5____.____6____.____7____.___
 (x,y): from (2480,2556) to (2560,2580)   level = 1   x - exit
..............................................................................
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
..............................................................................
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
..............................................................................
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
..............................................................................
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    . 3  .    .    3    .    .   3.    .    .  2 .3   .    .    3    .
..............................................................................
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .
.    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .

T8CONVOY

If there are no circuits in the data from Test T8, then another form of analysis of the data becomes possible: the four columns of the W matrix can be merged into one sequence of ascending values. This is the function performed by T8CONVOY.

On many machines each processor sees a seemingly random sequence of values distinct from that seen by other processors. Occasionally, on some machines T8CONVOY reveals a curious behavior: a sequence of values, called a convoy, is seen by two or more distinct processors. This suggests that the several processors have become momentarily serialized in accessing shared data. The significance of such behavior is not yet understood.

T8CONVOY prints out the original input file. Alongside it T8CONVOY prints out the same four columns in an elongated form in which each line represents one value in a sequence of ascending values. A blank entry in a line for a particular value indicates that a processor did not see the value. A nonblank entry indicates that it did. Lines with two or more nonblank entries (indicating that two or more processors saw the same value) are starred. A long sequence of starred lines represents one (or possibly more than one) convoy.
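The elongated form can be sketched in Python. This is an illustration of the description above and not the T8CONVOY source; the function names, the column width, and the placement of the star at the start of the line are assumptions.

```python
def convoy_lines(columns):
    """Merge the columns into one ascending sequence of distinct values.

    Returns one (value, cells, starred) tuple per value; cells holds the
    value for each processor that saw it and None for each that did not.
    """
    seen = [set(col) for col in columns]
    lines = []
    for value in sorted(set().union(*seen)):
        cells = [value if value in s else None for s in seen]
        starred = sum(c is not None for c in cells) >= 2
        lines.append((value, cells, starred))
    return lines


def format_lines(lines, width=6):
    """Render the lines; a star marks values seen by two or more processors."""
    out = []
    for _, cells, starred in lines:
        text = "".join(
            f"{c:>{width}}" if c is not None else " " * width for c in cells
        )
        out.append(("*" if starred else " ") + text)
    return out


if __name__ == "__main__":
    w = [[1, 3, 5], [2, 3, 5], [4, 5], [6]]  # made-up columns of W
    print("\n".join(format_lines(convoy_lines(w))))
```

In the made-up data, the values 3 and 5 are seen by more than one processor and are therefore starred; a long unbroken sequence of such lines would indicate a convoy.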

Last updated January 4, 2006.