N=2 T=4 K=500000 2004 Brad Richards 1 OO 2 OOO 3 OOO 4 XXX 5 OOO 6 OOO 7 XXX 8 OO 9 OO 11 XXX 12 XXX
N=8 T=4 K=500000 2004 Brad Richards 1 OO 2 OOO 3 OOO 4 XXX 5 OOO 6 OOO 7 XXX 8 OO 9 OO 11 XXX 12 XXX
The theorem says that if a machine does not execute reads before writes, then it will appear to be program ordered. It also explains why, if a machine does not obey program order, then it always visibly executes reads before logically preceding writes.
Proof. Consider this execution.
    P1
    L1: B = A;
    L2: D = C;
    L3: F = E;

The events for the execution are:

    (P1,L1,R,-,A,S)
    (P1,L1,W,-,B,S)
    (P1,L2,R,-,C,S)
    (P1,L2,W,-,D,S)
    (P1,L3,R,-,E,S)
    (P1,L3,W,-,F,S)

Since all executions obey CMP, we know that

    (P1,L1,R,-,A,S) <srw (P1,L1,W,-,B,S)
    (P1,L2,R,-,C,S) <srw (P1,L2,W,-,D,S)
    (P1,L3,R,-,E,S) <srw (P1,L3,W,-,F,S)

If the machine obeys WR, then

    (P1,L1,W,-,B,S) <wr (P1,L2,R,-,C,S)
    (P1,L2,W,-,D,S) <wr (P1,L3,R,-,E,S)

Therefore,

    (P1,L1,R,-,A,S) <srw (P1,L1,W,-,B,S)
                    <wr  (P1,L2,R,-,C,S)
                    <srw (P1,L2,W,-,D,S)
                    <wr  (P1,L3,R,-,E,S)
                    <srw (P1,L3,W,-,F,S)

and so the execution obeys WW, RR, and RW and thus obeys PO.
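The chain argument can be checked mechanically. The sketch below, in Python, encodes the six events as tuples, takes the three <srw edges from CMP and the two <wr edges from WR, computes the transitive closure, and tests whether every pair of events is ordered as in the program text. The names closure and program_ordered are illustrative only and are not part of ARCHTEST.

```python
from itertools import combinations

# Events of P1 in program order: (statement, action, operand).
events = [("L1", "R", "A"), ("L1", "W", "B"),
          ("L2", "R", "C"), ("L2", "W", "D"),
          ("L3", "R", "E"), ("L3", "W", "F")]

# <srw edges (from CMP): within a statement, the read precedes the write.
srw = {(events[0], events[1]), (events[2], events[3]), (events[4], events[5])}
# <wr edges (assuming WR): a write precedes the read of the next statement.
wr = {(events[1], events[2]), (events[3], events[4])}

def closure(edges):
    """Return the transitive closure of a set of (before, after) pairs."""
    result = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(result):
            for c, d in list(result):
                if b == c and (a, d) not in result:
                    result.add((a, d))
                    changed = True
    return result

def program_ordered(edges):
    """True if every pair of events is ordered as in the program text."""
    order = closure(edges)
    return all((x, y) in order for x, y in combinations(events, 2))

with_wr = program_ordered(srw | wr)   # CMP plus WR
without_wr = program_ordered(srw)     # CMP alone
print(with_wr, without_wr)
```

With both sets of edges the closure orders all fifteen pairs, which covers the WW, RR, and RW pairs named in the proof. With the <srw edges alone it does not.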
Rules versus Tests.
Here is a simple problem. Can the results of running ARCHTEST be
presented, not in terms of tests passed or failed, but in terms of
rules obeyed or violated? The answer is emphatically no. Suppose a
machine passed a test for A(R1,R2) and violated a test for
A(R1,R2,R3). There is no certainty that the machine violated rule R3.
It might violate rule R2, but the violation becomes visible only in
the environment of the test for A(R1,R2,R3). Nonetheless, it is a
common-sense guess that the machine violated rule R3.
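The common-sense guess can be phrased as a set difference: the rules of the failed architecture that appear in no passed architecture. A minimal sketch in Python follows. The helper suspect_rules is hypothetical, not an ARCHTEST function, and its result is only a guess, for the reason just given.

```python
def suspect_rules(failed, passed):
    """Rules of a failed architecture not covered by any passed architecture.

    `failed` is a set of rule names; `passed` is a list of such sets.
    The result is a heuristic guess, not a proof that a rule was violated.
    """
    covered = set().union(*passed) if passed else set()
    return failed - covered

# The machine passed A(R1,R2) and failed A(R1,R2,R3).
guess = suspect_rules({"R1", "R2", "R3"}, [{"R1", "R2"}])
print(guess)
```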
To assist in the development of common-sense analysis, ARCHTEST
now prints out a summary of up to nine lines where each line
identifies one architecture that was found to have been relaxed.
The nine possible lines are:
    WW URR
    WW
    WW WR
    WW RR
    RW
    CC3 URR
    CC3 WR
    CC3 RR
    CC1

For further information, see the ANALYSIS file.
The CRW Rule.
In the course of revising the ANALYSIS file the underpinnings
required for the use of the CRW rule came into sharper focus. Suppose
the following execution occurs.
    Initially, (A,X) = (0,0).

    P1          P2
    A = 1;      X = A;
    A = 2;
    A = 3;

    Terminally, (A,X) = (3,1).

It is clear by UPO that

    (P1,L1,W,1,A,S1) <upo (P1,L2,W,2,A,S1) <upo (P1,L3,W,3,A,S1)

And CMP requires that

    (P1,L1,W,1,A,S2) <cwr (P2,L1,R,1,A,S2)

What we do not know, and what we need to know in order to reason successfully about many of the tests in ARCHTEST, is that

    (P2,L1,R,1,A,S2) <crw (P1,L2,W,2,A,S2)

This appears so obvious that it seems churlish to doubt it, but a moment's thought shows that there is no rule to prevent (P1,L2,W,2,A,S2) from occurring before (P1,L1,W,1,A,S2). (Conveniently, the third statement in P1 erases all trace of such a transgression.)
If the machine is known to obey WW, CC1, or CC3, then it is possible to deduce that

    (P2,L1,R,1,A,S2) <crw (P1,L2,W,2,A,S2)

The details are in the ANALYSIS file.
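The gap can be seen by brute force. The sketch below rests on a simplified model that is an assumption of this example, not ARCHTEST's formalism: the four events in store S2 occur in some total order, a read returns the value of the latest preceding write (or the initial 0), and the terminal value is that of the last write. It enumerates the orders consistent with the observed outcome, then applies the WW constraint.

```python
from itertools import permutations

W1, W2, W3, R = "W1", "W2", "W3", "R"   # A = 1; A = 2; A = 3; X = A
VALUE = {W1: 1, W2: 2, W3: 3}

def outcome(order):
    """Return (terminal A, X) for one total order of the events in S2."""
    a, x = 0, 0
    for ev in order:
        if ev == R:
            x = a            # the read sees the latest write so far
        else:
            a = VALUE[ev]
    return a, x

def obeys_ww(order):
    """True if the writes occur in program order W1, W2, W3."""
    return order.index(W1) < order.index(W2) < order.index(W3)

# Orders that produce the observed terminal state (A,X) = (3,1).
consistent = [o for o in permutations([W1, W2, W3, R]) if outcome(o) == (3, 1)]
# Of those, the orders that also obey WW.
with_ww = [o for o in consistent if obeys_ww(o)]
# Of those, the orders in which the read does NOT precede the write of 2.
crw_fails = [o for o in consistent if o.index(W2) < o.index(R)]

print(consistent)
print(with_ww)
print(crw_fails)
```

Under this model two orders match the outcome. One of them places the write of 2 ahead of the write of 1, so the read follows it. Requiring WW removes that order and leaves only the one in which the read precedes the write of 2.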
Multiple Analyses of a Test.
Let A1, A2, ..., be architectures. Previously, some of the tests
in ARCHTEST showed that a machine could relax A1 || A2. Now, some of
the tests are understood to show that a machine relaxes A1 && A2. In
fact, some tests can show that a machine relaxes ((A1 && A2) || (A3 &&
A4)) (where && and || are used for 'and' and 'or', as in C).
To see how a machine can be seen to relax A1 && A2, consider the following scenario. Analysis of the data from a test of a machine reveals a circuit; the circuit employs rules R1, R2, and CRW, and neither R1 nor R2 is WW, CC1, or CC3. To justify the use of the rule CRW, either WW or CC3 must be assumed. (There is no point in assuming the stronger CC1 if the weaker CC3 will do.) The machine is then seen to have relaxed both A(R1,R2,WW) and A(R1,R2,CC3).
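The scenario amounts to forming one relaxed architecture for each rule that can justify CRW. A small sketch in Python, where relaxed_architectures is a hypothetical helper (not part of ARCHTEST) and an architecture is represented as a tuple of rule names:

```python
def relaxed_architectures(circuit_rules, crw_justifications=("WW", "CC3")):
    """List the architectures that a circuit shows to be relaxed.

    If the circuit uses CRW, CRW is replaced in turn by each rule that can
    justify it, and every resulting architecture is reported as relaxed.
    """
    base = [r for r in circuit_rules if r != "CRW"]
    if "CRW" not in circuit_rules:
        return [tuple(base)]
    return [tuple(base + [j]) for j in crw_justifications]

archs = relaxed_architectures(["R1", "R2", "CRW"])
print(archs)
```

Every architecture in the returned list is relaxed, which is the A1 && A2 reading described above.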
Use of CMP and UPO.
In RAPA the goal in distinguishing two architectures was to use
as few rules as possible.
In the more pragmatic world in which ARCHTEST is used, the rules
of CMP and UPO are always assumed. Therefore, in revising ARCHTEST I
have shown all architectures as including both CMP and UPO, whether or
not UPO is used in each case (CMP always is, of course).
A pure test is a test that involves only CMP, UPO, and one other
rule R. A violation of the test can be reasonably thought to be a
violation of the rule R.
Until recently, the only pure tests were for WW and RW. As a result of the new analysis involving CRW, tests T8 and T9 are now seen to be pure tests of CC3. This makes an important difference. Previously, it was possible to dismiss a relaxation detected by T8 or T9 as being due to a relaxation of an ordering rule, rather than necessarily involving CC3. Now such a relaxation is seen to involve CC3 and only CC3.
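The definition of a pure test can be written as a simple predicate. The sketch below is illustrative only: pure_rule is a hypothetical name, and an architecture is represented as a set of rule names.

```python
ALWAYS_ASSUMED = {"CMP", "UPO"}

def pure_rule(architecture):
    """Return the single rule R if the test is pure, otherwise None.

    A test is pure when its architecture consists of CMP, UPO, and
    exactly one other rule.
    """
    extra = set(architecture) - ALWAYS_ASSUMED
    return next(iter(extra)) if len(extra) == 1 else None

print(pure_rule({"CMP", "UPO", "CC3"}))
print(pure_rule({"CMP", "UPO", "WW", "RR"}))
```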
Files of Input Parameters.
In debugging ARCHTEST and in testing machines the same
problem comes up: one wants a certain set of parameters for one
situation and another set for another situation, and one doesn't want
to have to type in either entire set every time. ARCHTEST now
allows a user to create a file specifying the values of all of the
run-time parameters. See the HOWTORUN file.
Controlling Output from ARCHTEST.
Many new flags have been defined to turn on or turn off
the generation of separate categories of output data. These flags
can be set in the parms files described in the previous item.
Information on setting parameters and on controlling output will shortly be available in the HOWTORUN file. In the meantime users can make a test run and at the end of the run save the parameters in a parms file. This will give a quick idea of the capabilities available.
Initial State of Operands in the Cache.
ARCHTEST now allows a user to manipulate operands into either
read-only or exclusive states in the caches.
Here are two sequences of code which are indistinguishable to a programmer (X is a local variable which is never referenced again):
P1 A = 1;
P2 X = A; A = 1;
However, to an engineer there is a significant difference. If A is not in the cache at the time of initial reference, then in the first case A is brought into the cache in the exclusive state; in the second case A is brought into the cache in the read-only state. The different states cause different subsequent actions in the hardware; conceivably, one set of actions could involve an error not present in the other set. ARCHTEST now provides a user three compile-time options:
Every write into a shared operand is ALWAYS, NEVER, or ONLY-SOMETIMES preceded by a fetch of the shared operand into a local variable to force it into read-only state.
These options are based on ideas presented in [mntz96]. So far, only two machines have been tested with these new features; no differences were found. See Results of Testing. For information on setting these parameters, see the HOWTORUN file.
Send email to: William W. Collier.
Last updated January 4, 2006.