William W. Collier, firstname.lastname@example.org.
13 Gary Place, Wappingers Falls, New York 12590
ARCHTEST is a program that tests
the logical behavior of a shared memory multiprocessor (SMMP)
when two or more processors simultaneously access the same shared
data.
When SMMP systems first appeared, Leslie Lamport defined
sequential consistency (SC), the standard of behavior the systems
were expected to exhibit:
    "The result of any execution is the same as if the operations
    of all the processors were executed in some sequential order,
    and the operations of each individual processor appear in
    this sequence in the order specified by its program."
SC implies that two strong rules are obeyed:
- Program order. Instructions are executed in the order
defined by the underlying program.
- Atomicity. Since the order is sequential, there is no overlapping
in time of the execution of instructions. Consequently, each instruction
is executed atomically.
In "Reasoning About Parallel Architectures" [COLL92] I
exhibited programs which detect a failure of a machine to obey
SC. Shortly afterwards I founded Multiprocessor Diagnostics and
began offering these programs under the name of ARCHTEST.
At about the same time engineers were pointing out that:
1. Machines could run considerably faster if they violated SC.
2. Programmers can deal with a machine's failure to be SC by:
- a. Using Lock/Unlock statements around any access to shared
data.
- b. Using hardware instructions, such as Exchange or Compare
and Swap, to access shared data where Lock/Unlock instructions
cannot be used (as in the Lock and Unlock routines).
What is the "Right" standard of behavior for SMMP systems?
Most machines today, as evidenced by the sample of those tested
by ARCHTEST, are not SC. Some perform read operations before
logically preceding write operations. Others do this and also
perform write operations nonatomically.
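The first kind of violation can be illustrated with a small litmus test. The sketch below is not one of the ARCHTEST tests; it is a minimal Python illustration that enumerates every interleaving SC permits for two hypothetical threads and collects the terminal values of X and Y. The operand B and the helper names interleavings, run, and sc_outcomes are assumptions made for this example. Under SC the outcome X = 0, Y = 0 never appears, so a machine that produces it is consistent with a read having been performed before a logically preceding write.

```python
def interleavings(threads):
    """Yield every merge of the threads' operations that keeps each
    thread's own program order (the orders SC allows)."""
    if all(not t for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail


def run(schedule, initial):
    """Execute assignments one at a time (atomically) on a shared memory."""
    mem = dict(initial)
    for dst, src in schedule:
        # src is either a constant or the name of another operand
        mem[dst] = src if isinstance(src, int) else mem[src]
    return mem


def sc_outcomes(threads, initial, observed):
    """Set of terminal values of the observed operands over all SC orders."""
    return {
        tuple(run(s, initial)[name] for name in observed)
        for s in interleavings(threads)
    }


# Hypothetical two-thread test: each thread writes one operand,
# then reads the operand written by the other thread.
t1 = [("A", 1), ("X", "B")]   # A = 1; X = B;
t2 = [("B", 1), ("Y", "A")]   # B = 1; Y = A;
initial = {"A": 0, "B": 0, "X": 0, "Y": 0}

allowed = sc_outcomes([t1, t2], initial, ("X", "Y"))
print(sorted(allowed))        # (0, 0) is not among the SC outcomes
```

Brute-force enumeration is only practical for tests of this size; it is used here because it follows Lamport's definition directly.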
Almost all machines are claimed to be cache coherent. This
phrase has had different meanings over time.
- CC1. Initially, when SC was thought to be the
correct standard of behavior, cache coherent meant write atomic.
- CC2. The SPARC architecture (Version 9) [SPARCV9]
required that all processes see all changes in value of all
operands in the same order. This is CC2 behavior. In [COLL92]
CC2 was shown to be logically indistinguishable from CC1.
- CC3. A more relaxed standard is that all processes see
all changes in value of each separate operand in the same order.
This standard is adhered to, not so much because it meets the
needs of programmers, but rather simply because it falls out of
the MESI discipline.
If machines need not be SC, then of what use is ARCHTEST today?
It is important to recognize that there are still some very
basic rules which SMMPs must obey. Here are three elementary
examples of basic rules being violated.
Example 1. The machine must compute.
Initially, A = 0.
A = 1;
Terminally, A = 23.
Example 2. The rules followed by a uniprocessor must be obeyed.
Initially, A = X = 0.
A = 1;
X = A;
Terminally, A = 1, X = 0.
Example 3. The machine must be cache coherent (in the CC3 sense).
Initially, A = U = V = X = Y = 0.
A = 1; A = 2;
U = A; X = A;
V = A; Y = A;
Terminally, A = either 1 or 2, U = 1, V = 2, X = 2, Y = 1.
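The CC3 requirement in Example 3 can be checked mechanically. The following Python sketch is an illustration only, not ARCHTEST's analysis routine. It assumes that each write stores a distinct value, and the function name coherent is hypothetical. For each reading thread it records which value was seen before which, and it reports a violation if those orderings form a cycle, since no single order of changes to the operand could then explain every thread's view.

```python
from graphlib import TopologicalSorter, CycleError


def coherent(observations):
    """Return True if one order of the values of a single operand can
    explain every thread's sequence of reads.

    observations: one sequence per reading thread, listing the values it
    read from the operand, in program order.
    """
    preds = {}  # value -> set of values some thread saw earlier
    for seq in observations:
        for earlier, later in zip(seq, seq[1:]):
            if earlier != later:
                preds.setdefault(earlier, set())
                preds.setdefault(later, set()).add(earlier)
    try:
        # prepare() raises CycleError if the "seen before" relation is cyclic
        TopologicalSorter(preds).prepare()
    except CycleError:
        return False
    return True


# Example 3: U, V = 1, 2 on one side and X, Y = 2, 1 on the other.
print(coherent([(1, 2), (2, 1)]))   # False: the two views contradict
print(coherent([(1, 2), (1, 2)]))   # True: both saw 1 change to 2
```

The check deliberately ignores the terminal value of A; it looks only at the order in which the readers saw A change.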
Testing for violations of such basic rules can be very
valuable. Some customers have used ARCHTEST in simulation and
have thereby found design flaws early in the design process.
(They don't run all of ARCHTEST in simulation, of course; they
run the basic test programs in assembler language and save the
output in a file; then the file is fed into ARCHTEST for
analysis.)
At the other end of the spectrum, some have used ARCHTEST to
verify the behavior of a completed system. See [PHIL05] for an
example.
Finally, ARCHTEST provides performance information in several
forms:
- graphs showing the distribution of times that events occur
prematurely, thus signaling a violation of a rule.
- graphs showing the distribution of times that processors
are delayed when they collide over data.
Current development efforts for ARCHTEST
ARCHTEST is being improved on several fronts.
The analysis routines now provide more explanatory information
describing instances where a machine has violated a rule.
The output routines now produce HTML. This will make it
easier to annotate, to compare, and to cross-reference
performance information on different machines in a new round of
testing that is about to begin.
Recently Jens Ramsey of Freescale Semiconductor and I
independently discovered a bug in the analysis of the results
from Test 3 in ARCHTEST. The bug could cause Test 3 to fail to
see that a machine did not obey write order. Because of this bug
I will update, at no fee, the copy of ARCHTEST held by each
customer.
For years I have looked for new tests that differ in a
logically significant way from the current tests in ARCHTEST, but
I have found none.
The tests in ARCHTEST currently involve 2-4 threads and 2-4
operands. When there were only 2-4 processors in a system to be
tested, this was sufficient. Today it is not.
Some time ago another fellow and I wrote code to test
a new system. I wrote what I thought were very subtle and clever
programs. The other fellow wrote programs which tried something
simple. If that succeeded, he doubled one of the parameters. If
that succeeded, he doubled another one. And so on. When
production time came around, the other fellow's programs found
far more bugs than mine did.
I propose to use the other fellow's approach in extending
ARCHTEST. ARCHTEST2 will have no new logical tests. However, it
will have the capability of running many threads, operating on
many operands. Plans are still not definite. Ideally, the new
code will be available in the summer of 2009.
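The doubling strategy described above might be sketched as follows. This is not ARCHTEST2 code, whose plans are not yet definite; it is a hypothetical Python driver. The names escalate, run_test, and fake_run, the parameter names threads and operands, and the failure condition of the stand-in system are all assumptions made for illustration.

```python
def escalate(run_test, start, limit):
    """Start from a simple configuration and double one parameter at a
    time, in turn, until a run fails or no parameter can grow further.

    run_test: callable taking the parameters as keyword arguments and
              returning True when the run shows no violation.
    start:    dict of initial parameter values.
    limit:    largest value any parameter may reach.
    Returns the first failing configuration, or None if every run passed.
    """
    params = dict(start)
    if not run_test(**params):
        return dict(params)
    while True:
        grew = False
        for name in list(params):
            if params[name] * 2 > limit:
                continue              # this parameter is already at its cap
            params[name] *= 2         # double one parameter only
            grew = True
            if not run_test(**params):
                return dict(params)
        if not grew:
            return None               # everything passed up to the limit


# Stand-in for a real system under test: it misbehaves once the product
# of threads and operands reaches 64 (an invented condition).
def fake_run(threads, operands):
    return threads * operands < 64


first_failure = escalate(fake_run, {"threads": 2, "operands": 2}, limit=256)
print(first_failure)   # {'threads': 8, 'operands': 8}
```

Doubling one parameter per step keeps each failing configuration close to the last passing one, which helps indicate which dimension exposed the problem.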
[COLL92] W. W. Collier. Reasoning About Parallel Architectures.
Prentice-Hall, N.J. 1992.
[LAMP79] L. Lamport. How to make a multiprocessor computer that
correctly executes multiprocess programs. IEEE
Trans. on Computers, vol. C28:9 (1979), 690-691.
[PHIL05] Jos van Eijndhoven, Jan Hoogerbrugge, Jayram
M.N., Paul Stravers and Andrei Terechko. "Cache-Coherent
Heterogeneous Multiprocessing as Basis for Streaming
Applications", pp. 61-80. Chapter 3 in Philips Research Book
Series, Volume 3, Dynamic and Robust Streaming in and between
Connected Consumer-Electronic Devices. Edited by Peter van der
Stok. ISBN 978-1-4020-3453-4 (Print) 978-1-4020-3454-1 (Online)
DOI 10.1007/1-4020-3454-7_3. 2005. [On the web at
[SPARCV9] The SPARC Architecture Manual, Version 9. SPARC
International, Inc., Santa Clara, California. David L. Weaver /
Tom Germond, Editors. ISBN 0-13-825001-4. 2000. p. 262. [On
the web at developers.sun.com/solaris/articles/sparcv9.pdf.]
Last updated June 15, 2008.