Multiprocessor Diagnostics

William W. Collier, collier@acm.org.
13 Gary Place, Wappingers Falls, New York 12590
Tel: 845.297.5901.

ARCHTEST is a program which tests the logical behavior of a shared memory multiprocessor (SMMP) when two or more processors simultaneously access the same shared data.

Background

When SMMP systems first appeared, Leslie Lamport defined sequential consistency (SC), the standard of behavior the systems were expected to exhibit:

The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. [LAMP79]

SC implies that two strong rules are obeyed:

Program order. Instructions are executed in the order defined by the underlying program.
Atomicity. Since the order is sequential, there is no overlapping in time of the execution of instructions. Consequently, each instruction is executed atomically.
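
The definition can be made concrete by brute force. The following Python sketch is not part of ARCHTEST; the program encoding, the function names, and the two-processor test are all assumptions made for illustration. It enumerates every interleaving that preserves program order, executes each one atomically, and collects the final values of the watched operands.

```python
from itertools import combinations

def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each program's own order."""
    n, m = len(p1), len(p2)
    for slots in combinations(range(n + m), n):  # positions taken by p1
        chosen = set(slots)
        it1, it2 = iter(p1), iter(p2)
        yield [next(it1) if i in chosen else next(it2) for i in range(n + m)]

def run(sequence, initial):
    """Execute the operations one at a time (atomically) on one shared memory."""
    mem = dict(initial)
    for op, dst, src in sequence:
        # ("store", "A", 1) writes a constant; ("load", "X", "B") copies B into X
        mem[dst] = src if op == "store" else mem[src]
    return mem

def sc_outcomes(p1, p2, initial, watch):
    """Final values of the watched operands over all SC executions."""
    return {tuple(run(s, initial)[w] for w in watch)
            for s in interleavings(p1, p2)}

# Hypothetical test: each processor writes one operand, then reads the other.
P1 = [("store", "A", 1), ("load", "X", "B")]
P2 = [("store", "B", 1), ("load", "Y", "A")]
init = {"A": 0, "B": 0, "X": 0, "Y": 0}

outcomes = sc_outcomes(P1, P2, init, ("X", "Y"))
print(sorted(outcomes))
```

For this test the set contains (0, 1), (1, 0), and (1, 1) but not (0, 0), so a machine that terminates with X = Y = 0 cannot be SC.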

In "Reasoning About Parallel Architectures" [COLL92] I exhibited programs which detect a failure of a machine to obey SC. Shortly afterwards I founded Multiprocessor Diagnostics and began offering these programs under the name of ARCHTEST.

At about the same time engineers were pointing out that:

1. Machines could run considerably faster if they violated SC.
2. Programmers can deal with a machine's failure to be SC by:
- a. Using Lock/Unlock statements around any access to shared data.
- b. Using hardware instructions, such as Exchange or Compare and Swap, to access shared data where Lock/Unlock statements cannot be used (as in the Lock and Unlock routines themselves).

What is the "Right" standard of behavior for SMMP systems?

Most machines today, as evidenced by the sample of those tested by ARCHTEST, are not SC. Some perform read operations before logically preceding write operations. Others do this and also perform write operations nonatomically.
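
One way a read can be performed before a logically preceding write is a per-processor store buffer. The text above does not name any mechanism, so the following Python model is purely an assumption chosen for illustration, with hypothetical class and method names. Each store waits in its processor's private buffer until it is drained to memory, while loads go to memory at once unless the processor's own buffer holds a newer value.

```python
class StoreBufferMachine:
    """Toy model: shared memory plus one FIFO store buffer per processor."""

    def __init__(self, initial, processors):
        self.memory = dict(initial)
        self.buffers = {p: [] for p in processors}

    def store(self, p, operand, value):
        # The write is only queued; other processors cannot see it yet.
        self.buffers[p].append((operand, value))

    def load(self, p, operand):
        # A processor sees its own most recent buffered write first.
        for name, value in reversed(self.buffers[p]):
            if name == operand:
                return value
        return self.memory[operand]

    def drain(self, p):
        # Buffered writes reach memory in the order they were issued.
        for name, value in self.buffers[p]:
            self.memory[name] = value
        self.buffers[p].clear()

m = StoreBufferMachine({"A": 0, "B": 0}, ["P1", "P2"])
m.store("P1", "A", 1)   # P1: A = 1  (buffered)
m.store("P2", "B", 1)   # P2: B = 1  (buffered)
x = m.load("P1", "B")   # P1: X = B  reads memory before A = 1 is visible
y = m.load("P2", "A")   # P2: Y = A  reads memory before B = 1 is visible
m.drain("P1")
m.drain("P2")
print(x, y, m.memory)
```

In this schedule both loads return 0 even though both stores precede them in program order, an outcome that no sequential ordering of the four operations can produce.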

Almost all machines are claimed to be cache coherent. This phrase has had different meanings over time.

CC1. Initially, when SC was thought to be the correct standard of behavior, cache coherent meant write atomic.
CC2. The SPARC architecture (Version 9) [SPARCV9] required that all processes see all changes in value of all operands in the same order. This is CC2 behavior. In [COLL92] CC2 was shown to be logically indistinguishable from CC1.
CC3. A more relaxed standard is that all processes see all changes in value of each separate operand in the same order. This standard is adhered to, not so much because it meets the needs of programmers, but rather simply because it falls out of the MESI discipline.

If machines need not be SC, then of what use is ARCHTEST today?

It is important to recognize that there are still some very basic rules which SMMPs must obey. Here are three elementary examples of basic rules being violated.

Example 1. The machine must compute.

  Initially, A = 0;
  
      P1
    A = 1;
  
  Terminally, A = 23.

Example 2. The rules followed by a uniprocessor must be obeyed.

  Initially, A = X = 0.
  
      P1
    A = 1;
    X = A;

  Terminally, A = 1, X = 0.

Example 3. The machine must be cache coherent (in the CC3 sense).

  Initially, A = U = V = X = Y = 0.
  
      P1          P2
    A = 1;      A = 2; 
    U = A;      X = A; 
    V = A;      Y = A; 

  Terminally, A = either 1 or 2, U = 1, V = 2, X = 2, Y = 1.
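
The violation in Example 3 can be checked mechanically. The Python sketch below is an illustration only, not ARCHTEST's analysis routine; the function name and input format are assumptions, and it assumes every value is written to the operand exactly once. Each processor's successive reads impose "this value came before that one" constraints, and the machine is coherent in the CC3 sense only if all the constraints fit into a single order.

```python
from graphlib import TopologicalSorter, CycleError

def coherent_order(observations):
    """Return one write order for a single operand that agrees with every
    observer, or None if the observers contradict one another.

    observations: one list per processor holding the successive values
    that processor read from the operand.
    """
    must_follow = {}  # value -> set of values that must come earlier
    for seen in observations:
        for value in seen:
            must_follow.setdefault(value, set())
        for earlier, later in zip(seen, seen[1:]):
            if earlier != later:
                must_follow[later].add(earlier)
    try:
        return list(TopologicalSorter(must_follow).static_order())
    except CycleError:
        return None

# Example 3: one processor reads U = 1, V = 2; the other reads X = 2, Y = 1.
example3 = coherent_order([[1, 2], [2, 1]])
# A coherent run: both processors agree that 1 was written before 2.
agreeing = coherent_order([[1, 2], [2, 2]])
print(example3, agreeing)
```

For Example 3 the two columns demand opposite orders for the values 1 and 2, so no single order exists and the function returns None.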

Testing for violations of such basic rules can be very valuable. Some customers have used ARCHTEST in simulation and have thereby found design flaws early in the design process. (They don't run all of ARCHTEST in simulation, of course; they run the basic test programs in assembler language and save the output in a file, which is then fed into ARCHTEST for analysis.)

At the other end of the spectrum some have used ARCHTEST to verify the behavior of a completed system. See [PHIL05] for an example.

Finally, ARCHTEST provides performance information in several forms, including:

graphs showing the distribution of times that events occur prematurely, thus signaling a violation of a rule.
graphs showing the distribution of times that processors are delayed when they collide over data.

Current development efforts for ARCHTEST

ARCHTEST is being improved on several fronts.

The analysis routines now provide more explanatory information describing instances where a machine has violated a rule.

The output routines now produce HTML. This will make it easier to annotate, to compare, and to cross-reference performance information on different machines in a new round of testing that is about to begin.

Recently Jens Ramsey of Freescale Semiconductor and I independently discovered a bug in the analysis of the results from Test 3 in ARCHTEST. The bug could cause Test 3 to fail to see that a machine did not obey write order. Because of this bug I will update the copy of ARCHTEST, at no fee, held by each current licensee.

Future Development

For years I have looked for new tests, which differed in a logically significant way from the current tests in ARCHTEST, but have found none.

The tests in ARCHTEST currently involve 2-4 threads and 2-4 operands. When there were only 2-4 processors in a system to be tested, this was sufficient. Today it is not.

At one time in the past another fellow and I wrote code to test a new system. I wrote what I thought were very subtle and clever programs. The other fellow wrote programs which tried something simple. If that succeeded, he doubled one of the parameters. If that succeeded, he doubled another one. And so on. When production time came around, the other fellow's programs found far more bugs than mine did.

I propose to use the other fellow's approach in extending ARCHTEST. ARCHTEST2 will have no new logical tests. However, it will have the capability of running many threads, operating on many operands. Plans are still not definite. Ideally, the new code will be available in the summer of 2009.
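
The doubling strategy can be sketched in a few lines of Python. This is only an illustration of the idea; the function name, the `test` callback, and the `limit` parameter are hypothetical, and nothing here describes how ARCHTEST2 will actually be built. The driver starts small, doubles one parameter after each success, alternates between the two parameters, and reports the first configuration that fails.

```python
def doubling_search(test, threads=2, operands=2, limit=1024):
    """Run test(threads, operands) on ever larger configurations.

    Returns the first failing (threads, operands) pair, or None if every
    configuration up to `limit` passes.
    """
    double_threads = True
    while threads <= limit and operands <= limit:
        if not test(threads, operands):
            return (threads, operands)
        if double_threads:
            threads *= 2
        else:
            operands *= 2
        double_threads = not double_threads  # alternate the parameter to grow
    return None

# Stand-in for a real test: pretend the machine breaks once the product of
# threads and operands reaches 64.
tried = []

def fake_test(threads, operands):
    tried.append((threads, operands))
    return threads * operands < 64

first_failure = doubling_search(fake_test)
print(first_failure, tried)
```

With the stand-in test the driver tries (2, 2), (4, 2), (4, 4), (8, 4), and (8, 8), and stops at (8, 8). The number of runs grows only logarithmically with the largest configuration reached.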

References

COLL92. W. W. Collier. Reasoning About Parallel Architectures. Prentice-Hall, N.J. 1992.
LAMP79. L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. on Computers, vol. C28:9 (1979), 690-691.
PHIL05. Jos van Eijndhoven, Jan Hoogerbrugge, Jayram M.N., Paul Stravers and Andrei Terechko. "Cache-Coherent Heterogeneous Multiprocessing as Basis for Streaming Applications", pp 61-80. Chapter 3 in Philips Research Book Series, Volume 3, Dynamic and Robust Streaming in and between Connected Consumer-Electronic Devices. Edited by Peter van der Stok. ISBN 978-1-4020-3453-4 (Print) 978-1-4020-3454-1 (Online) DOI 10.1007/1-4020-3454-7_3. 2005. [On the web at www.eijndhoven.net/jos/publications/eijndhoven_wasabi_chapter_2005.pdf.]
SPARCV9. The SPARC Architecture Manual, Version 9. SPARC International, Inc., Santa Clara, California. David L. Weaver / Tom Germond, Editors. ISBN 0-13-825001-4. 2000. p. 262. [On the web at developers.sun.com/solaris/articles/sparcv9.pdf.]

Last updated June 15, 2008.