ARCHTEST provides a concrete and objective measure of a machine's logical behavior in accessing shared data. ARCHTEST is unique. No other program provides the information provided by ARCHTEST.
Uses for ARCHTEST
ARCHTEST is used by machine designers in numerous
different corporations to test that their machines behave as
intended. In some cases a stripped down version of
ARCHTEST is run in simulation. In other cases
ARCHTEST is run constantly and simultaneously with other
tests during system verification in order to detect when errors
At the heart of an operating system there are multiple threads operating on shared data. In this area programmers cannot use semaphores; to avoid slow, atomic instructions, they will use load and store instructions whenever possible. Such programmers need to know the behavior, as identified by ARCHTEST, of the machines on shared operands.
Rules tested for by ARCHTEST
The computation rules (CMP) describe how the terminal value of
each operand is calculated from the initial values of the operands.
The rule of uniprocessor (UPO) order describes the ordering constraints that a uniprocessor must obey. Each processor on a multiprocessor must obey the UPO rule.
The order rules describe the order constraints that a processor on a multiprocessor system may be required to obey:
Tests Performed by ARCHTEST
ARCHTEST consists of twelve distinct tests, T1 through T12.
Denote by A(X,Y,Z) the architecture consisting of the rules X, Y, and
Z. Each test in ARCHTEST seeks to detect that a machine relaxes a
particular architecture. The architecture for which each test seeks
to detect a relaxation is:
If '**' = '00', then the test operates only on integer operands. If '**' = '10', then the test operates on both integer operands and floating point operands. If '**' = '20', then the test operates only on floating point operands. Tests T1, T8, and T9 operate only on a single operand and so for them there is no '10' case.
Tests T1-T7 were first described in Appendix D of [coll92].
The most recent extension to ARCHTEST allows it to
execute a large number of times. If an unexpected error is
detected, the program terminates. Otherwise, it iterates. If
several copies of ARCHTEST are run simultaneously in this
mode, it presents a very stressful load to the storage
hierarchy with a high probability that any logical error will be
For each interaction on shared data ARCHTEST calculates a numerical value representing how close the machine came to relaxing a rule. These values for each test are accumulated and presented in a histogram. Positive values represent strong behavior; negative values relaxed behavior. The histograms describe the behavior of a machine. Some machines never relax a given rule; some do so often and to an extreme degree; others do so very infrequently (thereby suggesting that an unintended relaxation has become visible). The most relaxed values seen so far show an instruction occurring around a couple of hundred instructions out of order.
ARCHTEST is written in C. Normally, one just compiles it and runs it. Each test consists of a data generation phase and an analysis phase. Some users have wanted to run the tests in simulation. To do this, they rewrite the data generation part of each test routine in hex or assembler and run this part in simulation. Once the data is generated, it is saved and then later fed back into ARCHTEST for analysis.
The basic tests in ARCHTEST operate on only one, two, or four operands. It is conceivable that a machine could operate correctly on such a light load, but fail on a heavier load. Tests logically involving more operands can take an extremely long time to analyze. A simple compromise is to run a basic test accessing the essential operands, while continually accessing some number of irrelevant shared operands. This increases the cache traffic, while allowing the analysis to remain brief. Two forms of irrelevant traffic can be generated: (1) accesses to lines other than the ones containing the essential operands, and (2) accesses to (different operands in) the same lines as the lines containing the essential operands.
Some of the data produced by ARCHTEST yields performance information. For example, ARCHTEST prints out the time to perform 1,000,000 load instructions, add instructions, subtract instructions, multiply instructions, and divide instructions (both integer and floating operations). ARCHTEST also records the time to perform each test and the time to analyze each test. In one case machine B was known to be twice as fast overall as machine A, yet the tests showed machine A to be faster (15% in some tests; 80% in others) on shared data operations than machine B.
Some other behaviors made visible by ARCHTEST are neither tests of the logic nor measurements of performance. The output from Test T6 can be plotted to show the interaction of either four or six processors (two writing, two or four reading) on almost an instruction-by-instruction level. The output from Test T8 shows that on some machines, the same sequences of values (called convoys) appear to each processor in turn. It is not yet understood why some machines exhibit convoys and others do not.
If there is a need to analyze a particular behavior in greater detail than that performed by ARCHTEST, it is possible to dump the arrays created by ARCHTEST for later study.
ARCHTEST performs each logical test (1) on integer operands only, (2) on floating operands only, and (3) where possible, on both integer and floating operands.
Many parameters can be set and modified at run time.
Example. Test T4: Seek a relaxation of
Sequential consistency requires that all read and write
operations to shared data occur in the order defined by the program.
The most common relaxation of PO is to allow read operations to occur
in time before logically preceding write operations occur. Test T4
seeks to detect a relaxation of WR and of RW.
Suppose a 2-way multiprocessor executes the following code and calculates the given terminal values. The way to show that a read occurred before a logically preceding write (or even worse, that the machine relaxed the rule of computation) is to assume that the machine obeyed the rules of computation and of program order and then to deduce a contradiction.
Initially, (A,B,U,V) = (0,0,0,0). P1 P2 A = 1; B = 1; U = B; V = A; Terminally, (A,B,U,V) = (1,1,0,0).If the machine obeys program order, then in P1 the write into A occurs before the read from B.
If the machine obeys the rule of computation, then since U is zero, the read from B in P1 occurs before the write of a one value into B in P2.
If the machine obeys program order, then in P2 the write into B occurs before the read from A.
If the machine obeys the rule of computation, then since V is zero, the read from A in P2 occurs before the write of a one value into A in P1.
Therefore, a circuit in time exists among the read and write operations for the program, which is a contradiction. Thus the machine fails to obey the rule of computation and/or the rule of program order.
For such a brief program to be successful as a test case, the two processors have to be exactly synchronized. Extending the test case in the manner shown below creates a test program that can be used on a real machine.
P1 P2 A = 0; B = 0; U = B; V = A; A = 1; B = 1; U = B; V = A; A = 2; B = 2; U = B; V = A; A = 3; B = 3; U = B; V = A; . . . . . . A = k; B = k; U[k] = B; V[k] = A;If k = 200,000, then there is a much longer window during which a relaxation of the rule of program order can be detected. Note that the processors do not have to be in sync; either can be interrupted away for a long period. So long as both execute at the same time, the possibility exists for them to reveal a relaxation of the rule of program order.
A True Experience with Minusculy Distinct
An operating systems programmer for company A saw the system
crash when the system ran on a machine from company B. The system
never crashed on the supposedly compatible machine produced by company
A. The programmer concluded that the B machine had a bug which was
not present on the A machine.
The truth was difficult to get to. Eventually, both machines were found to obey the architecture defined for the machines. However, the A machine, as implemented, was stronger than the architecture required. The B machine implemented the architecture more accurately. The OS had not been coded to perform correctly on a machine obeying a strict implementation of the architecture, and so the fault was found to be in the OS, not in the B machine.
Multiprocessor Diagnostics was formed by William W. Collier in
1993 to develop and to market ARCHTEST. The early
development of ARCHTEST was supported by the (now defunct)
Mid-Hudson InfoMall (Dennis Eaton, Director).
ARCHTEST is about 13,000 lines of C code. A single flag allows ARCHTEST to compile to run either on Windows NT/XP or on a Posix-compliant Unix system. For other systems users may have to supply code to start and to stop multithreading, to synchronize threads, and to record the time.
Complete documentation is available through the site map. Sample licenses for the use of ARCHTEST are available at Licenses.
For further information send email to: William W. Collier at Multiprocessor Diagnostics, 13 Gary Place, Wappingers Falls, New York 12590. Tel: 845-297-5901.
Last updated January 4, 2006