What's New?

New Test Runs.
An Elegant Theorem by Rajnish Ghughal.
Rules versus Tests.
The CRW Rule.
Multiple Analyses of a Test.
Use of CMP and UPO.
Pure Tests.
Files of Input Parameters.
Controlling Output from ARCHTEST.
Initial State of Operands in the Cache.

New Test Runs.

Two machines have just been tested by Brad Richards at Vassar College:

PC13. A 2-way SunBlade 1000 at Vassar College (731K).

  N=2 T=4 K=500000   2004  Brad Richards
  1 OO 2 OOO 3 OOO 4 XXX 5 OOO 6 OOO 7 XXX 8 OO 9 OO 11 XXX 12 XXX

PC14. A Sun E3500 with 8 400MHz processors and 6 GB of RAM at Vassar College (741K).

  N=8 T=4 K=500000   2004  Brad Richards
  1 OO 2 OOO 3 OOO 4 XXX 5 OOO 6 OOO 7 XXX 8 OO 9 OO 11 XXX 12 XXX

An Elegant Theorem by Rajnish Ghughal [ghug99].

Ghughal's Theorem. A(CMP,WR) => A(CMP,PO)

The theorem says that if a machine never executes a read before a logically preceding write (that is, it obeys WR), then it will appear to be program ordered. By contraposition, it also explains why a machine that does not obey program order can always be seen to execute reads before logically preceding writes.

Proof. Consider this execution.

         P1
     L1: B = A;
     L2: D = C;
     L3: F = E;

The events for the execution are:

     (P1,L1,R,-,A,S)
     (P1,L1,W,-,B,S)
     (P1,L2,R,-,C,S)
     (P1,L2,W,-,D,S)
     (P1,L3,R,-,E,S)
     (P1,L3,W,-,F,S)

Since all executions obey CMP, we know that

     (P1,L1,R,-,A,S) <srw (P1,L1,W,-,B,S)
     (P1,L2,R,-,C,S) <srw (P1,L2,W,-,D,S)
     (P1,L3,R,-,E,S) <srw (P1,L3,W,-,F,S)

If the machine obeys WR, then

     (P1,L1,W,-,B,S) <wr  (P1,L2,R,-,C,S)
     (P1,L2,W,-,D,S) <wr  (P1,L3,R,-,E,S)

Therefore,

     (P1,L1,R,-,A,S) <srw (P1,L1,W,-,B,S)
                     <wr  (P1,L2,R,-,C,S)
                     <srw (P1,L2,W,-,D,S)
                     <wr  (P1,L3,R,-,E,S)
                     <srw (P1,L3,W,-,F,S)

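The chain places all six events in program order. Every write therefore precedes every later write, every read precedes every later read, and every read precedes every later write, and so the execution obeys WW, RR, and RW and thus obeys PO.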
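The last step of the proof can be checked mechanically. The following is a minimal sketch, not part of ARCHTEST: the names Event, chain, chain_orders, and chain_obeys_po are hypothetical, and the position of an event in the array stands for its position in the chain (the chain is treated as transitive).

```c
#include <stdbool.h>
#include <stddef.h>

/* One event of the single-process execution: its statement number
   and whether it is a read ('R') or a write ('W'). */
typedef struct {
    int  line;   /* 1, 2, 3 for L1, L2, L3 */
    char kind;   /* 'R' or 'W' */
} Event;

/* The chain derived in the proof, in the order the chain imposes:
   R(A) <srw W(B) <wr R(C) <srw W(D) <wr R(E) <srw W(F). */
static const Event chain[] = {
    {1, 'R'}, {1, 'W'}, {2, 'R'}, {2, 'W'}, {3, 'R'}, {3, 'W'}
};
static const size_t chain_len = sizeof chain / sizeof chain[0];

/* Returns true if, for every pair of events of kinds (first, second)
   taken from different statements, the event from the earlier
   statement also comes earlier in the chain. */
bool chain_orders(char first, char second)
{
    for (size_t i = 0; i < chain_len; i++) {
        for (size_t j = 0; j < chain_len; j++) {
            if (chain[i].kind == first && chain[j].kind == second &&
                chain[i].line < chain[j].line && !(i < j)) {
                return false;
            }
        }
    }
    return true;
}

/* PO is taken here to mean WR together with WW, RR, and RW. */
bool chain_obeys_po(void)
{
    return chain_orders('W', 'R') && chain_orders('W', 'W') &&
           chain_orders('R', 'R') && chain_orders('R', 'W');
}
```

Calling chain_obeys_po() on this chain returns true, which matches the conclusion of the proof for the three-statement execution shown.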

Rules versus Tests.

Here is a simple problem. Can the results of running ARCHTEST be presented, not in terms of tests passed or failed, but in terms of rules obeyed or violated? The answer is emphatically no. Suppose a machine passed a test for A(R1,R2) and failed a test for A(R1,R2,R3). There is no certainty that the machine violated rule R3. It might instead violate rule R2, with the violation becoming visible only in the environment of the test for A(R1,R2,R3). Nonetheless, it is a common-sense guess that the machine violated rule R3.

To assist in the development of common-sense analysis, ARCHTEST now prints out a summary of up to nine lines, where each line identifies one architecture that was found to have been relaxed. The nine possible lines are:

         WW
     URR WW
         WW WR
         WW    RR
                  RW
                     CC3
     URR             CC3
            WR       CC3
               RR        CC1

For further information, see the ANALYSIS file.
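The column layout of the summary lines can be reproduced with a small formatter. This is only a sketch under assumptions: the bitmask encoding, the names rule_names and format_arch_line, and the trimming of trailing blanks are hypothetical and are not taken from the ARCHTEST source. The column order (URR, WW, WR, RR, RW, CC3, CC1) is read off the nine lines above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding: one bit per rule, in column order. */
enum { URR = 1 << 0, WW = 1 << 1, WR = 1 << 2, RR = 1 << 3,
       RW  = 1 << 4, CC3 = 1 << 5, CC1 = 1 << 6 };

#define RULE_COUNT 7
static const char *const rule_names[RULE_COUNT] = {
    "URR", "WW", "WR", "RR", "RW", "CC3", "CC1"
};

/* Writes one summary line for the architecture whose rules are set
   in mask. Each rule has a fixed column; absent rules are left blank.
   buf must hold at least 24 bytes (23 columns plus the terminator).
   Returns buf. */
char *format_arch_line(unsigned mask, char *buf)
{
    size_t pos = 0;
    for (int i = 0; i < RULE_COUNT; i++) {
        size_t width = strlen(rule_names[i]);
        if (i > 0) {
            buf[pos++] = ' ';              /* column separator */
        }
        if (mask & (1u << i)) {
            memcpy(buf + pos, rule_names[i], width);
        } else {
            memset(buf + pos, ' ', width); /* blank column */
        }
        pos += width;
    }
    while (pos > 0 && buf[pos - 1] == ' ') {
        pos--;                             /* trim trailing blanks */
    }
    buf[pos] = '\0';
    return buf;
}
```

With this encoding, format_arch_line(URR | WW, buf) yields the text of the second of the nine lines, without the leading indentation used on this page.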

The CRW Rule.

In the course of revising the ANALYSIS file, the underpinnings required for the use of the CRW rule came into sharper focus. Suppose the following execution occurs.

     Initially, (A,X) = (0,0).

        P1       P2
      A = 1;   X = A;
      A = 2;
      A = 3;

     Terminally, (A,X) = (3,1).

It is clear by UPO that

     (P1,L1,W,1,A,S1) <upo (P1,L2,W,2,A,S1) <upo (P1,L3,W,3,A,S1)

And CMP requires that

     (P1,L1,W,1,A,S2) <cwr (P2,L1,R,1,A,S2)

What we do not know, and what we need to know in order to reason successfully about many of the tests in ARCHTEST, is that

     (P2,L1,R,1,A,S2) <crw (P1,L2,W,2,A,S2)

This appears so obvious that it seems churlish to doubt it, but a moment's thought shows that there is no rule to prevent (P1,L2,W,2,A,S2) from occurring before (P1,L1,W,1,A,S2). (Conveniently, the third statement in P1 erases all trace of such a transgression.)

If the machine is known to obey WW, CC1, or CC3, then it is possible to deduce that

     (P2,L1,R,1,A,S2) <crw (P1,L2,W,2,A,S2)

The details are in the ANALYSIS file.
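The transgression described above can be made concrete by replaying two orderings of the events in store S2. This is an illustrative model only, with hypothetical names (StoreEvent, Outcome, replay); it assumes that a read returns the value most recently written into the store, with 0 as the initial value.

```c
#include <stdbool.h>
#include <stddef.h>

/* An event in store S2: either a write of `value` to A by P1,
   or the single read of A by P2 (value is ignored for the read). */
typedef struct {
    bool is_read;
    int  value;
} StoreEvent;

typedef struct {
    int  x;                    /* value returned by P2's read */
    int  a;                    /* terminal value of A         */
    bool read_before_write2;   /* did the read precede A = 2? */
} Outcome;

/* The order one would expect: A = 1, read, A = 2, A = 3. */
static const StoreEvent expected_order[] = {
    {false, 1}, {true, 0}, {false, 2}, {false, 3}
};

/* The transgression: A = 2 reaches S2 before A = 1. */
static const StoreEvent swapped_order[] = {
    {false, 2}, {false, 1}, {true, 0}, {false, 3}
};

/* Replays the events in the given order and reports what is observable. */
Outcome replay(const StoreEvent *events, size_t n)
{
    Outcome out = {0, 0, false};
    bool read_done = false;
    for (size_t i = 0; i < n; i++) {
        if (events[i].is_read) {
            out.x = out.a;         /* read sees the latest write */
            read_done = true;
        } else {
            if (events[i].value == 2) {
                out.read_before_write2 = read_done;
            }
            out.a = events[i].value;
        }
    }
    return out;
}
```

Both orderings end with (A,X) = (3,1), so the terminal state alone cannot tell them apart; only in the first does the read of 1 precede the write of 2.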

Multiple Analyses of a Test.

Let A1, A2, ..., be architectures. Previously, some of the tests in ARCHTEST showed that a machine could relax A1 || A2. Now, some of the tests are understood to show that a machine relaxes A1 && A2. In fact, some tests can show that a machine relaxes ((A1 && A2) || (A3 && A4)) (where && and || are used for 'and' and 'or', as in C).

To see how a machine can be seen to relax A1 && A2, consider the following scenario. Analysis of the data from a test of a machine shows that a circuit can be seen; the circuit employs rules R1 and R2 and CRW, and neither R1 nor R2 is equal to WW, CC1, or CC3. In order to justify the employment of the rule CRW, either WW or CC3 must be assumed. (There is no point in assuming the stronger CC1 if the weaker CC3 will do.) Then the machine can be seen to have relaxed both A(R1,R2,WW) and A(R1,R2,CC3).
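The scenario can be sketched as a small function over rule sets. The bitmask encoding and the name relaxed_architectures are hypothetical. The single-result branch (for a circuit that does not use CRW, or that already contains WW, CC1, or CC3) is an assumption extrapolated from the CRW discussion, not something stated in the ANALYSIS file.

```c
#include <stddef.h>

/* Hypothetical encoding: one bit per rule. */
enum {
    RULE_WW  = 1 << 0, RULE_WR  = 1 << 1, RULE_RR  = 1 << 2,
    RULE_RW  = 1 << 3, RULE_CC1 = 1 << 4, RULE_CC3 = 1 << 5,
    RULE_CRW = 1 << 6
};

/* Given the rules employed by a circuit, fills out[] with the
   architectures (as rule sets) the machine is seen to relax.
   Returns the number of entries written (1 or 2). */
size_t relaxed_architectures(unsigned circuit, unsigned out[2])
{
    unsigned base       = circuit & ~(unsigned)RULE_CRW;
    unsigned justifiers = RULE_WW | RULE_CC1 | RULE_CC3;

    /* Assumption: nothing extra is needed if CRW is not used, or if
       the circuit already contains a rule that justifies it. */
    if (!(circuit & RULE_CRW) || (base & justifiers)) {
        out[0] = base;
        return 1;
    }

    /* CRW must be justified by WW or by CC3, so both A(R1,R2,WW)
       and A(R1,R2,CC3) are seen to be relaxed. */
    out[0] = base | RULE_WW;
    out[1] = base | RULE_CC3;
    return 2;
}
```

For a circuit that employs WR, RR, and CRW, the function reports two architectures, A(WR,RR,WW) and A(WR,RR,CC3), which corresponds to relaxing A1 && A2.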

Use of CMP and UPO.

In RAPA the goal in distinguishing two architectures was to use as few rules as possible.

In the more pragmatic world in which ARCHTEST is used, the rules of CMP and UPO are always assumed. Therefore, in revising ARCHTEST I have shown all architectures as including both CMP and UPO, whether or not UPO is used in each case (CMP always is, of course).

Pure Tests.

A pure test is a test that involves only CMP, UPO, and one other rule R. A violation of the test can be reasonably thought to be a violation of the rule R.

Until recently, the only pure tests were for WW and RW. As a result of the new analysis involving CRW, tests T8 and T9 are now seen to be pure tests of CC3. This makes an important difference. Previously, it was possible to dismiss a relaxation detected by T8 or T9 as being due to a relaxation of an ordering rule, rather than necessarily involving CC3. Now such a relaxation is seen to involve CC3 and only CC3.

Files of Input Parameters.

In debugging ARCHTEST and in testing machines the same problem comes up: one wants a certain set of parameters for one situation and another set for another situation, and one doesn't want to have to type in either entire set every time. ARCHTEST now allows a user to create a file specifying the values of all of the run-time parameters. See the HOWTORUN file.

Controlling Output from ARCHTEST.

Many new flags have been defined to turn on or turn off the generation of separate categories of output data. These flags can be set in the parms files described in the previous item.

Information on setting parameters and on controlling output will shortly be available in the HOWTORUN file. In the meantime users can make a test run and at the end of the run save the parameters in a parms file. This will give a quick idea of the capabilities available.

Initial State of Operands in the Cache.

ARCHTEST now allows a user to manipulate operands into either read-only or exclusive states in the caches.

Here are two sequences of code which are indistinguishable to a programmer (X is a local variable which is never referenced again; the value written is arbitrary):

       First case      Second case
       A = 1;          X = A;
                       A = 1;

However, to an engineer there is a significant difference. If A is not in the cache at the time of initial reference, then in the first case A is brought into the cache in the exclusive state; in the second case A is brought into the cache in the read-only state. The different states cause different subsequent actions in the hardware; conceivably, one set of actions could involve an error not present in the other set. ARCHTEST now provides a user with three compile-time options:

Every write into a shared operand is ALWAYS, NEVER, or ONLY-SOMETIMES preceded by a fetch of the shared operand into a local variable to force it into read-only state.
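A compile-time switch of this kind might look like the following sketch. The macro names (PREFETCH_MODE and its three values) and the function write_shared are hypothetical; the actual option names are those given in the HOWTORUN file. Treating ONLY-SOMETIMES as "every other write" is also an assumption, and whether the read really leaves the line in the read-only state depends on the compiler and the hardware.

```c
/* Hypothetical compile-time options; override with -DPREFETCH_MODE=n. */
#define PREFETCH_NEVER     0
#define PREFETCH_ALWAYS    1
#define PREFETCH_SOMETIMES 2

#ifndef PREFETCH_MODE
#define PREFETCH_MODE PREFETCH_SOMETIMES
#endif

/* Writes value into a shared operand, optionally preceded by a fetch
   of the operand into a local variable. Returns 1 if the fetch was
   performed and 0 otherwise. */
int write_shared(volatile int *shared, int value)
{
    static unsigned long writes;   /* number of earlier calls */
    int prefetch;

#if PREFETCH_MODE == PREFETCH_ALWAYS
    prefetch = 1;
#elif PREFETCH_MODE == PREFETCH_NEVER
    prefetch = 0;
#else
    prefetch = (writes % 2u) == 0; /* assumed: every other write */
#endif
    writes++;

    if (prefetch) {
        int local = *shared;       /* read first: line fetched for reading */
        (void)local;               /* the local copy is never used again   */
    }
    *shared = value;               /* the write under test */
    return prefetch;
}
```

The operand is declared volatile so that the compiler does not discard the otherwise unused read.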

These options are based on ideas presented in [mntz96]. So far, only two machines have been tested with these new features; no differences were found. See Results of Testing. For information on setting these parameters, see the HOWTORUN file.

Send email to: William W. Collier.

References

Last updated January 4, 2006.