Analysis

Analysis of the Output from ARCHTEST

Introduction
Rules of Architecture
-- The Computation Rules
-- The Compute-Write-Read Rule (CWR)
-- The Compute-Read-Write Rule (CRW)
-- The Compute-Write-Write Rule (CWW)
-- The Statement-Read-Write Rule (SRW)
-- The Rules of Uniprocessor Order (UPO)
-- Order Rules
-- Read-Read Order (RR)
-- Write-Write Order (WW)
-- Write-Read Order (WR)
-- Read-Write Order (RW)
-- Program Order (PO)
-- Rules of Cache Coherence
-- CC1
-- CC2
-- CC3
-- CC4
The Structure of the Tests
Determining the Rules a Machine Violates
Justifying the Use of <crw
Two (or More) Tests for the Price of One
Analysis of the Tests in ARCHTEST
Test T100
Test T200
Test T300
Test T400
Test T500
Test T600
Test T700
Test T800
Test T900
Test T1000
Test T1100
Test T1200

Introduction

Let M1 and M2 each be a shared memory multiprocessor (SMMP). Suppose that M1 and M2 are identical except that M1 is sequentially consistent and M2 isn't, or M1 is cache coherent and M2 isn't, or M1 and M2 are thought both to be cache coherent, but in distinct ways.

Certainly, M1 and M2 will appear to an engineer to be distinct machines. The interesting question is whether or not M1 and M2 will appear to be distinct machines to a programmer. Let us make this thought more precise.

Let P be a program and D a set of input data, and let R be one possible result when P is run with D on M1. If for all such R, R can also be calculated by P running with D on M2, then write M1 => M2. M1 => M2 means that the set of possible outputs from running P with D on M1 is contained in the set of possible outputs from running P with D on M2. (Due to nondeterminism there may be more than one possible result from running P with D.)

If for all possible programs P and input data D it is the case that M1 => M2 and M2 => M1, then write M1 <=> M2 and say that M1 and M2 are indistinguishable.

Suppose M1 and M2 are distinguishable. Then M1 and M2 differ in their behavior and that difference is visible to at least one program operating on at least on set of input data. So now the interesting questions become:

What differences in behavior can programs detect?

What programs can detect those differences?

ARCHTEST is a program to test a shared memory multiprocessor (SMMP) to determine how it behaves in accessing shared data. ARCHTEST incorporates all such tests as are presently known. The following is a description of these tests and an explanation of how they detect relaxed behavior.

Rules of Architecture

There are several distinct sets of rules of architecture.

The computation rules describe how processes interact to calculate terminal values of operands from a given set of initial values of the operands.

The uniprocessor order rules define the ordering rules that a uniprocessor follows.

Program order, and its subsidiary rules, define rules of order based on the order of read and write operations in a program.

The rule of write atomicity requires that each write operation be instantaneously visible throughout a system.

The strongest architecture is sequential consistency (SC). An SC machine obeys the rules of program order and write atomicity. None of the tests in ARCHTEST will detect relaxed behavior on an SC machine. If a machine provides a relaxed memory model, then many of the twelve tests below should detect that fact.

The Computation Rules

The computation rules are so intuitively obvious that they are almost never stated explicitly. However, they must be defined formally and precisely in order for them to be used in reasoning about interactions between processes on shared data.

A failure to obey one of the rules of computation might look like this:

     Initially, A = 0.

         P1
     L0: A = 1;

     Terminally, A = 13.

Operand A is initially zero. Then it is set to one. But at the end of the execution it has the value 13. Clearly, the machine on which the execution occurred did not compute in the way one normally expects a machine to behave.

The Compute-Write-Read Rule (CWR)

     Initially, A = X = 0.

         P1             P2
     L0: A = 1;     L0: X = A;

     Terminally, A = X = 1.

If X = 1 at the end of the execution, then it must be that the write event in process P1 occurred before the read event in process P2. Represent this by

     (P1,L0,W,1,A,S2) <cwr (P2,L0,R,1,A,S2)

to show that statement L0 in process P1 wrote a 1 into operand A in S2 (S2 is P2's view of storage) before statement L0 in process P2 read a 1 value from operand A in S2.

The Compute-Read-Write Rule (CRW)

     Initially, A = X = 0.

         P1             P2
     L0: A = 1;     L0: X = A;

If X = 0 at the end of the execution, then it must be that the read event in process P2 occurred before the write event in process P1. Represent this by

     (P2,L0,R,0,A,S2) <crw (P1,L0,W,1,A,S2)

to show that statement L0 in process P2 read a 0 from operand A in S2 before statement L0 in process P1 wrote a 1 value into operand A in S2.

The Compute-Write-Write Rule (CWW)

     Initially, A = 0.

         P1             P2
     L0: A = 1;     L0: A = 2;

If A = 2 (in both S1 and S2, that is, in both P1's and P2's view of storage) after this program has executed, then it must be that the write events in process P1 (for both S1 and S2) occurred before the write events in process P2. Represent this by

     (P1,L0,W,1,A,S1) <cww (P2,L0,W,2,A,S1)
     (P1,L0,W,1,A,S2) <cww (P2,L0,W,2,A,S2)

to show that statement L0 in process P1 wrote a 1 into operand A in S1 and in S2 before statement L0 in process P2 wrote a 2 into operand A in S1 and in S2, respectively.

The Statement-Read-Write Rule (SRW) It is convenient to define another rule as one of the rules of computation. Write

     (P1,L0,R,0,B,S1) <srw (P1,L0,W,0,A,S1)

to show that in

         P1
     L0: A = B;

the read event(s) in a statement occur before the write events.

The Rules of Uniprocessor Order (UPO)

On a uniprocessor the parts of the following three code fragments (1) involving operations on X and (2) involving operations on the store for the process containing the code fragments, would always be executed in order.

     X = 1;     X = 1;     Y = X;
     Y = X;     X = 2;     X = 1;

Thus for this fragment

       Pi
     X = 1;
     X = 2;

we would have

     (Pi,-,W,1,X,Si) <upo (Pi,-,W,2,X,Si)

but UPO would not mean that (Pi,-,W,1,X,Si) would necessarily occur before any of the other write events in the second statement.

The necessity of this behavior is so obvious that UPO is treated like CMP as being part of the rules that all machines, even multiprocessors, automatically obey.

There is one case which is not so clear cut. If two fetches from the same operand are made, will they always occur in order? This seems to me to be a reasonable assumption, but others may not agree.

     Y = X;
     Z = X;

I have chosen to give this rule and the associated relation special names, URR and <urr, respectively.

Order Rules

Read-Read Order (RR)

If a machine obeys RR, and if a first read operation on shared data logically occurs in a program before a second read operation on shared data, then the first read operation is executed before the second read operation.

Write-Write Order (WW)

If a machine obeys WW, and if a first write operation on shared data logically occurs in a program before a second write operation on shared data, then the first write operation is executed before the second write operation.

Write-Read Order (WR)

If a machine obeys WR, and if a write operation on shared data logically occurs in a program before a read operation on shared data, then the write operation is executed before the read operation.

Read-Write Order (RW)

If a machine obeys RW, and if a read operation on shared data logically occurs in a program before a write operation on shared data, then the read operation is executed before the write operation.

Program Order (PO)

A machine obeys PO if and only if it obeys RR, WW, RW, and WR.

The WR and RW rules are adopted from [adgh96].

Rules of Cache Coherence

The term cache coherence is used to describe the behavior of an SMMP in making the results of a write operation by one process visible to other processes. Several different standards of behavior have been proposed.

CC1

All threads see each operand change value at exactly the same instant. CC1 is also called write atomicity (WA).

CC2

All threads see exactly the same sequence of changes of values for all operands. CC2 is also called write synchronization.

CC3

All threads see exactly the same sequence of changes of values for each separate operand.

CC4

Different threads may see an operand assume different sequences of values.

Suppose a system obeys CC1, and let w1 and w2 be any two write events in the same statement. If it is known that event x happens before w1, say, then, due to the write atomicity of the system, it must be that x happens before w2. Symbolically, if x <hb w1, then x <hb w2. This is inconvenient to write when one is trying to establish the existence of a circuit. A more economical notation is to write x <hb w1 =cc1 w2 to show that x occurs before w1 which occurs at the same time as w2.

No program can distinguish between CC1 and CC2 machines [coll92].

However, programs can distinguish between machines obeying all of the other rules of cache coherence.

A model can represent a write operation either as a single event or as multiple events. In the first case the event represents the instant in time at which the result of the write operation becomes visible to all processes in the system. Such a model is inherently write atomic. An alternative is to represent a write operation as a set of events, with each event in the set representing the instant in time at which the result of the write operation becomes visible to an individual process. This is the function of the sixth component of each event, Si.

Specifically, each process Pi is considered to have a view of storage Si distinct from the view of all other processes. In particular, each process can see an operand change value at a different time. If process P1 writes into operand X in a 4-way system, there will be four write events, one for each store, S1, S2, S3, and S4. (A read by process Pi is always from store Si). For Tests 1-4 and 11, where write atomicity is assumed, the sixth component of each event can be ignored.

The Structure of the Tests

In each test in ARCHTEST an SMMP is assumed to obey a given set of rules of architecture. Sometimes the data resulting from running a test can be interpreted via the rules to deduce that a circuit exists among certain events in time. Since it is impossible for events to occur in a circular order in time, the circuit is understood to demonstrate that the machine did not in fact obey all of the rules used in constructing the circuit.

Analysis of some of the tests is quite involved. To keep explanations as simple as possible only specific examples, rather than general cases, are analyzed.

Analyzing the test results occurs in two sections. Cases sought in the first section are labelled x.0 in contrast to the cases sought in the second section which are labelled x.1, x.2, ....

In the first section various subsets of the operands in the arrays are checked to see if they are monotonically increasing. If not, then the fundamental rules of CMP and UPO have been violated. To date (January 2005) no machine has been seen to fail any part of the analysis in the first section. Since no such failures have been seen, ARCHTEST does not print out the results for this section.

The second section of the analysis is the significant part. These are the relaxations the tests are nominally about.

For each type of test a numeric value is defined, denoted d, for each entry in each array. A d value represents the difference betweeen an observed value of an operand and the smallest value the operand could have had without indicating relaxed behavior on the part of the SMMP. Negative values of d indicate relaxed behavior; the further from zero, the more relaxed is the behavior of the SMMP. Nonnegative values of d indicate strong behavior. If the goal of an SMMP is to be strong, then positive values represent instances of acceptable, but less than optimal behavior, while zero values represent ideal behavior. The d values for each test are plotted in a histogram in the output for each test. (For more on interpreting the output from each test, see the file called HOWTORUN.

Determining the Rules a Machine Violates

Here are the tests and the relaxations each of them can detect.

     T1xx.    Both A(CMP,UPO,URR,WW) and A(CMP,UPO,URR,CC3)

     T2xx.         A(CMP,UPO,RR,WW)

     T3xx.         A(CMP,UPO,URR,WW)

     T4xx.1.  Both A(CMP,UPO,WW,WR) and (A(CMP,UPO,WR,CC3),
     T4xx.3.  Both A(CMP,UPO,WW)    and (A(CMP,UPO,RW)

     T5xx.         A(CMP,UPO,RR,CC1)

     T6xx.         A(CMP,UPO,RR,CC1)

     T7xx.1.       A(CMP,UPO,RR,CC1)
     T7xx.2.  Both A(CMP,UPO,WW)    and (A(CMP,UPO,RW))

     T8xx.1.  Both A(CMP,UPO,CC3)   and (A(CMP,UPO,WW,WR),
     T8xx.2.  Both A(CMP,UPO,CC3)   and (A(CMP,UPO,WW) and A(CMP,UPO,RW)

     T9xx.         A(CMP,UPO,URR,CC3)

    T11xx.1.  Both A(CMP,UPO,WW,WR) and (A(CMP,UPO,WR,CC3),
    T11xx.2.  Both A(CMP,UPO,WW)    and (A(CMP,UPO,RW)

    T12xx.1.       A(CMP,UPO,RR,CC1)
    T12xx.2.  Both A(CMP,UPO,WW)    and (A(CMP,UPO,RW)

It is inconceivable that a machine would relax the CMP and UPO rules. Therefore, if a machine is seen to violate A(CMP,UPO,X), it is certain that the machine relaxes rule X. Call such a test pure, and the other tests mixed.

If a machine obeys A(CMP,UPO,X), but violates A(CMP,UPO,X,Y), then a commonsense deduction is that the machine relaxes rule Y. The reality is that the machine might relax rule X, but the relaxation becomes visible only in the environment for the second test. It is important to keep this in mind when interpreting the results of the tests.

Once redundancies are eliminated, here are the relaxations sought by the tests. The relaxations are shown on the left; the tests dectecting the relaxations are shown on the right.

                                 T T T T T T T T T T T T T T T T
                                 1 2 3 4 4 5 6 7 7 8 8 9 1 1 1 1
                                       . .     . . . .   1 1 2 2
                                       1 3     1 3 1 2   . . . .
                                                         1 2 1 2
                  RW                     o       o   o     o   o
         WW                              o       o   o     o   o
         WW    RR                  o
     URR WW                      o   o
     URR             CC3         o                     o
            WR       CC3               o                 o
         WW WR                         o           o     o
                     CC3                           o o
               RR        CC1               o o o             o

There are five pure tests for WW, five for RW, and one for CC3. If the machine passes the pure test for WW, then there is one mixed test for RR and three for WR.

Some of the tests can be adapted and extended at the assembly language level to create pure tests. Suppose the machine fails test T2, indicating that it relaxes either RR, or WW, or both. Rewrite T2 in assembler language and insert read barrier instructions to obtain a pure test of WW. Similarly with write barrier instructions to obtain a pure test of RR. (And with both read and write barrier instructions to obtain a test of the barrier instructions themselves.)

Justifying the Use of <crw

Consider the following execution.

     Initially, (A,X) = (0,0).

         P1            P2
     L1: X = A;    L1: A = 1;
                   L2: A = 2;
                   L2: A = 3;

     Terminally, (A,X) = (3,1).

Obviously,

     (P2,L1,W,1,A,S1) <cwr (P1,L1,R,1,A,S1)

However, it is not necessarily true that

     (P1,L1,R,1,A,S1) <crw (P2,L2,W,2,A,S1)

since some or all of the write events in (P2,L2) could come before some or all of the write events in (P2,L1). (The purpose of setting A to 3 was to conceal whether or not this occurred.)

The purpose of this note is to show that if the execution obeys either WW, CC1, or CC3, then it is true that

     (P1,L1,R,1,A,S1) <crw (P2,L2,W,2,A,S1)

Assume the execution obeys WW. Since UPO is obeyed by all machines, it must be that

     (P2,L1,W,1,A,S2) <upo (P2,L2,W,2,A,S2)

Since the execution obeys WW, it must be that

     (P2,L1,W,1,A,Si) <ww  (P2,L2,W,2,A,Sj), for i,j in {1,2}

In particular,

     (P2,L1,W,1,A,S1) <ww  (P2,L2,W,2,A,S1)

Then to avoid a circuit and to obey CMP, it must be that

     (P2,L1,W,1,A,S1) <cwr (P1,L1,R,1,A,S1)
                      <crw (P2,L2,W,2,A,S1)

Assume the execution obeys CC1. Then it must be that

     (P2,L1,W,1,A,Si) <cc1 (P2,L2,W,2,A,Sj), for i,j in {1,2}

which suffices to establish the same argument as for WW.

Assume the execution obeys CC3. Then it must be that

     (P2,L1,W,1,A,Si) <cc3 (P2,L2,W,2,A,Si), for i in {1,2}

which suffices to establish the same argument as for WW.

This reasoning will be referred to at various places in the analysis of the following tests by the phrases "<crw by WW", "<crw by CC1", or "<crw by CC3".

Two (or More) Tests for the Price of One

For some of the tests nothing more can be detected in the output data than whether or not a machine relaxed a single architecture.

There are two important cases in which one test may provide evidence that the machine relaxed two or more architectures. First, the data may be interpreted in more than one way. Second, different rules may be used to accomplish the same analysis, that is, to prove that a given set of data imply the existence of a circuit of events in time.

As described earlier, all test results (except for T1) are analyzed in at least two ways. The x.0 cases examine the data in the arrays for monotonicity. The x.1, x.2, ..., cases examine the data for more interesting relaxations. Sometimes the different cases detect distinct relaxations; sometimes not.

The are two major ways in which different rules may be substituted and still accomplish the same analysis.

In this code fragment

         P1
     L1: U = V;
     L2: X = Y;

it is possible to assume either RW or WW to argue that one or the other of the following is true:

     (P1,L1,R,-,V,Px) <rw  (P1,L2,W,-X,Sy)

     (P1,L1,R,-,V,Px) <srw (P1,L1,W,-U,Sy)
                      <ww  (P1,L2,W,-X,Sy)

This observation is used with tests T4, T7, T8, T11, and T12.

In tests T1, T4, and T11 the rules WW, CC1, and CC3 do not occur except for the purpose of justifying the <crw link. If the analysis uses WW, say, to show a circuit exists, then the same analysis, slightly modified, can be used with CC3 replacing WW. (CC1 could also be used, of course, but there is no point in using the stronger rule when the weaker rule suffices.) Thus, the tests will show that either neither or both of the two architectures were relaxed.

To summarize, six tests (shown below) can detect the relaxation of more than one architecture. 'and' indicates a modification of the analysis suffices. 'or' (inclusive) shows where separate analyses of the data are required.

     Test T1.       (URR,WW) and (URR,CC3)
     Test T4, T11.  ((WW,WR) and (WR,CC3)) or ((WW) and (RW))
     Test T7, T12.  (RR,CC1) or ((WW) and (RW))
     Test T8.       ((CC3) and (WW,WR)) or ((CC3) and (WW) and (RW))

Analysis of the Tests in ARCHTEST

Test T100.  Seek a relaxation of both A(CMP,UPO,URR,WW)
                                  and A(CMP,UPO,URR,CC3).


     Initially, A = U[i] = 0, i = 0 to K.

         P1            P2
     L0: A = 0;   L0: U[0] = A;
     L1: A = 1;   L1: U[1] = A;
     L2: A = 2;   L2: U[2] = A;
     L3: A = 3;   L3: U[3] = A;
     L4: A = 4;   L4: U[4] = A;
     L5: A = 5;   L5: U[5] = A;
     L6: A = 6;   L6: U[6] = A;
     L7: A = 7;   L7: U[7] = A; etc.

     Seek 1.0.  U[i] > U[i+1].  d = U[i+1] - U[i].

To show: Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

After P1 and P2 finish, ARCHTEST performs tests for one case in the data in the U vector.

Case 1.0. U[i] > U[i+1].

     Example of Case 1.0.

         P1            P2
     L3: A = 3;   L2: U[2] = A;
     L4: A = 4;   L3: U[3] = A;

If U[2] = 4 > U[3] = 3, then the following circuit exists.

     (P2,L2,R,4,A,S2) <urr (P2,L3,R,3,A,S2)
                      <crw (P1,L4,W,4,A,S2) (<crw by WW or CC3)
                      <cwr (P2,L2,R,4,A,S2)

Explanation of the circuit

By the rule of URR the read event in statement L2 occurred before the read event in statement L3 in process P2.

     (P2,L3,R,3,A,S2) <crw (P1,L4,W,4,A,S2)

See the note on justifying <crw.

     (P1,L4,W,4,A,S2) <cwr (P2,L2,R,4,A,S2)

If statement L2 in P2 read a value of 4 from A, then this event had to occur after P1 wrote a 4 into A.

The existence of the circuit shows that the machine failed to obey both A(CMP,UPO,URR,WW) and A(CMP,UPO,URR,CC3).

A slight variation on the above example is possible.

If U[2] = 5 > U[3] = 3, then the following circuit exists.

     (P2,L2,R,5,A,S2) <urr (P2,L3,R,3,A,S2)
                      <crw (P1,L4,W,4,A,S2) (<crw by WW or CC3)
                      <xx  (P1,L5,W,5,A,S2) (<xx = <ww or <cc3)
                      <cwr (P2,L2,R,5,A,S2)

Test T120 is identical to Test T100, except that A is a floating point operand.

Test T200.  Seek a relaxation of A(CMP,UPO,RR,WW).


     Initially, A = B = U[i] = V[i] = 0, i = 0 to K.

          P1            P2
     L11: A = 1;   L11: U[1] = A;
     L12: B = 1;   L12: V[1] = B;
     L21: A = 2;   L21: U[2] = A;
     L22: B = 2;   L22: V[2] = B;
     L31: A = 3;   L31: U[3] = A;
     L32: B = 3;   L32: V[3] = B;
     L41: A = 4;   L41: U[4] = A;
     L42: B = 4;   L42: V[4] = B; etc.

     Seek 2.0.  U[i] > U[i+1].  d = U[i+1] - U[i].
     Seek 2.0.  V[i] > V[i+1].  d = V[i+1] - V[i].

       To show:  Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

     Seek 2.1.  V[i] > U[i+1].  d = U[i+1] - V[i]
     Seek 2.2.  U[i] > V[i]+1.  d = V[i]+1 - U[i]

       To show:  Not A(CMP,UPO,RR,WW).

After P1 and P2 finish, ARCHTEST performs tests for three distinct conditions in the data in the U and V vectors.

     Case 2.0.  U[i] > U[i+1].
     Case 2.0.  V[i] > V[i+1].
     Case 2.1.  V[i] > U[i+1].
     Case 2.2.  U[i] > V[i] + 1.

Case 2.0. U[i] > U[i+1].

Example of Case 2.0.

If U[2] = 4 > U[3] = 3, then the following circuit exists.

     (P2,L21,R,4,A,S2) <urr (P2,L31,R,3,A,S2)
                       <crw (P1,L41,W,4,A,S2) (<crw by WW or CC3)
                       <cwr (P2,L21,R,4,A,S2)

Explanation of the circuit

     (P2,L21,R,4,A,S2) <urr (P2,L31,R,3,A,S2)

By the rule of URR the read event in statement L21 occurred before the read event in statement L31 in process P2.

     (P2,L31,R,3,A,S2) <crw (P1,L41,W,4,A,S2)

See the note on justifying <crw.

     (P1,L41,W,4,A,S2) <cwr (P2,L21,R,4,A,S2)

If statement L21 in P2 read a value of 4 from A, then this event had to occur after P1 wrote a 4 into B.

The existence of the circuit shows that the machine failed to obey both A(CMP,UPO,URR,WW) and A(CMP,UPO,URR,CC3).

Case 2.1. V[i] > U[i+1]

     Example of Case 2.1.

          P1            P2
     L11: A = 1;
       .  .  .     L32: V[3] = B;
     L21: A = 2;   L41: U[4] = A;
     L22: B = 2;

If V[3] = 2 > U[4] = 1, then the following circuit exists.

     (P2,L32,R,2,B,S2) <rr  (P2,L41,R,1,A,S2)
                       <crw (P1,L21,W,2,A,S2) (<crw by WW)
                       <ww  (P1,L22,W,2,B,S2)
                       <cwr (P2,L32,R,2,B,S2)

Explanation of the circuit

     (P2,L32,R,2,B,S2) <rr  (P2,L41,R,1,A,S2)

By the rule of read order the read event in statement L32 occurred before the read event in statement L41 in process P2.

     (P2,L41,R,1,A,S2) <crw (P1,L21,W,2,A,S2)

See the note on justifying <crw.

     (P1,L21,W,2,A,S2) <ww  (P1,L22,W,2,B,S2)

By the rule of write order the write event in statement L21 occurred before the write event in statement L22 in process P1.

     (P1,L22,W,2,B,S2) <crw (P2,L32,R,2,B,S2)

If statement L32 in P2 read a value of 2 from B, then this event had to occur after P1 wrote a 2 into B.

The existence of the circuit shows that the machine failed to obey A(CMP,UPO,RR,WW).

Case 2.2. U[i] > V[i] + 1.

Example of Case 2.2.

          P1            P2
     L11: B = 1;
       .  .  .     L41: U[4] = A;
     L22: B = 2;   L42: V[4] = B;
     L31: A = 3;

If U[4] = 3 > V[4] = 1, then the following circuit exists.

     (P2,L41,R,3,A,S2) <rr  (P2,L42,R,1,B,S2)
                       <crw (P1,L22,W,2,B,S2) (<crw by WW)
                       <ww  (P1,L31,W,3,A,S2)
                       <cwr (P2,L41,R,3,A,S2)

Explanation of the circuit

     (P2,L41,R,3,A,S2) <rr  (P2,L42,R,1,B,S2)

By the rule of read order the read event in statement L41 occurred before the read event in statement L42 in process P2.

     (P2,L42,R,1,B,S2) <crw (P1,L22,W,2,B,S2)

See the note on justifying <crw.

     (P1,L22,W,2,B,S2) <ww  (P1,L31,W,3,A,S2)

By the rule of write order the write events in statement L22 in P1 occurred before the write events in statement L31 in P1.

     (P1,L31,W,3,A,S2) <cwr (P2,L41,R,3,A,S2)

If statement L41 in P2 read a value of 3 from A, then this event had to occur after P1 wrote a 3 into A.

The existence of the circuit shows that the machine failed to obey A(CMP,UPO,RR,WW).

Test T210 is identical to Test T200, except that B is a floating point operand while A is still a fixed point operand.

Test T220 is identical to Test T200, except that both A and B are floating point operands.

Test T300.  Seek a relaxation of A(CMP,UPO,URR,WW).


     Initially, A = B = U[i] = V[i] = X[i] = Y[i] = 0, i = 0 to K.

        P1       P2         P3         P4         P5        P6
      A = 0;  U[0] = A;  V[0] = B;  X[0] = A;  Y[0] = B;  B = 0;
      B = 1;  U[1] = A;  V[1] = B;  X[1] = A;  Y[1] = B;  A = 1;
      A = 2;  U[2] = A;  V[2] = B;  X[2] = A;  Y[2] = B;  B = 2;
      B = 3;  U[3] = A;  V[3] = B;  X[3] = A;  Y[3] = B;  A = 3;
      A = 4;  U[4] = A;  V[4] = B;  X[4] = A;  Y[4] = B;  B = 4;
      B = 5;  U[5] = A;  V[5] = B;  X[5] = A;  Y[5] = B;  A = 5;
      A = 6;  U[6] = A;  V[6] = B;  X[6] = A;  Y[6] = B;  B = 6;
      B = 7;  U[7] = A;  V[7] = B;  X[7] = A;  Y[7] = B;  A = 7; etc.

      Let i+ > i and j+ > j.

      Seek 3.0.  U[i] > U[i+] and both have the same parity.
      Seek 3.0.  V[i] > V[i+] and both have the same parity.
      Seek 3.0.  X[i] > X[i+] and both have the same parity.
      Seek 3.0.  Y[i] > Y[i+] and both have the same parity.

       To show:  Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

      Seek 3.1.  0 < U[i+] < V[j] < V[j+] < U[i].  d = U[i] - V[j+].
                      is      is     is      is
                     odd     even   odd     even

      Seek 3.2.  0 < U[i+] < V[j] < V[j+] < U[i]   d = U[i] - V[j+].
                      is      is     is      is
                     even    odd    even    odd

       To show:  Not A(CMP,UPO,URR,WW).

 Case 3.1.  0 < U[i+] < V[j] < V[j+] < U[i]
                 is      is     is      is
                odd     even   odd     even

Example of Case 3.1.

          P1           P2             P3                P6
      L5: B = 5;  L1: U[1] = A;  L7: V[7] = B;  B;  L3: A = 3;
      L6: A = 6;  L2: U[2] = A;  L8: V[8] = B;  B;  L4: B = 4;

If U[2] = 3 < V[7] = 4 < V[8] = 5 < U[1] = 6, then the following circuit exists:

     (P1,L5,W,5,B,S3) <ww  (P1,L6,W,6,A,S2)
                      <cww (P6,L3,W,3,A,S2)
                      <ww  (P6,L4,W,4,B,S3)
                      <cww (P1,L5,W,5,B,S3)

Explanation of the circuit

Since

     (P2,L1,R,6,A,S2) <urr (P2,L2,R,3,A,S2)

the rule of CMP requires that:

     (P1,L6,W,6,A,S2) <cwr (P2,L1,R,6,A,S2)
                      <crw (P6,L3,W,3,A,S2)
                      <cwr (P2,L2,R,3,A,S2)

Hence,

     (P1,L6,W,6,A,S2) <cww (P6,L3,W,3,A,S2)

Similarly, it can be shown that

     (P6,L4,W,4,B,S3) <cww (P1,L5,W,5,B,S3)

But then this circuit must exist:

     (P1,L5,W,5,B,S3) <ww  (P1,L6,W,6,A,S2)
                      <cww (P6,L3,W,3,A,S2)
                      <ww  (P6,L4,W,4,B,S3)
                      <cww (P1,L5,W,5,B,S3)

The existence of the circuit shows that the machine failed to obey A(CMP,UPO,URR,WW).

 Case 3.2.  0 < U[i+] < V[j] < V[j+] < U[i]
                 is      is     is      is
                 even    odd    even    odd

The analysis for Case 3.2 is the same as for Case 3.1.

Similarly for arrays U&Y, X&V, and X&Y.

The above example involved only two processes. Three or four processes can interact to form one large circuit. The analysis routine in ARCHTEST tests for loops involving two, three, or four processes.

Test T310 is identical to Test T300, except that B is a floating point operand while A is still a fixed point operand.

Test T320 is identical to Test T300, except that both A and B are floating point operands.

Test T400.  Seek a relaxation of
both A(CMP,UPO,WW,WR) and A(CMP,UPO,WR,CC3), or of
both A(CMP,UPO,WW)    and A(CMP,UPO,RW).


     Initially, A = B = U[i] = V[i], i = 0 to K.

           P1                 P2
     L00: A    = 0;     L00: B    = 0;
     L01: U[0] = B;     L01: V[0] = A;
     L10: A    = 1;     L10: B    = 1;
     L11: U[1] = B;     L11: V[1] = A;
     L20: A    = 2;     L20: B    = 2;
     L21: U[2] = B;     L21: V[2] = A;
     L30: A    = 3;     L30: B    = 3;
     L31: U[3] = B;     L31: V[3] = A; etc.

     Seek 4.0. U[i] > U[i+1].          d  = U[i+1] - U[i].
     Seek 4.0. V[i] > V[i+1].          d  = V[i+1] - V[i].

       To show:  Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

     Seek 4.1. U[i] < j and V[j] < i.  d1 = V[ U[i]+1 ] - i < 0.
     Seek 4.2. V[i] < j and U[j] < i.  d2 = U[ V[i]+1 ] - i < 0.

       To show:  Not A(CMP,UPO,WW,WR) and not A(CMP,UPO,WR,CC3).

     Seek 4.3. U[i] > j and V[j] > i.  d3 = i - V[ U[i]-1 ] < 0.
     Seek 4.4. V[i] > j and U[j] > i.  d4 = i - U[ V[i]-1 ] < 0.

       To show:  Not A(CMP,UPO,WW) and not A(CMP,UPO,RW).

Cases 4.1 and 4.2.

Example of Case 4.1.

           P1                 P2
     L20: A    = 2;     L40: B    = 4;
        .   .   .          .   .   .
     L30: A    = 3;     L50: B    = 5;
     L31: U[3] = B;     L51: V[5] = A;

     U[3] = 4 < 5 and V[5] = 2 < 3.

The data shows the following circuit exists:

     (P1,L31,R,4,B,S1) <crw (P2,L50,W,5,B,S1) (<crw by WW or CC3)
                       <wr  (P2,L51,R,2,A,S2)
                       <crw (P1,L30,W,3,A,S2)
                       <wr  (P1,L31,R,4,B,S1)

Explanation of the circuit

     (P1,L31,R,4,B,S1) <crw (P2,L50,W,5,B,S1)

The <crw links can be justified by either WW or CC3 (see the note on justifying <crw).

     (P2,L50,W,5,B,S1) <wr  (P2,L51,R,2,A,S2)

If the machine obeyed the rule of WR, then in process P2 all write events in statement L50 occurred before all read events in statement L51.

     (P2,L51,R,2,A,S2) <crw (P1,L30,W,3,A,S2)

As above.

     (P1,L30,W,3,A,S2) <wr  (P1,L31,R,4,B,S1)

As above.

The existence of the circuit shows that the machine failed to obey A(CMP,UPO,WW,WR) and A(CMP,UPO,WR,CC3).

Cases 4.3 and 4.4.

Example of Case 4.3.

           P1                 P2
     L5   U[5] = B;     L31: V[3] = A;
     L60: A    = 6;     L40: B    = 4;

U[5] = 4 > 3 and V[3] = 6 > 5.

The data shows the following circuit exists:

     (P1,L60,W,6,A,S2) <cwr (P2,L31,R,6,A,S2)
                       <rw  (P2,L40,W,4,B,S1)
                       <cwr (P1,L51,R,4,B,S1)
                       <rw  (P1,L60,W,6,A,S2)

Explanation of the circuit

     (P1,L60,W,6,A,S2) <cwr (P2,L31,R,6,A,S2)

If statement L31 in P2 read a value of 6 from A, then this had to happen after statement L60 in P1 wrote a 6 into A.

     (P2,L31,R,6,A,S2) <rw  (P2,L40,W,4,B,S1)

If the machine obeyed the rule of RW, then in process P2 all read events in statement L31 occurred before any write event in statement L40.

     (P2,L40,W,4,B,S1) <cwr (P1,L51,R,4,B,S1)

As above.

     (P1,L51,R,4,B,S1) <rw  (P1,L60,W,6,A,S2)

As above.

The existence of the circuit shows that the machine failed to obey A(CMP,UPO,RW).

The data also shows the following additional circuit exists:

     (P1,L60,W,6,A,S2) <cwr (P2,L31,R,6,A,S2)
                       <srw (P2,L31,W,6,A,S1)
                       <ww  (P2,L40,W,4,B,S1)
                       <cwr (P1,L51,R,4,B,S1)
                       <srw (P1,L51,W,4,A,S2)
                       <ww  (P1,L60,W,6,A,S2)

and so the machine could not have obeyed A(CMP,UPO,WW).

Similarly for Cases 4.2 and 4.4.

See the discussion of Test T700 for further information that can be deduced from the results of running Test T400.

Test T410 is identical to Test T400, except that B is a floating point operand while A is still a fixed point operand.

Test T420 is identical to Test T400, except that both A and B are floating point operands.

Test T500.  Seek a relaxation of A(CMP,UPO,RR,CC1).


     Initially, A = B = U[i] = V[i] = 0, i = 0 to K.

         P1             P2              P3
     L0: A = 0;     L0: B = A;     L00: U[0] = B;
     L1: A = 1;     L1: B = A;     L01: V[0] = A;
     L2: A = 2;     L2: B = A;     L10: U[1] = B;
     L3: A = 3;     L3: B = A;     L11: V[1] = A;
     L4: A = 4;     L4: B = A;     L20: U[2] = B;
     L5: A = 5;     L5: B = A;     L21: V[2] = A;
     L6: A = 6;     L6: B = A;     L30: U[3] = B;
     L7: A = 7;     L7: B = A;     L31: V[3] = A; etc.

     Seek 5.0  U[i] > U[i+1].  d = U[i+1] - U[i].
     Seek 5.0  V[i] > V[i+1].  d = V[i+1] - V[i].

       To show:  Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

     Seek 5.1  U[i] > V[i].    d = V[i]   - U[i].

       To show:  Not A(CMP,UPO,RR,CC1).

Case 5.1. U[i] > V[i].

Example of Case 5.1.

         P1             P2              P3
     L3: A = 3;
     L4: A = 4;     Lx: B = A;     L20: U[2] = B;
                                   L21: V[2] = A;

If U[2] = 4 > V[2] = 3, then the following circuit exists. (The line Lx of P2 which copied A into B is indeterminate.)

     (P1,L4,W,4,A,S2) <cwr (P2,Lx, R,4,A,S2)
                      <srw (P2,Lx, W,4,B,S3)
                      <cwr (P3,L20,R,4,B,S3)
                      <rr  (P3,L21,R,3,A,S3)
                      <crw (P1,L4, W,4,A,S3) (<crw by CC1)
                      =cc1 (P1,L4, W,4,A,S2)

Explanation of the circuit

     (P1,L4,W,4,A,S2) <cwr (P2,Lx, R,4,A,S2)

After statement L4 of P1 wrote a value of 4, some statement, Lx, of P2 read the value.

     (P2,Lx,R,4,A,S2) <srw (P2,Lx, W,4,B,S3)

Each statement is composed of read events, which come first in time, and write events, which come later. After the read event for Lx came the write events for Lx.

     (P2,Lx,W,4,B,S3) <cwr (P3,L20,R,4,B,S3)

For statement L20 to see a value of 4 in B, statement Lx in P2 must first have written a value of 4 in B.

     (P3,L20,R,4,B,S3) <rr  (P3,L21,R,3,A,S3)

By the rule of read order the read event in statement L20 occurred before the read event in statement L21.

     (P3,L21,R,3,A,S3) <crw (P1,L4,W,4,A,S3)

See the note on justifying <crw.

     (P1,L4,W,4,A,S3) =cc1 (P1,L4, W,4,A,S2)

By the rule of write atomicity the write into A in S3 occurred at the same time as the write into A in S2.

The existence of the circuit shows that the machine failed to obey A(CMP,UPO,RR,CC1).

If the machine is known to obey RR, then it must have violated CC1. Then the path

     (P1,L4,W,4,A,S2) <cwr (P2,Lx, R,4,A,S2)
                      <srw (P2,Lx, W,4,B,S3)
                      <cwr (P3,L20,R,4,B,S3)
                      <rr  (P3,L21,R,3,A,S3)

shows that the machine makes stale data visible.

In general, if a machine fails a test for CC1, and if the machine is known to obey all of the other rules used in constructing the circuit, and if the circuit demonstrating the failure has only one occurrence of <cc1, then the circuit demonstrates that the machine makes stale data visible. Currently, Test T5 is the only test capable of demonstrating the existence of stale data on a machine (and then only if the machine is known to obey RR.

Test T510 is identical to Test T500, except that B is a floating point operand while A is still a fixed point operand.

Test T520 is identical to Test T500, except that both A and B are floating point operands.

Test T600.  Seek a relaxation of A(CMP,UPO,RR,CC1).


     Initially, A = B = U[i] = V[i] = X[i] = Y[i] = 0, i = 0 to K.

       P1       P2         P3         P4         P5        P6
     A = 0;  U[0] = A;  V[0] = A;  X[0] = A;  Y[0] = A;  B = 0;
     A = 1;  U[1] = B;  V[1] = B;  X[1] = B;  Y[1] = B;  B = 1;
     A = 2;  U[2] = A;  V[2] = A;  X[2] = A;  Y[2] = A;  B = 2;
     A = 3;  U[3] = B;  V[3] = B;  X[3] = B;  Y[3] = B;  B = 3;
     A = 4;  U[4] = A;  V[4] = A;  X[4] = A;  Y[4] = A;  B = 4;
     A = 5;  U[5] = B;  V[5] = B;  X[5] = B;  Y[5] = B;  B = 5;
     A = 6;  U[6] = A;  V[6] = A;  X[6] = A;  Y[6] = A;  B = 6;
     A = 7;  U[7] = B;  V[7] = B;  X[7] = B;  Y[7] = B;  B = 7; etc.

     Seek 6.0.  U[i] > U[i+2].  d = U[i+2] - U[i].
     Seek 6.0.  V[i] > V[i+2].  d = V[i+2] - V[i].
     Seek 6.0.  X[i] > X[i+2].  d = X[i+2] - X[i].
     Seek 6.0.  Y[i] > Y[i+2].  d = Y[i+2] - Y[i].

       To show:  Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

     Seek       U[i] < V[j] and V[j+1] < U[i-1].  d = V[j+1] - U[i-1].
     Seek       U[i] < X[j] and X[j+1] < U[i-1].  d = V[j+1] - U[i-1].
     Seek       U[i] < Y[j] and Y[j+1] < U[i-1].  d = V[j+1] - U[i-1].
     Seek       V[i] < X[j] and X[j+1] < V[i-1].  d = V[j+1] - U[i-1].
     Seek       V[i] < Y[j] and Y[j+1] < V[i-1].  d = V[j+1] - U[i-1].
     Seek       X[i] < Y[j] and Y[j+1] < X[i-1].  d = V[j+1] - U[i-1].
          6.1.  i and j are even.
          6.2.  i and j are odd.

       To show:  Not A(CMP,UPO,RR,CC1).

Cases 6.1 and 6.2.

Example of Case 6.1.

       P1       P2         P3        P6
     L3: A = 3;  L1: U[1] = B;  L4: V[4] = A;  L1: B = 1;
     L4: A = 4;  L2: U[2] = A;  L5: V[5] = B;  L2: B = 2;

If U[2] = 3 < V[4] = 4 and V[5] = 1 < U[1] = 2, then the following circuit exists.

     (P2,L2,R,3,A,S2) <crw (P1,L4,W,4,A,S2) (<crs by cc1)
                      =cc1 (P1,L4,W,4,A,S3)
                      <cwr (P3,L4,R,4,A,S3)
                      <rr  (P3,L5,R,1,B,S3)
                      <crw (P6,L2,W,2,B,S3)
                      =cc1 (P6,L2,W,2,B,S2)
                      <cwr (P2,L1,R,2,B,S2)
                      <rr  (P2,L2,R,3,A,S2)

Explanation of the circuit

     (P2,L2,R,3,A,S2) <crw (P1,L4,W,4,A,S2)

See the note on justifying <crw.

     (P1,L4,W,4,A,S2) =cc1 (P1,L4,W,4,A,S3)

By CC1 P1 wrote a 4 into A in S2 at the same instant it wrote a 4 into A in S3.

     (P1,L4,W,4,A,S3) <cwr (P3,L4,R,4,A,S3)

If statement L4 saw a value of 4 in A in S3, then statement L4 in P1 must previously have written the value of 4 into A in S3.

     (P3,L4,R,4,A,S3) <rr  (P3,L5,R,1,B,S3)

By RR the read events in statement L4 occurred before the read events in statement L5.

The reasoning in the second half of the circuit repeats the reasoning in the first half.

The existence of the circuit shows that the machine failed to obey A(CMP,UPO,RR,CC1).

If the data in the arrays for Test T6 is saved in a file, the data can later be viewed with the program T6PLOT. This allows one to view the interaction of four (or six) processors in reading and writing two operands. The granularity is that of the test loops in ARCHTEST. (Specialized programs can be written to see the interaction at the level of individual instructions.) For further information see the file HOWTORUN.

Test T610 is identical to Test T600, except that B is a floating point operand while A is still a fixed point operand.

Test T620 is identical to Test T600, except that both A and B are floating point operands.

Test T700.  Seek a relaxation of A(CMP,UPO,RR,CC1)
or of both A(CMP,UPO,WW) and A(CMP,UPO,RW).


     Initially, A = B = U[i] = V[i] = 0, i = 0 to K.

           P1                P2
     L00: A    = 0;    L00: B    = 0;
     L01: -    = A;    L01: -    = B;
     L02: U[0] = B;    L02: V[0] = A;
     L10: A    = 1;    L10: B    = 1;
     L11: -    = A;    L11: -    = B;
     L12: U[1] = B;    L12: V[1] = A;
     L20: A    = 2;    L20: B    = 2;
     L21: -    = A;    L21: -    = B;
     L22: U[2] = B;    L22: V[2] = A;
     L30: A    = 3;    L30: B    = 3;
     L31: -    = A;    L31: -    = B;
     L32: U[3] = B;    L32: V[3] = A; etc.

     Seek 7.0. U[i] > U[i+1].          d  = U[i+1] - U[i].
     Seek 7.0. V[i] > V[i+1].          d  = V[i+1] - V[i].

       To show:  Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

     Seek 7.1. U[i] < j and V[j] < i.  d1 = V[ U[i]+1 ] - i < 0.
     Seek 7.2. V[i] < j and U[j] < i.  d2 = U[ V[i]+1 ] - i < 0.

       To show:  Not A(CMP,UPO,RR,CC1).

     Seek 7.3. U[i] > j and V[j] > i.  d3 = i - V[ U[i]-1 ] < 0.
     Seek 7.4. V[i] > j and U[j] > i.  d4 = i - U[ V[i]-1 ] < 0.

       To show:  Not A(CMP,UPO,WW) and not A(CMP,UPO,RW);

Cases 7.1 and 7.2.

Example of Case 7.1.

           P1                 P2
     L20: A    = 2;     L40: B    = 4;
        .   .   .          .   .   .
     L30: A    = 3;     L50: B    = 5;
     L31: U[3] = B;     L51: V[5] = A;

U[3] = 4 < 5 and V[5] = 2 < 3.

The data shows the following circuit exists:

     (P1,L32,R,4,B,S1) <crw (P2,L50,W,5,B,S1) (<crw by CC1)
                       =cc1 (P2,L50,W,5,B,S2)
                       <upo (P2,L51,R,5,B,S2)
                       <rr  (P2,L52,R,2,A,S2)
                       <crw (P1,L30,W,3,A,S2)
                       =cc1 (P1,L30,W,3,A,S1)
                       <upo (P1,L31,R,3,A,S1)
                       <rr  (P1,L32,R,4,B,S1)

Explanation of the circuit

     (P1,L32,R,4,B,S1) <crw (P2,L50,W,5,B,S1)

See the note on justifying <crw.

     (P2,L50,W,5,B,S1) =cc1 (P2,L50,W,5,B,S2)

By write atomicity the write into B in S1 occurred at the same instant as the write into B in S2.

     (P2,L50,W,5,B,S2) <upo (P2,L51,R,5,B,S2)

By uniprocessor order the write into B in S2 in statement L50 occurred before the read from B in S2 in statement L51.

     (P2,L51,R,5,B,S2) <rr  (P2,L52,R,2,A,S2)

By read order the read event in statement L51 occurred before the read event in statement L52.

The second half of the circuit repeats the reasoning for the first half.

The existence of the circuit shows that the machine failed to obey A(CMP,UPO,RR,CC1).

Cases 7.3 and 7.4.

Example of Case 7.3.

           P1                 P2
     L31: U[3] = B;     L11: V[1] = A;
     L40: A    = 4;     L20: B    = 2;

U[3] = 2 > 1 and V[1] = 4 > 3.

The data shows the following circuit exists:

     (P1,L40,W,4,A,S2) <cwr (P2,L11,R,4,A,S2)
                       <rw  (P2,L20,W,2,B,S1)
                       <cwr (P1,L31,R,2,B,S1)
                       <rw  (P1,L40,W,4,A,S2)

Explanation of the circuit

     (P1,L40,W,4,A,S2) <cwr (P2,L12,R,4,A,S2)

The read of the value 4 from A in S2 must have come after a value of 4 was written into A in S2.

     (P2,L12,R,4,A,S2) <rw  (P2,L20,W,2,B,S1)

By RW the read event in (P2,L12) comes before the write event in (P2,L20).

The second half of the circuit repeats the reasoning for the first half.

The existence of the circuit shows that the machine failed to obey both A(CMP,UPO,RW).

Similarly, the following circuit (based on the same data)

     (P1,L40,W,4,A,S2) <cwr (P2,L11,R,4,A,S2)
                       <srw (P2,L11,W,4,V[1],S1)
                       <ww  (P2,L20,W,2,B,S1)
                       <cwr (P1,L31,R,2,B,S1)
                       <srw (P1,L31,W,2,U[3],S2)
                       <ww  (P1,L40,W,4,A,S2)

shows that the machine failed to obey A(CMP,UPO,WW).

Example from a test of a Sun Sparc 20.

Suppose one is certain (based on the results of other tests) that a machine does not relax the rule of RR. Then relaxed behavior detected by Test T700 shows that the machine relaxed the rule of CC1.

It is instructive to work through a single example in some detail. Here are the results from a test of a Sun Sparc 20. The analysis routine for Test T700 showed that:

U[843] = 501 < 502 and V[502] = 832 < 843.
b=1 c=1 u[    832]:      500     500     500     500     500     500     501
                         501     501     501     501

The relaxed behavior occurred within the following parts of P1 and P2.

              P1                            P2
     L8320: A      = 832;    L5010: B      = 501;
     L8321: -      =   A;    L5011: -      =   B;
     L8322: U[832] =   B;    L5012: V[501] =   A;
     L8330: A      = 833;    L5020: B      = 502;
     L8331: -      =   A;    L5021: -      =   B;
     L8332: U[833] =   B;    L5022: V[502] =   A;
     L8340: A      = 834;    L5030: B      = 503;
     L8341: -      =   A;    L5031: -      =   B;
     L8342: U[834] =   B;    L5032: V[503] =   A;
     L8350: A      = 835;
     L8351: -      =   A;
     L8352: U[835] =   B;
     L8360: A      = 836;
     L8361: -      =   A;
     L8362: U[836] =   B;
     L8370: A      = 837;
     L8371: -      =   A;
     L8372: U[837] =   B;
     L8380: A      = 838;
     L8381: -      =   A;
     L8382: U[838] =   B;
     L8390: A      = 839;
     L8391: -      =   A;
     L8392: U[839      B;
     L8400: A      = 840;
     L8401: -      =   A;
     L8402: U[840] =   B;
     L8410: A      = 841;
     L8411: -      =   A;
     L8412: U[841] =   B;
     L8420: A      = 842;
     L8421: -      =   A;
     L8422: U[842] =   B;
     L8430: A      = 843;
     L8431: -      =   A;
     L8432: U[843] =   B;

The analysis routine printed out a portion of the arrays surrounding the data for the above incident:

      i       U       V      d1      d2      d3      d4
    498     212     820      -1      -2       6       3
    499     214     820      -1      -3       2       4
    500     214     826      -2      -2       3       3
    501     214     828      -3      -2       4       3
    502     216     832      -2      -2       4       2
    503     218     839      -1      -2       3       3
    504     218     844      -2      -2       4       3
    505     218     847      -3      -2       5       3

    835     500     213      -7      -3      15       4
    836     500     213      -8      -4      16       5
    837     500     216      -9      -1      17       3
    838     500     216     -10      -2      18       4
    839     501     218      -7      -1      13       3
    840     501     219      -8      -1      14       2
    841     501     219      -9      -2      15       3
    842     501     221     -10       0      16       3
    843     501     221     -11      -1      17       4
    844     502     221      -5      -2      16       5
    845     502     222      -6      -1      17       3
    846     502     225      -7      -1      18       2

It will be easier to visualize the relationships among the read and write events if the following arcs are drawn.

     From A in L8330 to A in L8430 (by WW),
                     to A in L8431 (by UPO),
                     to B in L8432 (by RR),
                     to B in L5020 (by CRW),
                     to B in L5021 (by UPO),
                     to A in L5022 (by RR),
                     to A in L8330 (by CRW).

Other, secondary arcs, are:

     From A in L8320 to A in L8330 (by WW).
     From A in L8320 to A in L5022 (by CWR).
     From B in L5010 to B in L5020 (by WW).
     From B in L5010 to B in L8432 (by CWR).

Two possibilities exist: either P1 failed to be CC1, or P2 failed to be CC1. Consider the first possibility.

Case 1. P1 failed to be CC1.

(The full portion of P1 and P2 shown above has been abbreviated; some statements not vital to the analysis have been omitted. Also, information has been appended in parentheses to some statements to show the value obtained when the operand in the statement was fetched.)

              P1                            P2
     L8320: A      = 832;
     L8322: U[832] =   B; (= 500)
     L8330: A      = 833;
     L8332: U[833] =   B; (= 500)
     L8340: A      = 834;
     L8342: U[834] =   B; (= 500)
     L8350: A      = 835;
     L8352: U[835] =   B; (= 500)
     L8360: A      = 836;
     L8362: U[836] =   B; (= 500)
     L8370: A      = 837;
     L8372: U[837] =   B; (= 500)
     L8380: A      = 838;
     L8382: U[838] =   B; (= 500)
     L8390: A      = 839;
                                   L5010: B      = 501;
                                   L5012: V[501] =   A; (= 828)
     L8392: U[839] =   B; (= 501)
     L8400: A      = 840;
     L8402: U[840] =   B; (= 501)
     L8410: A      = 841;
     L8412: U[841] =   B; (= 501)
     L8420: A      = 842;
     L8422: U[842] =   B; (= 501)
     L8430: A      = 843;
     L8431: -      =   A;
     L8432: U[843] =   B; (= 501)
                                   L5020: B      = 502;
                                   L5021: -      =   B;
                                   L5022: V[502] =   A; (= 832)

P1 went through its loop eleven times before P2 saw the new value of A. This is the source of the d = 11 value. It gives a rough indication of the amount of time by which a value is delayed in propagating to another processor.

Case 2. P2 failed to be CC1.

              P1                            P2
     L8320: A      = 832;
     L8322: U[832] =   B; (= 500)
                                   L5010: B      = 501;
                                   L5012: V[501] =   A; (= 828)
                                   L5020: B      = 502;
                                   L5021: -      =   B;
                                   L5022: V[502] =   A; (= 832)
     L8330: A      = 833;
     L8332: U[833] =   B; (= 500)
     L8340: A      = 834;
     L8342: U[834] =   B; (= 500)
     L8350: A      = 835;
     L8352: U[835] =   B; (= 500)
     L8360: A      = 836;
     L8362: U[836] =   B; (= 500)
     L8370: A      = 837;
     L8372: U[837] =   B; (= 500)
     L8380: A      = 838;
     L8382: U[838] =   B; (= 500)
     L8390: A      = 839;
     L8392: U[839] =   B; (= 501)
     L8400: A      = 840;
     L8402: U[840] =   B; (= 501)
     L8410: A      = 841;
     L8412: U[841] =   B; (= 501)
     L8420: A      = 842;
     L8422: U[842] =   B; (= 501)
     L8430: A      = 843;
     L8431: -      =   A;
     L8432: U[843] =   B; (= 501)

In this sequence P2 set B to 500, then to 501, then to 502. Subsequent to P2 setting B to 502, P1 fetched the value of B eleven times before it saw B take on the value 502. Six times it saw a value of 500 and five times it saw a value of 501. It is interesting to speculate on how P1 could receive an invalidate for B from P2 and then to refetch B, only to receive an already outdated value (501) instead of the current value (502).

To aid in the reconstruction of such sequences additional information is printed out following the initial diagnostic statement. Specifically, after the line:

U[843] = 501 < 502 and V[502] = 832 < 843.

there appear these lines:

b=1 c=1 u[    832]:      500     500     500     500     500     500     501
                         501     501     501     501

This displays the eleven values visible in the detected relaxation, starting at u[832]. Two other numbers are displayed. The b ("badest") value shows the difference between the last and the first of the eleven numbers. The c ("copies") value shows the number of changes in values seen in the eleven numbers. The values of b and c have been seen to range up to 3 or 4.

The same analysis applied to the results of running Test T1200 also show details on nonatomic events. On Tests T400 and T1100 the same analysis shows details on out of order execution.

Test T710 is identical to Test T700, except that B is a floating point operand while A is still a fixed point operand.

Test T720 is identical to Test T700, except that both A and B are floating point operands.

Test T800.  Seek a relaxation
      of both A(CMP,UPO,CC3) and A(CMP,UPO,WW,WR),
   or of      A(CMP,UPO,CC3),    A(CMP,UPO,WW), and A(CMP,UPO,RW).


     Initially, A = W[i,j] = 0, i = 0 to K, and j = 1,2,3,4.

       P1                P2                P3                P4
L11: A     = 11;  L11: A     = 12;  L11: A     = 13;  L11: A     = 14;
L12: W[1,1] = A;  L12: W[1,2] = A;  L12: W[1,3] = A;  L12: W[1,4] = A;
L21: A     = 21;  L21: A     = 22;  L21: A     = 23;  L21: A     = 24;
L22: W[2,1] = A;  L22: W[2,2] = A;  L22: W[2,3] = A;  L22: W[2,4] = A;
L31: A     = 31;  L31: A     = 32;  L31: A     = 33;  L31: A     = 34;
L32: W[3,1] = A;  L32: W[3,2] = A;  L32: W[3,3] = A;  L32: W[3,4] = A;
L41: A     = 41;  L41: A     = 42;  L41: A     = 43;  L41: A     = 44;
L42: W[4,1] = A;  L42: W[4,2] = A;  L42: W[4,3] = A;  L42: W[4,4] = A;
L51: A     = 51;  L51: A     = 52;  L51: A     = 53;  L51: A     = 54;
L52: W[5,1] = A;  L52: W[5,2] = A;  L52: W[5,3] = A;  L52: W[5,4] = A;
L61: A     = 61;  L61: A     = 62;  L61: A     = 63;  L61: A     = 64;
L62: W[6,1] = A;  L62: W[6,2] = A;  L62: W[6,3] = A;  L62: W[6,4] = A; etc.

Seek 8.0: W[i,j] >= W[i+,j]; W[i,j] and W[i+,j] are congruent modulo 10, where i+ > i.

To show: Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

Seek 8.1. W[-,i] < W[-,j] < ... < W[-,i]

where each element of {1,2,3,4} occurs at most once as one of the subscripts i, j, ....

To show: Not A(CMP,UPO,CC3) and not A(CMP,UPO,WW,WR).

Seek 8.2. W[-,i] > W[-,j] > ... > W[-,i]

where each element of {1,2,3,4} occurs at most once as one of the subscripts i, j, ....

To show: Not A(CMP,UPO,CC3) and not A(CMP,UPO,RW).

There are two different forms of circuit that can be seen in the data coming from Test T8 when run on a machine which relaxes the rule of CC3. The easiest way to understand them is to work through examples.

Case 8.1. W[-,i] < W[-,j] < ... < W[-,i].

Example of Case 8.1.

Assume these values exist in array W after running the above code.

     W[3,1] = 42 < 52  (52 determines the subscripts in W[5,2], etc.)
     W[5,2] = 23 < 33
     W[3,3] = 54 < 64
     W[6,4] = 21 < 31

       P1                P2                P3                P4
L21: A     = 21;  L41: A     = 42;  L21: A     = 23;  L51: A     = 54;
    .   .   .         .   .   .         .   .   .         .   .   .
L31: A     = 31;  L51: A     = 52;  L31: A     = 33;  L61: A     = 64;
L32: W[3,1] = A;  L52: W[5,2] = A;  L32: W[3,3] = A;  L62: W[6,4] = A;

Before dealing with CC3, it is easy first to exhibit a circuit which shows that the execution could not occur on a machine which obeys A(CMP,UPO,CC1).

     (P1,L31,W,A,31,S1) <upo (P1,L32,R,A,42,S1)
                        <crw (P2,L51,W,A,52,S1)   (<crw by CC1)
                        =cc1 (P2,L51,W,A,52,S2)
                        <upo (P2,L52,R,A,23,S2)
                        <crw (P3,L31,W,A,33,S2)
                        =cc1 (P3,L31,W,A,33,S3)
                        <upo (P3,L32,R,A,54,S3)
                        <crw (P4,L61,W,A,64,S3)
                        =cc1 (P4,L61,W,A,64,S4)
                        <upo (P4,L62,R,A,21,S4)
                        <crw (P1,L31,W,A,31,S4)
                        =cc1 (P1,L31,W,A,31,S1)

Now suppose the execution occurs on a machine which is only CC3, not CC1. We still know that these four paths occur among the events of the execution.

(P1,L31,W,A,31,S1) <upo (P1,L32,R,A,42,S1) <crw (P2,L51,W,A,52,S1)
(P2,L51,W,A,52,S2) <upo (P2,L52,R,A,23,S2) <crw (P3,L31,W,A,33,S2)
(P3,L31,W,A,33,S3) <upo (P3,L32,R,A,54,S3) <crw (P4,L61,W,A,64,S3)
(P4,L61,W,A,64,S4) <upo (P4,L62,R,A,21,S4) <crw (P1,L31,W,A,31,S4)
                                             (<crw by CC3)

From the first path

(P1,L31,W,A,31,S1) <upo (P1,L32,R,A,42,S1) <crw (P2,L51,W,A,52,S1)

we conclude that:

(P1,L31,W,A,31,S1) <cc3 (P2,L51,W,A,52,S1)

and from the definition of CC3 we have:

(P1,L31,W,A,31,Si) <cc3 (P2,L51,W,A,52,Si), for i=1,2,3,4.

(P2,L51,W,A,52,Si) <cc3 (P3,L31,W,A,33,Si),
(P3,L31,W,A,33,Si) <cc3 (P4,L61,W,A,64,Si), and
(P4,L61,W,A,64,Si) <cc3 (P1,L31,W,A,31,Si), for i=1,2,3,4.

Thus these circuits must exist among the events of the execution:

(P1,L31,W,A,31,Si) <cc3 (P2,L51,W,A,52,Si)
                   <cc3 (P3,L31,W,A,33,Si)
                   <cc3 (P4,L61,W,A,64,Si)
                   <cc3 (P1,L31,W,A,31,Si), for i=1,2,3,4.

Consequently, these values could not have been calculated on a machine which obeys A(CMP,UPO,CC3).

Assume the machine obeys A(CMP,UPO,WW,WR). Then the following circuit exists among the events of the execution:

     (P1,L31,W,A,31,S4) <wr  (P1,L32,R,A,42,S1)
                        <crw (P2,L51,W,A,52,S1)   (<crw by WW)
                        <wr  (P2,L52,R,A,23,S2)
                        <crw (P3,L31,W,A,33,S2)
                        <wr  (P3,L32,R,A,54,S3)
                        <crw (P4,L61,W,A,64,S3)
                        <wr  (P4,L62,R,A,21,S4)
                        <crw (P1,L31,W,A,31,S4)

Consequently, the execution could not have occurred on a machine which obeyed A(CMP,UPO,WW,WR).

Case 8.2. W[-,i] > W[-,j] > ... > W[-,i].

Seek a relaxation of A(CMP,UPO,CC3).

     W[-,i] > W[-,j] > ... > W[-,i]

where each element of {1,2,3,4} occurs at most once as one of the subscripts i, j, ....

Assume these values exist in array W after running the above code.

     W[1,4] = 53 > 43  (43 determines the subscript in W[4,3], etc.)
     W[4,3] = 32 > 22
     W[2,2] = 51 > 41
     W[4,1] = 24 > 14

       P1                P2                P3                P4
L42: W[4,1] = A;  L22: W[2,2] = A;  L42: W[4,3] = A;  L12: W[1,4] = A;
L51: A     = 51;  L31: A     = 32;  L51: A     = 53;  L21: A     = 24;

Then this circuit exists among the events of the execution:

     (P4,L21,W,A,24,S1) <cwr (P1,L42,R,A,24,S1)
                        <upo (P1,L51,W,A,51,S1)
                        =cc1 (P1,L51,W,A,51,S2)
                        <cwr (P2,L22,R,A,51,S2)
                        <upo (P2,L31,W,A,32,S2)
                        =cc1 (P2,L31,W,A,32,S3)
                        <cwr (P3,L42,R,A,32,S3)
                        <upo (P3,L51,W,A,53,S3)
                        =cc1 (P3,L51,W,A,53,S4)
                        <cwr (P4,L12,R,A,53,S4)
                        <upo (P4,L21,W,A,24,S4)
                        =cc1 (P4,L21,W,A,24,S1)

If these values occur on a machine which obeys A(CMP,UPO,CC3), then the following paths exist:

(P4,L21,W,A,24,S1) <cwr (P1,L42,R,A,24,S1) <upo (P1,L51,W,A,51,S1)
(P1,L51,W,A,51,S2) <cwr (P2,L22,R,A,51,S2) <upo (P2,L31,W,A,32,S2)
(P2,L31,W,A,32,S3) <cwr (P3,L42,R,A,32,S3) <upo (P3,L51,W,A,53,S3)
(P3,L51,W,A,53,S4) <cwr (P4,L12,R,A,53,S4) <upo (P4,L21,W,A,24,S4)

From the first path we deduce that

(P4,L21,W,A,24,S1) <cc3 (P1,L51,W,A,51,S1)

By the definition of CC3, it must be that

(P4,L21,W,A,24,Si) <cc3 (P1,L51,W,A,51,Si), for i=1,2,3,4.

(P1,L51,W,A,51,Si) <cc3 (P2,L31,W,A,32,Si),
(P2,L31,W,A,32,Si) <cc3 (P3,L51,W,A,53,Si), and
(P3,L51,W,A,53,Si) <cc3 (P4,L21,W,A,24,Si), for i=1,2,3,4.

Combining the above we have the existence of the following circuits:

     (P4,L21,W,A,24,Si) <cc3 (P1,L51,W,A,51,Si)
                        <cc3 (P2,L31,W,A,32,Si)
                        <cc3 (P3,L51,W,A,53,Si)
                        <cc3 (P4,L21,W,A,24,Si), for i=1,2,3,4.

Consequently, the execution could not have occurred on a machine obeying A(CMP,UPO,CC3).

Assume the same values occur on a machine which obeys A(CMP,UPO,RW). Then the following circuit exists.

     (P4,L21,W,A,24,S1) <cwr (P1,L42,R,A,24,S1)
                        <rw  (P1,L51,W,A,51,S2)
                        <cwr (P2,L22,R,A,51,S2)
                        <rw  (P2,L31,W,A,32,S3)
                        <cwr (P3,L42,R,A,32,S3)
                        <rw  (P3,L51,W,A,53,S4)
                        <cwr (P4,L12,R,A,53,S4)
                        <rw  (P4,L21,W,A,24,S1)

Consequently, the execution could not have occurred on a machine obeying A(CMP,UPO,RW).

Assume the same values occur on a machine which obeys A(CMP,UPO,WW). Then the following circuit exists.

     (P4,L21,W,A,24,S1) <cwr (P1,L42,R,A,24,S1)
                        <srw (P1,L42,W,A,24,S2)
                        <ww  (P1,L51,W,A,51,S2)
                        <cwr (P2,L22,R,A,51,S2)
                        <srw (P2,L22,W,A,51,S3)
                        <ww  (P2,L31,W,A,32,S3)
                        <cwr (P3,L42,R,A,32,S3)
                        <srw (P3,L42,W,A,32,S4)
                        <ww  (P3,L51,W,A,53,S4)
                        <cwr (P4,L12,R,A,53,S4)
                        <srw (P4,L12,W,A,53,S1)
                        <ww  (P4,L21,W,A,24,S1)

Consequently, the execution could not have occurred on a machine obeying A(CMP,UPO,WW).

The output from T8 allows one to see a phenomenon called convoys: two or more processors see all or part of the same sequence of values for the operand A. Different machines produce convoys in different frequencies and lengths; each machine is quite consistent in the frequency and length of convoy it produces. No explanation for the occurence of convoys has yet been offerred. No judgement has yet been made on whether or not the existence of convoys on a machine is desirable. ARCHTEST prints out the five longest convoys a machine produces (up to some limit on the lines of output). A complete analysis of the output from T8 can be obtained with the program T8CONVOY. (For further information on T8CONVOY see the file HOWTORUN.

Test T820 is identical to Test T800, except that A is a floating point operand.

Test T900.  Seek a relaxation of A(CMP,UPO,URR,CC3).


       P1       P2         P3         P4         P5        P6
     A = 0;  U[0] = A;  V[0] = A;  X[0] = A;  Y[0] = A;  A = 1;
     A = 2;  U[1] = A;  V[1] = A;  X[1] = A;  Y[1] = A;  A = 3;
     A = 4;  U[2] = A;  V[2] = A;  X[2] = A;  Y[2] = A;  A = 5;
     A = 6;  U[3] = A;  V[3] = A;  X[3] = A;  Y[3] = A;  A = 7;
     A = 8;  U[4] = A;  V[4] = A;  X[4] = A;  Y[4] = A;  A = 9;
     A =10;  U[5] = A;  V[5] = A;  X[5] = A;  Y[5] = A;  A =11;
     A =12;  U[6] = A;  V[6] = A;  X[6] = A;  Y[6] = A;  A =13;
     A =14;  U[7] = A;  V[7] = A;  X[7] = A;  Y[7] = A;  A =15; etc.

     Seek 9.0.  U[i] > U[j], i<j, U[i],U[j] with same parity. d=U[j]-U[i].
     Seek 9.0.  V[i] > V[j], i<j, V[i],V[j] with same parity. d=V[j]-V[i].
     Seek 9.0.  X[i] > X[j], i<j, X[i],X[j] with same parity. d=X[j]-X[i].
     Seek 9.0.  Y[i] > Y[j], i<j, Y[i],Y[j] with same parity. d=Y[j]-Y[i].

       To show:  Not A(CMP,UPO,URR,CC3).

     Seek       U[i] < V[j] and V[j+] < U[i-1], j+>j.  d = V[j+] - U[i-1].

       9.1.  U[i] and V[j] are odd,  and V[j+] and U[i-1] are even.
       9.2.  U[i] and V[j] are even, and V[j+] and U[i-1] are odd.

             where j+ > j.

       To show:  Not A(CMP,UPO,CC3).

Repeat the above tests for arrays U & X, U & Y, V & X, V & Y, and X & Y.

 Case 9.1.  U[5] = 6,
            U[6] = 7 < V[4] = 9, and
                       V[5] = 4.

         P1          P2             P3             P6
     L2: A = 4;  L5: U[5] = A;  L4: V[4] = A;  L3: A = 7;
     L3: A = 6;  L6: U[6] = A;  L5: V[5] = A;  L4: A = 9;

Assume the machine obeys A(CMP,UPO,URR,CC1). Then the following circuit exists.

     (P2,L5,R,A,6,S2) <urr (P2,L6,R,A,7,S2)
                      <crw (P6,L4,W,A,9,S2) (<crw by CC1)
                      =cc1 (P6,L4,W,A,9,S3)
                      <cwr (P3,L4,R,A,9,S3)
                      <urr (P3,L5,R,A,4,S3)
                      <crw (P1,L3,W,A,6,S3)
                      =cc1 (P1,L3,W,A,6,S2)
                      <cwr (P2,L5,R,A,6,S2)

Assume the machine obeys A(CMP,UPO,URR,CC3).

We know that

     (P6,L4,W,A,9,S3) <cwr (P3,L4,R,A,9,S3)
                      <urr (P3,L5,R,A,4,S3)
                      <crw (P1,L3,W,A,6,S3) (<crw by CC3)

and so

     (P6,L4,W,A,9,Si) <cc3 (P1,L3,W,A,6,Si), for i=1,2,...,6.

Similarly,

     (P1,L3,W,A,6,S2) <cwr (P2,L5,R,A,6,S2)
                      <urr (P2,L6,R,A,7,S2)
                      <crw (P6,L4,W,A,9,S2)

leads to

     (P1,L3,W,A,6,Si) <cc3 (P6,L4,W,A,9,Si), for i=1,2,...,6.

Thus, the following circuits exist:

     (P1,L3,W,A,6,Si) <cc3 (P6,L4,W,A,9,Si)
                      <cc3 (P1,L3,W,A,6,Si), for i=1,2,...,6.

and so the machine could not have obeyed A(CMP,UPO,URR,CC3).

 Case 9.2.  U[6] = 5,
            U[7] = 6 < V[3] = 8, and
                       V[4] = 3.

         P1          P2             P3             P6
     L3: A = 6;  L6: U[6] = A;  L3: V[3] = A;  L1: A = 3;
     L4: A = 8;  L7: U[7] = A;  L4: V[4] = A;  L2: A = 5;

Assume the machine obeys A(CMP,UPO,URR,CC1). Then the following circuit exists.

     (P2,L6,R,A,5,S2) <urr (P2,L7,R,A,6,S2)
                      <crw (P1,L4,W,A,8,S2) (<crw by CC1)
                      =cc1 (P1,L4,W,A,8,S3)
                      <cwr (P3,L3,R,A,8,S3)
                      <urr (P3,L4,R,A,3,S3)
                      <crw (P6,L2,W,A,5,S3)
                      =cc1 (P6,L2,W,A,5,S2)
                      <cwr (P2,L6,R,A,5,S2)

Assume the machine obeys A(CMP,UPO,URR,CC3).

We know that

     (P1,L4,W,A,8,S3) <cwr (P3,L3,R,A,8,S3)
                      <urr (P3,L4,R,A,3,S3)
                      <crw (P6,L2,W,A,5,S3) (<crw by CC3)

and so

     (P1,L4,W,A,8,Si) <cc3 (P6,L2,W,A,5,Si), for i=1,2,...,6.

Similarly,

     (P6,L2,W,A,5,S2) <cwr (P2,L6,R,A,5,S2)
                      <urr (P2,L7,R,A,6,S2)
                      <crw (P1,L4,W,A,8,S2)

and so

     (P6,L2,W,A,5,Si) <cc3 (P1,L4,W,A,8,Si), for i=1,2,...,6.

Thus, the following circuits exist:

     (P1,L4,W,A,8,Si) <cc3 (P6,L2,W,A,5,Si)
                      <cc3 (P1,L4,W,A,8,Si), for i=1,2,...,6.

and so the machine could not have obeyed A(CMP,UPO,URR,CC3).

Test T900 is similar to Test T600, except that there is only one shared operand instead of two. The test seeks to find a failure to obey CC3, instead of CC1.

Test T920 is identical to Test T900, except that A is a floating point operand.

Test T10.  Performance impact of cache misses.

These programs are experimental.

The intent behind these programs is to provide an indication of the delay caused to a processor by a cache miss. As of this date (June 1, 1996) the programs have not produced useful information. Further experimentation will follow. The following information will serve to explain the intent behind the tests. Further information will follow.

In each of these tests the value of A is set only by P1; it serves as a clock value. Of course, P1 can be interrupted and so there can be large gaps in successive values of A, but if large gaps are ignored and if many data points are considered, then the value of A can be taken as a reasonably valid clock value. (While accesssing a system clock would yield more precision, it might not yield greater accuracy, since the instruction to access the clock may be a synchronized instruction itself.)

P2 alternately records the value of A in array U and changes the value of operand B. If B was in the invalid state, it will move to the exclusive state.

P3 changes the value of B and then dwells idly. If B was in the invalid state, it will move to the exclusive state.

The increasing values of B in P2 and P3 are irrelevant.

The changes in the states of B in P2 and P3 are summarized in the title line: inv -> excl & inv -> excl.

The differences in the successive values in the array U should consist of two distributions. Most of the differences will be short. Some will be longer. The longer differences reflect the impact of a hit by P3 on P2's cache. The shorter differences reflect the lack of a hit. The differences in the averages for the two distributions reflects the effect of a cache hit (measured in terms of iterations of the loop in P1).

See the file HOWTORUN to set runtime paramters for T10.

Test T1000. P2: inv -> excl, and P3: inv -> excl.

     Initially, A = B = U[i] = V[i] = 0, i = 0 to K.

       P1       P2        P3
     A = 0;   U[0] = A;   B = 1;
     A = 1;   B    = 1;   B = 1;
     A = 2;   U[1] = A;   B = 1;
     A = 3;   B    = 2;   B = 1;
     A = 4;   U[2] = A;   B = 1;
     A = 5;   B    = 3;   B = 1;
     A = 6;   U[3] = A;   B = 1;
     A = 7;   B    = 4;   B = 1;
     A = 8;   U[4] = A;   B = 1; etc.



Test T1010.  P2: inv -> excl, and P3: inv -> rr.

     Initially, A = B = U[i] = V[i] = 0, i = 0 to K.

       P1       P2        P3
     A = 0;   U[0] = A;   - = B;
     A = 1;   B    = 1;   - = B;
     A = 2;   U[1] = A;   - = B;
     A = 3;   B    = 2;   - = B;
     A = 4;   U[2] = A;   - = B;
     A = 5;   B    = 3;   - = B;
     A = 6;   U[3] = A;   - = B; .
     A = 7;   B    = 4;   - = B;
     A = 8;   U[4] = A;   - = B; etc.



Test T1020.  P2: inv -> rr, and P3: inv -> excl.

     Initially, A = B = U[i] = V[i] = 0, i = 0 to K.

       P1       P2        P3
     A = 0;   U[0] = A;   B = 1;
     A = 1;   -    = B;   B = 1;
     A = 2;   U[1] = A;   B = 1;
     A = 3;   -    = B;   B = 1;
     A = 4;   U[2] = A;   B = 1;
     A = 5;   -    = B;   B = 1;
     A = 6;   U[3] = A;   B = 1;
     A = 7;   -    = B;   B = 1;
     A = 8;   U[4] = A;   B = 1; etc.



Test T1030.  P2: rr -> rr, and P3: rr -> rr.

     Initially, A = B = U[i] = V[i] = 0, i = 0 to K.

       P1       P2        P3
     A = 0;   U[0] = A;   - = B;
     A = 1;   -    = B;   - = B;
     A = 2;   U[1] = A;   - = B;
     A = 3;   -    = B;   - = B;
     A = 4;   U[2] = A;   - = B;
     A = 5;   -    = B;   - = B;
     A = 6;   U[3] = A;   - = B;
     A = 7;   -    = B;   - = B;
     A = 8;   U[4] = A;   - = B; etc.

This test, of course, will show only a single maximum since B will enter and remain in the read_only state on both processors for the entire duration of the run.

Test T1100.  Seek a relaxation of
both A(CMP,UPO,WW,WR) and A(CMP,UPO,WR,CC3), or of
both A(CMP,UPO,WW)    and A(CMP,UPO,RW).


     Initially, A = B = C = D =
                U[i] = V[i] = X[i] = Y[i] = 0, i = 0 to K.

         P1           P2           P3           P4
     A     = 0;   B     = 0;   C     = 0;   D     = 0;
     U[ 1] = B;   V[ 1] = A;   X[ 1] = D;   Y[ 1] = C;
     U[ 2] = C;   V[ 2] = D;   X[ 2] = A;   Y[ 2] = B;
     U[ 3] = D;   V[ 3] = C;   X[ 3] = B;   Y[ 3] = A;
     A     = 1;   B     = 1;   C     = 1;   D     = 1;
     U[ 5] = B;   V[ 5] = A;   X[ 5] = D;   Y[ 5] = C;
     U[ 6] = C;   V[ 6] = D;   X[ 6] = A;   Y[ 6] = B;
     U[ 7] = D;   V[ 7] = C;   X[ 7] = B;   Y[ 7] = A;
     A     = 2;   B     = 2;   C     = 2;   D     = 2;
     U[ 9] = B;   V[ 9] = A;   X[ 9] = D;   Y[ 9] = C;
     U[10] = C;   V[10] = D;   X[10] = A;   Y[10] = B;
     U[11] = D;   V[11] = C;   X[11] = B;   Y[11] = A; etc.

     Seek 1100.0.  U[i] > U[i+4].                 d = U[i+4] - U[i].
     Seek 1100.0.  V[i] > V[i+4].                 d = V[i+4] - V[i].
     Seek 1100.0.  X[i] > X[i+4].                 d = X[i+4] - X[i].
     Seek 1100.0.  Y[i] > Y[i+4].                 d = Y[i+4] - Y[i].

       To show:  Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

     Seek 1100.1.  U[4i+1] < j and V[4j+1] < i.   d = V[4j+1] - i.
     Seek 1100.2.  U[4i+1] > j and V[4j+1] > i.   d = i - V[4j+1].

     Seek 1100.1.  U[4i+2] < j and X[4j+2] < i.   d = X[4j+2] - i.
     Seek 1100.2.  U[4i+2] > j and X[4j+2] > i.   d = i - X[4j+2].

     Seek 1100.1.  U[4i+3] < j and Y[4j+3] < i.   d = Y[4j+3] - i.
     Seek 1100.2.  U[4i+3] > j and Y[4j+3] > i.   d = i - Y[4j+3].

     Seek 1100.1.  V[4j+3] < j and X[4j+3] < i.   d = X[4j+3] - i.
     Seek 1100.2.  V[4j+3] > j and X[4j+3] > i.   d = i - X[4j+3].

     Seek 1100.1.  V[4j+2] < j and Y[4j+2] < i.   d = Y[4j+2] - i.
     Seek 1100.2.  V[4j+2] > j and Y[4j+2] > i.   d = i - Y[4j+2].

     Seek 1100.1.  X[4j+1] < j and Y[4j+1] < i.   d = Y[4j+1] - i.
     Seek 1100.2.  X[4j+1] > j and Y[4j+1] > i.   d = i - Y[4j+1].

     Seek xxxx.1

       To show:  Not A(CMP,UPO,WW,WR) and not A(CMP,UPO,WR,CC3).

     Seek xxxx.2

       To show:  Not A(CMP,UPO,WW) and not A(CMP,UPO,RW).

Test T1100 tests relaxations in the same manner as Test T400. However, for this test there are four shared operands instead of just two, and so there are 4!2!/2! = 6 pairs of arrays to test.

See the discussion of Test T700 for further information that can be deduced from the results of running Test T1100.

Test T1110 is identical to Test T1100, except that C and D are floating point operands while A and B are still fixed point operands.

Test T1120 is identical to Test T1100, except that A, B, C, and D are all floating point operands.

Test T1200.  Seek a relaxation of A(CMP,UPO,RR,CC1)
or of both A(CMP,UPO,WW) and A(CMP,UPO,RW).


     Initially, A = B = C = D =
                U[i] = V[i] = X[i] = Y[i] = 0, i = 0 to K.

         P1           P2           P3           P4
     A     = 0;   B     = 0;   C     = 0;   D     = 0;
     U[ 0] = A;   V[ 0] = B;   X[ 0] = C;   Y[ 0] = D;
     U[ 1] = B;   V[ 1] = A;   X[ 1] = D;   Y[ 1] = C;
     U[ 2] = C;   V[ 2] = D;   X[ 2] = A;   Y[ 2] = B;
     U[ 3] = D;   V[ 3] = C;   X[ 3] = B;   Y[ 3] = A;
     A     = 1;   B     = 1;   C     = 1;   D     = 1;
     U[ 4] = A;   V[ 4] = B;   X[ 4] = C;   Y[ 4] = D;
     U[ 5] = B;   V[ 5] = A;   X[ 5] = D;   Y[ 5] = C;
     U[ 6] = C;   V[ 6] = D;   X[ 6] = A;   Y[ 6] = B;
     U[ 7] = D;   V[ 7] = C;   X[ 7] = B;   Y[ 7] = A;
     A     = 2;   B     = 2;   C     = 2;   D     = 2;
     U[ 8] = A;   V[ 8] = B;   X[ 8] = C;   Y[ 8] = D;
     U[ 9] = B;   V[ 9] = A;   X[ 9] = D;   Y[ 9] = C;
     U[10] = C;   V[10] = D;   X[10] = A;   Y[10] = B;
     U[11] = D;   V[11] = C;   X[11] = B;   Y[11] = A; etc.

     Seek 1200.0.  U[i] > U[i+4].                 d = U[i+4] - U[i].
     Seek 1200.0.  V[i] > V[i+4].                 d = V[i+4] - V[i].
     Seek 1200.0.  X[i] > X[i+4].                 d = X[i+4] - X[i].
     Seek 1200.0.  Y[i] > Y[i+4].                 d = Y[i+4] - Y[i].

       To show:  Not A(CMP,UPO,URR,WW) and not A(CMP,UPO,URR,CC3).

     Seek 1200.1.  U[4i+1] < j and V[4j+1] < i.   d = V[4j+1] - i.
     Seek 1200.2.  U[4i+1] > j and V[4j+1] > i.   d = i - V[4j+1].

     Seek 1200.1.  U[4i+2] < j and X[4j+2] < i.   d = X[4j+2] - i.
     Seek 1200.2.  U[4i+2] > j and X[4j+2] > i.   d = i - X[4j+2].

     Seek 1200.1.  U[4i+3] < j and Y[4j+3] < i.   d = Y[4j+3] - i.
     Seek 1200.2.  U[4i+3] > j and Y[4j+3] > i.   d = i - Y[4j+3].

     Seek 1200.1.  V[4j+3] < j and X[4j+3] < i.   d = X[4j+3] - i.
     Seek 1200.2.  V[4j+3] > j and X[4j+3] > i.   d = i - X[4j+3].

     Seek 1200.1.  V[4j+2] < j and Y[4j+2] < i.   d = Y[4j+2] - i.
     Seek 1200.2.  V[4j+2] > j and Y[4j+2] > i.   d = i - Y[4j+2].

     Seek 1200.1.  X[4j+1] < j and Y[4j+1] < i.   d = Y[4j+1] - i.
     Seek 1200.2.  X[4j+1] > j and Y[4j+1] > i.   d = i - Y[4j+1].

     Seek xxxx.1

       To show:  Not A(CMP,UPO,RR,CC1).

     Seek xxxx.2

       To show:  Not A(CMP,UPO,WW) and not A(CMP,UPO,RW);

Test T1200 tests for relaxations in the same manner as Test T700. However, for this test there are four shared operands instead of just two, and so there are 4!2!/2! = 6 pairs of arrays to test.

See the discussion of Test T700 for further information that can be deduced from the results of running Test T1200.

Test T1210 is identical to Test T1200, except that C and D are floating point operands while A and B are still fixed point operands.

Test T1220 is identical to Test T1200, except that A, B, C, and D are all floating point operands.

Analysis of the Output from ARCHTEST

Case 1.0. U[i] > U[i+1].

Explanation of the circuit

Case 2.0. U[i] > U[i+1].

Explanation of the circuit

Case 2.1. V[i] > U[i+1]

Explanation of the circuit

Case 2.2. U[i] > V[i] + 1.

Explanation of the circuit

Case 3.1. 0 < U[i+] < V[j] < V[j+] < U[i] is is is is odd even odd even

Explanation of the circuit

Case 3.2. 0 < U[i+] < V[j] < V[j+] < U[i] is is is is even odd even odd

Cases 4.1 and 4.2.

Explanation of the circuit

Cases 4.3 and 4.4.

Explanation of the circuit

Case 5.1. U[i] > V[i].

Explanation of the circuit

Cases 6.1 and 6.2.

Explanation of the circuit

Cases 7.1 and 7.2.

Explanation of the circuit

Cases 7.3 and 7.4.

Explanation of the circuit

Example from a test of a Sun Sparc 20.

Case 1. P1 failed to be CC1.

Case 2. P2 failed to be CC1.

Case 8.1. W[-,i] < W[-,j] < ... < W[-,i].

Case 8.2. W[-,i] > W[-,j] > ... > W[-,i].

Case 9.1. U[5] = 6, U[6] = 7 < V[4] = 9, and V[5] = 4.

Case 9.2. U[6] = 5, U[7] = 6 < V[3] = 8, and V[4] = 3.

Test T1000. P2: inv -> excl, and P3: inv -> excl.

Test T1010. P2: inv -> excl, and P3: inv -> rr.

Test T1020. P2: inv -> rr, and P3: inv -> excl.

Test T1030. P2: rr -> rr, and P3: rr -> rr.