Output from ARCHTEST is tagged with "> ".
For a description of the various forms of data generated by ARCHTEST, see OUTPUT. For an explanation of how to interpret the results of running ARCHTEST, see ANALYSIS.

The first step is to compile ARCHTEST. If you have not done this before, you may need to write some code to start and stop multithreading. See Compile-time Parameters for directions.

The second step is to execute ARCHTEST. This initiates what in the following is called a job. A job consists of a sequence of one or more runs. In each run, ARCHTEST performs some set of tests and produces output which describes the results of the tests.
The first step for ARCHTEST is to read the parameters from a parameter file and to identify any errors.
> Read parameters from file: parmfile.
>
> Parm bogus1= was not recognized.
> Parm bogus2 was not recognized.
>
> Error#66. GET_PARM_FILE(): 2 errors were observed while interpreting the parm file.
>
> Read parameters from file: pf2

Then ARCHTEST reads any secondary parameter file, in this case, PF2.

> This is a header line from the PARMFILE file.
> There can be up to 10 header lines per run, coming either
> from the parm files or from the user at run time.
> This is a header line from secondary parm file: pf2.

Header lines allow a user to record the particulars of a given run. Header lines can be specified by either of the parameter files, or interactively by the user. A blank line terminates the lines from the user.
Once the header lines are specified, ARCHTEST presents from 1 to 8 screens (called pages) where run-time parameters can be specified. (These are the same parameters that can be specified in the parameter files.)
Page 1 consists mostly of a Table of Contents for the other pages. Page 2 allows a user to define the basic run-time parameters and to commence execution of ARCHTEST. Once the user has created a satisfactory parameter file, pages other than Page 2 will rarely be of interest. Therefore, ARCHTEST presents Page 2 first.
> Page 2. Set basic run time parameters.
>   1. Go to Page _?_ (2 to execute; 1 for Table of Contents).
>   2. Execute the tests shown below.
>   3. Specify how to generate data: run_mode = [1]
>        1. Perform a real multiprocessor test.
>        2. Execute the mp test code serially and with only one thread.
>        3. Generate relaxed test data.
>        4. Generate strong test data (no relaxed events).
>        5. Process output from simulation.
>   4. 4 Number of threads (from 2 to 8).
>   5. 200000 Length of arrays.
>   6. 0 batch_run.
>   7. 4 batch_lim.
>   8. Add header lines for this run only.
>   9 Select tests to run (Page 3).
>   These tests are set to run:
>   [ 200 400 700

The full set of pages is described at Run-time Parameters.
For the simplest run, enter '2'. The tests will then execute. If no other options are chosen, ARCHTEST runs the specified tests and then prints a summary of the run.
Under the assumption that the results of the run may be disseminated elsewhere, the licensee and the area in which ARCHTEST is to be used are identified. Also the web site containing documentation on ARCHTEST is identified.
> ==============================================================================
>
> Summary of run: 00:28:16.609 020423
> ARCHTEST. Version 5.6. 020422. By W. W. Collier, collier@acm.org
>
> License No. Sxxxx from Multiprocessor Diagnostics authorizes
> use of this copy of ARCHTEST by the xxxxx Company
> in xxxx county of xxxx state.
> Documentation is available via www.mpdiag.com.

The parameters for the number of threads and the length of the arrays are repeated.

> Number of threads: 2.
> Length of the arrays: 200000.

Other parameters listed here are explained in Compile-time Parameters and Run-time Parameters.

> Frequency of fetching shared operands before stores: sometimes
> Line count: 0.
> Action count: 0.
> Interference count: 0.

Since the technology behind ARCHTEST is novel, the documentation attempts to educate users in the significance of the results of each run.
First, the tests that have just been run are identified. Along with each test is a description of the architectures whose relaxation the test sought to detect.
> Each test in ARCHTEST seeks to observe a relaxation of behavior
> defined by one or another subset of the following seven rules:
> URR, RR, WW, RW, WR, CC1, and CC3. Here are the tests that have
> just been run and the architectures whose relaxation
> the tests have sought to detect.
> T2xx. A(CMP,UPO,RR,WW)
> T3xx. A(CMP,UPO,URR,WW)
> T4xx. Both A(CMP,UPO,WW,WR) and (A(CMP,UPO,WR,CC3),
> or both A(CMP,UPO,WW) and (A(CMP,UPO,RW)
> T5xx. A(CMP,UPO,RR,CC1)
>

A second list of the tests just run shows the time it took to execute each test, the time taken to analyze the results, the number of integer and/or floating point operands employed in the test, and, most important, a number called the d value. The d value measures how relaxed a machine was found to be on a given test. Zero or positive values indicate no relaxation and are shown as a blank in the table below. The more negative a d value, the more relaxed the machine's behavior.

> The tests performed on this run are listed below. Shown for each test
> are (1) the durations of the test and of its analysis, and (2) the number of
> integer operands and the number of floating point operands used in the test.
> The d value, if blank, indicates strong behavior. Otherwise, the more
> negative the d value, the more relaxed the behavior observed by the test.
>
>   Execution   Analysis                  Operands
>   Duration    Duration    Test    d     int   flt
>     0.541       2.314     T200           2     0
>     1.592       2.394     T300           2     0
>     0.541       3.395     T400           2     0
>     1.302       2.313     T500           2     0

Finally, there is a section summarizing the architectures that the tests found the machine did not obey.
Since the tests increment the values of operands monotonically, the values saved for each operand should increase monotonically. (A formal argument is not quite this simple.) No machine has been seen to relax any of the monotonicity tests.
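The monotonicity property described above can be illustrated with a minimal sketch. This is not ARCHTEST's analysis routine; the function name first_mono_violation and the use of long for saved values are assumptions made for illustration, as is the choice to accept equal neighbouring values.

```c
#include <stddef.h>

/* Hypothetical helper, not part of ARCHTEST: return the index of the first
 * entry that is smaller than its predecessor, or -1 if the saved values
 * never decrease. Equal neighbours are accepted (an assumption). */
long first_mono_violation(const long *saved, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++) {
        if (saved[i] < saved[i - 1]) {
            return (long)i;
        }
    }
    return -1;
}
```

A return value of -1 corresponds to the "No monotonicity relaxations seen." case; any other value points at the first entry that went backwards.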
> Results:
>
> No monotonicity relaxations seen.

Finally, a summary is given of the relaxations detected.

Each test can reveal the relaxation of one or more architectures. (See the file ANALYSIS for details.) Conversely, a given architectural relaxation may be detected by more than one test. At this point ARCHTEST prints some subset of the following nine lines to show which relaxations have been seen.

> WW
> URR WW
> WW WR
> WW RR
> RW
> CC3
> URR CC3
> WR CC3
> RR CC1

Or, if no relaxations have been seen, ARCHTEST prints:

> No other relaxations seen.

Here is the point at which one can create a parameter file. It can be very convenient to be able to define a set of parameters once, and then to be able to recall those parameters again and again on subsequent test runs. If you enter "n", a parms file named "n" will be created.

> Save the parameters from this run in a file?
> Press Enter for No. Otherwise, enter the name of the file.

At this point you can either quit or run again. If you run again, the output file will be called "axxxxx01.out". You will be allowed to set/change any of the parameters.
> Run this job another time? [n]
ARCHTEST comes prepared to run on Windows NT and any Posix-compliant Unix system. To compile under one system or the other, the user needs to include one or the other of the following two statements at the beginning of ARCHTEST:
> #define SYSTEMID_WNT
> #define SYSTEMID_SOLARIS

If running under Windows NT, include one of the following three lines depending on the version of Visual C/C++ you are using. VISUALC51 works for Visual C 6.0 also.

#define VISUALC2
#define VISUALC5
#define VISUALC51

Under Visual C/C++ 2.0 ARCHTEST successfully uses _cwait to wait for the completion of a thread. This function no longer works under Visual C/C++ 5.0. Microsoft's recommended solution is to use WaitForSingleObject, but trying this resulted in problems. An alternative that works is to use _sleep to let the main thread cycle between sleeping and testing completion flags from the created threads.
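The sleep-and-poll approach can be sketched as follows. This is only an illustration and not the ARCHTEST source: it swaps in POSIX threads, nanosleep, and C11 atomics in place of the Windows _sleep call, and the names done_flag, worker, run_and_poll, and NTHREADS are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define NTHREADS 4                      /* hypothetical thread count */

static atomic_int done_flag[NTHREADS];  /* one completion flag per thread */

static void *worker(void *arg)
{
    int id = *(int *)arg;
    /* ... the body of a test would run here ... */
    atomic_store(&done_flag[id], 1);    /* announce completion */
    return NULL;
}

/* Start the workers, then let the calling thread cycle between sleeping
 * and testing the completion flags. Returns the number of threads seen
 * to have finished, or -1 if a thread could not be created. */
int run_and_poll(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    struct timespec nap = {0, 10 * 1000 * 1000};   /* 10 ms */
    int finished = 0;
    int i;

    for (i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        atomic_store(&done_flag[i], 0);
        if (pthread_create(&tid[i], NULL, worker, &ids[i]) != 0) {
            while (i-- > 0) pthread_join(tid[i], NULL);
            return -1;
        }
    }
    while (finished < NTHREADS) {
        nanosleep(&nap, NULL);          /* sleep ... */
        finished = 0;
        for (i = 0; i < NTHREADS; i++)  /* ... then test the flags */
            finished += atomic_load(&done_flag[i]);
    }
    for (i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);     /* release thread resources */
    return finished;
}
```

Polling costs some latency (up to one sleep interval per check) but relies only on a sleep call and shared flags, which may be why it proved more robust than the wait functions mentioned above.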
Here are two sequences of code which are indistinguishable to a programmer (X is a local variable which is never referenced again):
P1          P2
A = 1;      X = A;
            A = 1;

However, to an engineer there is a significant difference. If A is not in the cache at the time of initial reference, then in the first case A is brought into the cache in the exclusive state; in the second case A is brought into the cache in the read-only state. The different states cause different subsequent actions in the hardware; conceivably, one set of actions could involve an error not present in the other set.
ARCHTEST now provides a user a compile-time option to control the initial state of operands in the cache. Each write into a shared operand is ALWAYS, NEVER, or SOMETIMES/SOMETIMES-NOT preceded by a fetch of the shared operand into a local variable to force it into read-only state. The three cases are initiated by the three statements:
#define SNK_ALWAYS
#define SNK_NEVER
#define SNK_SOMETIMES

This option is based on ideas presented in [mntz96]. So far, only two machines, a Sun 2-way machine and a Sun 4-way machine, have been tested with these new features; no differences were found. See the files sun20, sun21, sun22, sun40, sun41, and sun42.
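To make the three cases concrete, here is a minimal sketch of how such a compile-time switch could gate a fetch before each store. It is not taken from the ARCHTEST source: the functions snk_should_fetch and store_shared are hypothetical, and the "sometimes" policy shown (fetch on odd iterations) is an assumption chosen only for illustration.

```c
/* Define exactly one of SNK_ALWAYS, SNK_NEVER, SNK_SOMETIMES before this
 * point; SNK_SOMETIMES is used here when none is given (an assumption). */
#if !defined(SNK_ALWAYS) && !defined(SNK_NEVER) && !defined(SNK_SOMETIMES)
#define SNK_SOMETIMES
#endif

/* Decide whether this store is preceded by a fetch of the operand. */
int snk_should_fetch(unsigned long iteration)
{
#if defined(SNK_ALWAYS)
    (void)iteration;
    return 1;
#elif defined(SNK_NEVER)
    (void)iteration;
    return 0;
#else
    return (iteration & 1UL) != 0;  /* illustrative policy: odd iterations */
#endif
}

/* Store into a shared operand, optionally fetching it into a local
 * variable first so that the line may enter the cache read-only. */
void store_shared(volatile long *shared, long value, unsigned long iteration)
{
    if (snk_should_fetch(iteration)) {
        long x = *shared;           /* X = A; */
        (void)x;                    /* X is never referenced again */
    }
    *shared = value;                /* A = value; */
}
```

The operand is declared volatile so that the compiler does not discard the otherwise unused fetch.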
#define COMPILED_BATCH_JOB 0
#define COMPILED_BATCH_LIM 1

ARCHTEST has a variable called PARM_FILE. It specifies the name of a file, call it PARMFILE, to be used to specify the value of parameters for the job. When COMPILED_BATCH_JOB is set to 1, ARCHTEST will not interact with the user. Instead it will read the input parameters specified in PARMFILE and then run through all the tests a number of times equal to COMPILED_BATCH_LIM. (It will stop early if an unexpected error occurs.)
It may occur that a user wants to run multiple copies of ARCHTEST at the same time in order to really stress a system, but may also want somewhat distinct parameters to apply to each job. This can be achieved as follows. In the parameter file specified by PARM_FILE there is this parameter:
secondary_parm_file= pf

After reading the parms from PARMFILE, ARCHTEST will read file PF. The parameters specified in PF will override those in PARMFILE. To initiate a set of ARCHTEST jobs with distinct parameters, the user will start ARCHTEST, then copy parameter file PF1 into PF, start another ARCHTEST job, copy parameter file PF2 into PF, start another ARCHTEST job, etc.
If ARCHTEST is compiled with COMPILED_BATCH_JOB set to 0, then ARCHTEST will still first read PARMFILE and PF (if specified), but then it will invite the user to modify the parms so far specified. The user can make numerous individual runs, and then switch to running a batch job.
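The override behavior of the secondary parameter file can be sketched as a small keyword table in which a later setting replaces an earlier one. This is an illustration under stated assumptions, not ARCHTEST's parser: the names parm_table, parm_apply_line, parm_load_file, and parm_get are hypothetical, lines beginning with '*' are treated as comments, a space before the equal sign is rejected, and keywords that accumulate over several lines (such as header= or dotest=) are not handled.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PARMS 64
#define MAX_LEN   128

struct parm { char key[MAX_LEN]; char val[MAX_LEN]; };
struct parm_table { struct parm p[MAX_PARMS]; int n; };

/* Apply one "keyword= value" line. A later line with the same keyword
 * overrides an earlier one. Returns 1 if applied, 0 if skipped. */
int parm_apply_line(struct parm_table *t, const char *line)
{
    const char *eq = strchr(line, '=');
    const char *v;
    size_t klen, vlen;
    int i;

    if (line[0] == '*' || eq == NULL || eq == line) return 0;
    klen = (size_t)(eq - line);
    if (klen >= MAX_LEN || line[klen - 1] == ' ') return 0;
    v = eq + 1;
    while (*v == ' ') v++;              /* skip blanks after '=' */
    vlen = strcspn(v, "\r\n");          /* drop the line terminator */
    if (vlen >= MAX_LEN) return 0;

    for (i = 0; i < t->n; i++)          /* look for an existing keyword */
        if (strlen(t->p[i].key) == klen &&
            strncmp(t->p[i].key, line, klen) == 0) break;
    if (i == t->n) {                    /* new keyword: append */
        if (t->n == MAX_PARMS) return 0;
        memcpy(t->p[i].key, line, klen);
        t->p[i].key[klen] = '\0';
        t->n++;
    }
    memcpy(t->p[i].val, v, vlen);       /* set or override the value */
    t->p[i].val[vlen] = '\0';
    return 1;
}

/* Read a whole parameter file; returns 0 on success, -1 if it cannot
 * be opened. Call once for PARMFILE, then again for the secondary file. */
int parm_load_file(struct parm_table *t, const char *path)
{
    char line[2 * MAX_LEN];
    FILE *f = fopen(path, "r");

    if (f == NULL) return -1;
    while (fgets(line, sizeof line, f) != NULL)
        parm_apply_line(t, line);
    fclose(f);
    return 0;
}

/* Look up a keyword; returns NULL if it was never set. */
const char *parm_get(const struct parm_table *t, const char *key)
{
    int i;

    for (i = 0; i < t->n; i++)
        if (strcmp(t->p[i].key, key) == 0) return t->p[i].val;
    return NULL;
}
```

Loading PARMFILE first and the secondary file second gives the precedence described above, since each repeated keyword simply overwrites the value already in the table.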
Running ARCHTEST on a Previously Unsupported System
Different operating systems require different functions for
controlling multiprocessing. If your operating system is not yet
supported, there are four points in the code where you may need to
supply code
At the beginning of an interactive run (not a batch run) ARCHTEST offers the user a chance to modify the run-time parameters. These choices are presented in 8 screens (or 'pages').
Page 1 presents a Table of Contents for the 8 pages. To go to page x, enter '1 x'.
To exit ARCHTEST without executing any (further) runs, enter '2'.
To view the online help, enter '3'.
=================================================================
> Page 1. Table of Contents, Exit, Help.
>   1. Go to Page _?_:
>        Page 1. Table of Contents, Exit, Help.
>        Page 2. Set basic run time parameters.
>        Page 3. Specify tests to run.
>        Page 4. Specify tests to NOT stop on in case of error.
>        Page 5. Specify dump parameters.
>        Page 6. Set output parms and debug flags.
>        Page 7. Set parms for extraneous cache traffic.
>        Page 8. Set parms for Test T10.
>   2. Exit without executing any tests.
>   3. Help.
>
> ==>
=================================================================

Page 2 presents the user a chance to modify basic run-time parameters.
To go to page x, enter '1 x'.
To execute the tests listed at the bottom of the page, enter '2'.
To change the run_mode to x, enter '3 x'.
To change the number of threads to create for each test to x, enter '4 x'.
To change the length of the arrays to x, enter '5 x'.
To cause ARCHTEST to run x times in succession, enter '6' to toggle batch_run to become 1, and then enter '7 x'.
To enter additional header lines for this run only, enter '8' and follow the prompts.
To modify the tests to be run (shown at the bottom of Page 2), enter '9'. This will take you to Page 3.

=================================================================
> Page 2. Set basic run time parameters.
>   1. Go to Page _?_ (2 to execute; 1 for Table of Contents).
>   2. Execute the tests shown below.
>   3. Specify how to generate data: run_mode = [1]
>        1. Perform a real multiprocessor test.
>        2. Execute the mp test code serially and with only one thread.
>        3. Generate relaxed test data.
>        4. Generate strong test data (no relaxed events).
>        5. Process output from simulation.
>   4. 4 Number of threads (from 2 to 8).
>   5. 200000 Length of arrays.
>   6. 0 batch_run.
>   7. 4 batch_lim.
>   8. Add header lines for this run only.
>   9 Select tests to run (Page 3).
>   These tests are set to run:
>   [ 200 400 700
>
>
>   ]
>
> ==>
=================================================================

Page 3 parameters:
To add tests x, y, and z to the list of tests to run, enter '2 x y z'.
To cause all tests to be selected for running, enter '2 a'.
To delete tests x, y, and z from the list of tests to run, enter '3 x y z'.
To cause no tests to be selected for running, enter '3 a'.
=================================================================
> Page 3. Specify tests to run.
>   1. Go to Page _?_ (2 to execute; 1 for Table of Contents).
>   2. Add tests to run ('a' for all.).
>   3. Delete tests to run ('a' for all.).
>   List of ALL tests:
>   [ 100 200 300 400 500 600 700 800 900 1000 1100 1200
>     210 310 410 510 610 710 1010 1110 1210
>     120 220 320 420 520 620 720 820 920 1020 1120 1220
>     1030 ]
>   Tests which are now set to execute:
>   [ 200 400 700
>
>
>   ]
>
> ==>
=================================================================

Page 4 parameters:
To add tests x, y, and z to the list of tests to NOT stop on in case of error, enter '2 x y z'.
To cause all tests to be selected for NOT stopping on in case of error, enter '2 a'.
To delete tests x, y, and z from the list of tests to NOT stop on in case of error, enter '3 x y z'.
To cause no tests to be selected for NOT stopping on in case of error, enter '3 a'.
=================================================================
> Page 4. Specify tests to NOT stop on in case of error.
>   1. Go to Page _?_ (2 to execute; 1 for Table of Contents).
>   2. Add tests to NOT stop on in case of an error. ('a' for all.)
>   3. Delete tests to NOT stop on in case of an error. ('a' for all.)
>
>   List of ALL tests:
>   [ 100 200 300 400 500 600 700 800 900 1000 1100 1200
>     210 310 410 510 610 710 1010 1110 1210
>     120 220 320 420 520 620 720 820 920 1020 1120 1220
>     1030 ]
>   Tests which are now set to execute:
>   [ 200 400 700
>
>
>   ]
>   Tests which will NOT stop the run after an error is found:
>   [ 400 700 1100 1200
>     410 710 1110 1210
>     420 720 1120 1220
>   ]
>
> ==>
=================================================================

Page 5 parameters:
To add tests x, y, and z to the list of tests to dump, enter '2 x y z'.
To cause all tests to be selected for dumping, enter '2 a'.
To delete tests x, y, and z from the list of tests to dump, enter '3 x y z'.
To cause no tests to be selected for dumping, enter '3 a'.
To cause arrays to be dumped at the beginning of an analysis routine, enter '4' to toggle the parameter to a value of 1.
To cause arrays to be dumped at the end of an analysis routine, enter '5' to toggle the parameter to a value of 1.
Enter '6' to toggle the value of the format parameter between 0 and 10. A value of 0 causes each line of the dump output file to consist of only one entry from each data array. A value of 10 causes more than one entry from each array to be written into one line of the output file, thus resulting in a shorter file.
=================================================================
> Page 5. Specify dump parameters.
>   1. Go to Page _?_ (2 to execute; 1 for Table of Contents).
>   2. Add tests to dump arrays after error. ('a' for all.)
>   3. Delete tests to dump arrays after error. ('a' for all.)
>   4. 0 Dump arrays at the beginning of the analysis routine.
>   5. 1 Dump arrays at the end of the analysis routine.
>   6. 0 Define format of dumped arrays.
>   List of ALL tests:
>   [ 100 200 300 400 500 600 700 800 900 1000 1100 1200
>     210 310 410 510 610 710 1010 1110 1210
>     120 220 320 420 520 620 720 820 920 1020 1120 1220
>     1030 ]
>   Tests which are now set to execute:
>   [ 200 400 700
>
>
>   ]
>   Tests which are now set to dump:
>   [
>
>
>   ]
>
> ==>
=================================================================

Page 6 parameters:
At the end of a run ARCHTEST will execute a loop with and without one of the basic arithmetic operations (both integer and floating point). This yields a rough idea of the performance of a machine. To cause the loop to be iterated x times, enter '2 x'. To skip this function, enter '2 0'.
ARCHTEST shows the relative progress of the threads through each test in an 80 by 80 plot. To change the dimensions of the plot to x by x, enter '3 x'.
By looking at the arrays one can tell which threads started late or finished early. Looking at the entire array is unfeasible. To cause ARCHTEST to display x uniformly spaced entries from each array, enter '4 x'.
To cause ARCHTEST to terminate execution immediately upon discovering a failure of the machine to obey the monotonicity rules, enter '5' to toggle the flag_mono_exit parameter to a value of 1.
To cause ARCHTEST to display the progress of the synchronization function in the procedures which initiate and terminate threads, enter '6' to toggle the flag_show_sync parameter to a value of 1.
To cause ARCHTEST to display progress through the procedures which initiate and terminate threads, enter '7' to toggle the flag_par_diags parameter to a value of 1.
To cause ARCHTEST to flush output after (almost) all fprintf statements, enter '8' to toggle the value of the flag_flush_fout parameter to a value of 1.
To cause ARCHTEST to stop the output of (almost) all printf statements, enter '9' to toggle the value of the flag_telneting parameter to a value of 1.
To cause ARCHTEST to display the unformatted value of the time obtained in the record_time procedure, enter '10' to toggle the value of the flag_record_time parameter to a value of 1.
=================================================================
> Page 6. Set parms to format and control output.
>   1. Go to Page _?_ (2 to execute; 1 for Table of Contents).
>   2. Set performance loop count: 0
>   3. 80 Display the arrays in a _?_ by _?_ plot.
>   4. 20 Print _?_, uniformly distributed, array entries.
>   5. 0 End the analysis routines after a mono event occurs.
>   6. 0 Print Entr Sync/Exit Sync messages.
>   7. 0 Print diagnostic info in PARBEGIN and PAREND.
>   8. 1 FLUSH output to FOUT in Q routine.
>   9. 1 Telneting; minimize PRINTF output.
>   10. 0 Display time from record_time().
>
> ==>
=================================================================

For information on setting the parameters on lines 3, 4, 5, and 6, enter '2'.
To read a summary statement of the values specified on lines 3, 4, 5, and 6, enter '7'.
To reset the parameters on lines 3, 4, 5, and 6 to zero, enter '8'.
=================================================================
> Page 7. Set parms for extraneous cache traffic.
>   1. Go to Page _?_ (2 to execute; 1 for Table of Contents).
>   2. Information on generating extraneous cache traffic.
>   3. 0 thrash_flag. Turn on/off extraneous traffic.
>   4. 0 thrash_line_count = number of extraneous lines (<21).
>   5. 0 thrash_action_count = frequency of extraneous accesses.
>   6. 0 trash_count = interference count.
>   7. Review the effects of the above settings.
>   8. Turn off extraneous cache traffic (set above parms to zero.
>
> ==>
=================================================================

For information on setting the parameters on lines 3, 4, 5, and 6, enter '2'.

=================================================================
> Page 8. Set parms for Test T10.
>   1. Go to Page _?_ (2 to execute; 1 for Table of Contents).
>   2. Information on setting T10 parameters.
>   3. 5000 t10t0cnt. Set count for thread T1.
>   4. 5000 t10t1cnt. Set count for thread T2.
>   5. 5000 t10t2cnt. Set count for thread T3.
>   6. -1 t10_now. Check for errors on the fly (1) or later (0).
>
> ==>
=================================================================
Output from simulation can be saved in either of two formats. In the simpler and longer format, arrays are saved as columns in a file. In the other format the data is condensed. Information on the two formats can be seen in SIMDOC.
Also available is information on a short test case. SIMORIG shows the output from ARCHTEST operating on data generated (with errors) by ARCHTEST. SIMIN shows the output from this run saved in a file. SIMOUT shows the output from ARCHTEST operating on data in SIMIN. SIMOUT and SIMORIG match in their analysis of the data.
Appendix. Parameter Files
The contents of a typical parameter file are described below. Note
that there is no space between a keyword and the following equal
sign.
header= This is a header line from the PARMFILE file.
header= There can be up to 10 header lines per run, coming either
header= from the parm files or from the user at run time.
* Override these parms with the parms in this file:
secondary_parm_file= pf2
* 1 => real multiprocessing test.
* run_mode = 1 => Perform a real multiprocessor test.
* run_mode = 2 => Execute the mp test code serially and with only one thread.
* run_mode = 3 => Generate relaxed test data.
* run_mode = 4 => Generate strong test data (no relaxed events).
* run_mode = 5 => Process output from simulation.
run_mode= 1
* number of threads to run simultaneously. May be any number between 2 and 8.
max_threads= 4
* length of the arrays of saved data. Max is 5000000.
* See defined variable ARRAY_LENGTH.
K= 200000
* if run_mode = 2 or 3 and fuzzfreq = N > 0, then generate data every N
* or so test loop iterations. N correlates with nothing in the real world.
* fuzzfreq = 0 => no errors generated.
fuzzfreq= 0
* If batch_run =1, then run ARCHTEST batch_lim times, even if ARCHTEST was
* compiled to run interactively.
batch_run= 0
batch_lim= 3
* 1 => Execute all tests. This overrides the dotest parms.
dotestall= 1
* Perform these tests. The test names can be in any format or order.
dotest= 100 200 300 400 500 600 700 800 900 1000 1100 1200
dotest= 210 310 410 510 610 710 1010 1110 1210
dotest= 120 220 320 420 520 620 720 820 920 1020 1120 1220
dotest= 1030
* Dump into a file the data gathered during the following tests.
dodump=
dodump=
dodump=
dodump=
* Do not stop if an error is found on these tests.
nostop= 400 700 1100 1200
nostop= 410 710 1110 1210
nostop= 420 720 1120 1220
nostop=
* 1 => Cause cache thrashing. ARCHTEST will read and write cache lines
* which have no logical connection to the test being conducted.
thrash_flag= 0
* N = the number of lines to use in thrashing the cache.
thrash_line_count= 0
* N = the number of the above lines to be read or written for each
* R/W operation done as part of the real, logical test.
thrash_action_count= 0
* N = the number of read and write operations to be performed for each
* R/W operation done as part of the real, logical test.
trash_count= 0
* Turn on all debug flags.
flag_show_all= 0
* Turn off all debug flags.
flag_show_none= 0
* 1 => Show progress through the synchronization code during thread initiation
* and termination. Useful when adapting ARCHTEST to a new system.
flag_show_sync= 0
* 1 => Dump the arrays of test data before they have been analyzed.
flag_show_dump_before= 0
* 1 => Dump the arrays of test data after they have been analyzed.
flag_show_dump_after= 1
* 1 => Print out a description of each test along with the test results.
flag_show_test_prog= 1
* 1 => Print out the results of each test for a monotonicity error.
flag_show_mono= 1
* 1 => Print out the count of the number and length of strings of equal values
* in the entries of an array.
flag_show_delta_string= 1
* 1 => Print out SKIM1 uniformly spaced entries of each data array.
flag_show_skim= 1
* 1 => Print out arrays which show the relative progress of the threads.
flag_show_plot_arrays= 1
* 1 => Print out histograms showing the times (measured in loop iterations)
* between interactions on shared data. Positive times show behaviors obeying
* architectural rules; negative times show violations.
flag_show_hist= 1
* Print out documentation for the tests into the output file.
flag_show_doc= 1
* 1 => Print out snap dumps (typically, dumps of the arrays at the point
* at which errors occurred).
flag_show_snap_print= 1
* 1 => Print out a nonstandard plot of the interactions between threads
* in Test T6.
flag_show_t6_plot= 1
* 1 => Print out additional descriptive information about any relaxed events
* which occur.
flag_show_relax_events= 1
* 1 => look for convoys in t8 data.
flag_do_t8_convoys= 0
* 1 => show five longest convoys seen.
flag_show_t8_convoys= 1
* Dimension of the plot array is N x N.
pa_dim= 80
* 0 => dump one entry from each array into one line of a dump file.
* 10 => dump more than one entry from each array . . . .
sim_mod= 0
* The arrays of test data are too large to print out on every run, so print a
* summary, namely, print SKIM1 uniformly spaced lines of the arrays.
skim1= 20
* Stop a test as soon as a monotonicity error occurs.
flag_mono_exit= 0
* Show progress in initiating and terminating threads.
flag_par_diags= 0
* Flush output after every fprintf statement; useful for debugging.
flag_flush_fout= 0
* Stop printing output from printf statements.
flag_telneting= 1
* Test T10 uses 3 threads to measure cache performance. These three counts
* define the number of loop iterations performed by each thread, respectively.
t10t0cnt= 5000
t10t1cnt= 5000
t10t2cnt= 5000
* Check for T10 errors on the fly (1) or later (0).
t10_now= 1;
flag_record_time= 0
* N => Measure the time to execute some simple instructions N times.
* 0 => Do not measure the time.
perf_loop_count= 0
* Show how ARCHTEST handles unrecognized parameters.
bogus1= 99
bogus2 = 98

Last updated January 4, 2006.