CSc 422 Assignment 4

Traces of Memory

Assigned: Nov 15
Due: Dec 9
Points: 65

This assignment is optional. If you don't submit it, your grade will be averaged as though this does not exist.

This assignment is to write a program which reads traces of memory references produced by running programs and computes statistics on their behavior. Here's mine, with all the reporting turned on:
[tom@localhost meas]$ ./stats ccrdr.txt dist=100
=== ccrdr.txt ===
8190844 references to 526 pages, 15571.9 ref/page, 1 to 1788422. 3.80174 refs/byte
By type: IF 72.7744%, DF 18.8588%, ST 8.36685%
Within 100 bytes: 94.3535% instruction fetches, 43.5496% data refs.
It's worth noting that this can take a while to run. The above took about 9 seconds, but this is a relatively small trace file. The meaning of the runes is as follows:
./stats ccrdr.txt dist=100
The command line gives the name of the file of traces to analyze, ccrdr.txt. It's a small C++ compile. The other parameters turn on certain additional measurements, as described below. The trace file is plain text, this one containing 8190844 memory references (more on format later).
8190844 references to 526 pages,
As represented in the trace, the program produced 8190844 memory references. These references were to 526 distinct 4k pages. The page size can be set by adding the command line parameter pagebits=N, but this run uses the default of 12.
15571.9 ref/page, 1 to 1788422. 3.80174 refs/byte
Each of the 526 pages was referenced an average of 15571.9 times, and the number of references per page ranged from a minimum of 1 to a maximum of 1788422. The references work out to 3.80174 references per byte of the total page space. That's just 8190844÷(4k×526) in this case. It gives a measure of how much use the system gets out of the data it brings into memory. It tends to increase when a smaller page size is used.
By type: IF 72.7744%, DF 18.8588%, ST 8.36685%
The references are classified by type: instruction fetch, data fetch, and data store. This line gives the percentage of each type within the trace.
Within 100 bytes: 94.3535% instruction fetches, 43.5496% data refs.
This is a measure of spatial locality: how close one reference is to another. Of the instruction fetches, 94.3535% are within 100 bytes of the previous instruction fetch, and 43.5496% of data operations are within 100 bytes of the previous one. The distance is specified by the dist=100 flag, which also enables the measurement.
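The refs/page and refs/byte figures above are plain division, and a small sketch makes the arithmetic concrete. The function names refsPerPage and refsPerByte are made up for illustration and are not part of the starting code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average number of references per distinct page.
double refsPerPage(std::uint64_t references, std::uint64_t pages)
{
    assert(pages > 0);
    return static_cast<double>(references) / static_cast<double>(pages);
}

// Hypothetical helper: references per byte of total page space,
// where each page holds 2^pagebits bytes.
double refsPerByte(std::uint64_t references, std::uint64_t pages, unsigned pagebits)
{
    assert(pages > 0 && pagebits < 64);
    const double pageBytes = static_cast<double>(std::uint64_t{1} << pagebits);
    return static_cast<double>(references) / (static_cast<double>(pages) * pageBytes);
}
```

With the numbers from the sample run (8190844 references, 526 pages, pagebits of 12), these come out to roughly 15571.9 and 3.80174.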

Reading Traces

The trace reader's read method returns each reference as a Reference. This small class is defined in rdr.h, and is really more of a struct, having all public fields. Each reference has a type: InstrFetch, DataFetch or Stor. The class also defines a type None, which is only used for invalid objects, not references from the program trace. Each reference also has a (virtual) address and a size in bytes. These traces are from a Pentium or x64 architecture, so the sizes vary widely. The addresses are typed as uint64_t, defined in the header cstdint. This is an alias for an unsigned integer type having exactly 64 bits. Be careful doing arithmetic with unsigned types in C. In particular, subtraction will never give you a negative number. If you subtract a larger unsigned from a smaller one, you just silently get the wrong answer. Ahhh, C.
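The subtraction pitfall is easy to reproduce. The sketch below is not from the starting code; absDiff and naiveDiff are hypothetical names. It shows the wraparound, and one way to get a distance safely by always subtracting the smaller value from the larger.

```cpp
#include <cstdint>

// What goes wrong: when a < b the result wraps around modulo 2^64
// instead of going negative.
std::uint64_t naiveDiff(std::uint64_t a, std::uint64_t b)
{
    return a - b;   // a huge positive value when a < b
}

// Hypothetical helper: distance between two addresses, computed
// without ever subtracting a larger unsigned from a smaller one.
std::uint64_t absDiff(std::uint64_t a, std::uint64_t b)
{
    return a > b ? a - b : b - a;
}
```

For example, naiveDiff(100, 130) is 18446744073709551586 rather than -30, while absDiff(100, 130) is 30.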

The existing rprint.cpp code uses the reader and handles these objects. Have a look. All fields are public, so there are no getters. Refer directly to the data fields.

What To Do

First, download the starting code as start.zip. This contains a class to read trace files, and a test program rprint.cpp which simply reads and prints a specified file. It will also serve quite well as a starting place for your program. It already opens the trace file and loops through the references therein. You will also need some trace files. The linked page contains several for download, and more information about where they come from and how to make your own.

Unpack the starting code and get the reader to compile. In a Unix-like environment, just put the files in a directory, go there, and say make. For most IDEs, including CodeBlocks, unpack the files to some convenient place, then create a project under your IDE and add the files to it. You should then be able to build the rprint executable.

Once you have that working, make a copy of rprint.cpp under another name (I uncreatively called mine stats.cpp) to work on. You should be able to compile it also. If you're using my Makefile, say “make stats”. (If you called your file something else, you'll need to edit the Makefile. An IDE will probably just follow whatever name you give it.) Then modify it to build the statistics shown above:
  • The number of references. This is just a counter.
  • The minimum, maximum and average numbers of references to a page.
  • The counts of the references in each of the classes instruction fetch, data fetch, data store.
  • You will need counters for the numbers of instruction fetches and data references which are “close”. You won't need to update the counters if the dist parameter is not set.

To keep a count of the number of references to each page, you should make a map from page number to count (integer to integer; since addresses are 64 bits, a 64-bit key type such as uint64_t is the safe choice) to hold the count for each page. You could use a vector, but most of the entries will be unused, so the map saves a lot of space. For each reference, extract its page number from its address. For reference r, and b-bit pages, the expression r.addr >> b will yield the page number. (So look up the shift operator if you've never used it.) Use this page number to increment the correct entry in the map. Your map can then be used to find the min, max and average references per page, and the number of pages referenced (just the map's size).
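Here is a minimal sketch of that map, assuming 64-bit keys and counts. The names PageCounts, PageStats, countReference and summarize are invented for illustration; your own program can organize this however you like.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Page number -> number of references to that page.
using PageCounts = std::map<std::uint64_t, std::uint64_t>;

// Hypothetical summary type; the field names are illustrative.
struct PageStats {
    std::uint64_t pages = 0;     // distinct pages referenced
    std::uint64_t minRefs = 0;   // fewest references to any one page
    std::uint64_t maxRefs = 0;   // most references to any one page
    double avgRefs = 0.0;        // total references / pages
};

// Count one reference to the page containing addr.
void countReference(PageCounts &counts, std::uint64_t addr, unsigned pagebits)
{
    assert(pagebits < 64);          // shifting by 64 or more is undefined
    ++counts[addr >> pagebits];     // operator[] creates a zero entry on first use
}

// Walk the map once to find the min, max and average.
PageStats summarize(const PageCounts &counts)
{
    PageStats s;
    if (counts.empty()) return s;

    s.pages = counts.size();
    s.minRefs = counts.begin()->second;
    std::uint64_t total = 0;
    for (const auto &entry : counts) {
        s.minRefs = std::min(s.minRefs, entry.second);
        s.maxRefs = std::max(s.maxRefs, entry.second);
        total += entry.second;
    }
    s.avgRefs = static_cast<double>(total) / static_cast<double>(s.pages);
    return s;
}
```

In the read loop you would call something like countReference(counts, r.addr, pagebits) for each reference r, then summarize(counts) when it is time to report.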

The Reference object also gives the type of reference. You will need counters for the types (actually two, since you can subtract for the other). The type will tell you which to increment.
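One way to arrange those counters is sketched below. The enum Kind is only a stand-in for the type field of the real Reference class in rdr.h, whose names may differ, and TypeCounts is a made-up name. The store count is obtained by subtraction, as suggested above.

```cpp
#include <cstdint>

// Stand-in for the reference type defined in rdr.h (assumption).
enum class Kind { InstrFetch, DataFetch, Store };

// Hypothetical counter bundle: a total plus two per-type counters.
struct TypeCounts {
    std::uint64_t total = 0;
    std::uint64_t instrFetches = 0;
    std::uint64_t dataFetches = 0;

    // Record one reference of the given kind.
    void add(Kind k)
    {
        ++total;
        if (k == Kind::InstrFetch) ++instrFetches;
        else if (k == Kind::DataFetch) ++dataFetches;
    }

    // Stores are whatever is left over. The subtraction is safe because
    // total is never smaller than the other two counters combined.
    std::uint64_t stores() const { return total - instrFetches - dataFetches; }

    // Percentage of all references that n represents.
    double percent(std::uint64_t n) const
    {
        return total == 0 ? 0.0 : 100.0 * static_cast<double>(n) / static_cast<double>(total);
    }
};
```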

You need to count the “close” references when dist is set. Here, instruction fetches are one counter, and data stores and fetches go together for the other. Make two local Reference variables to always hold the last instruction reference and the last data reference. Use these to determine if each new reference is close to the last of the same type.

Command Line Parameters To Support

The program accepts a number of parameters on the command line. The first one is always the name of the trace file to read, and the ones which follow set parameters for the computation. The presence of some parameters also enables certain counts. In a command environment, you simply type these on the command line when running the program. An IDE will generally have some place buried in the menus where you can enter the command line parameters. In CodeBlocks, look under Project/Set programs' arguments, once you have created a project.

The existing rprint.cpp contains code to read the command line arguments, and processes a few parameters itself, mostly as examples: If you specify limit=N, the program stops after that many references. If you say count, the program will report the number of references printed. Do have a look at it to see how this is done.

Implement the following parameters:
pagebits=b
Specify the number of bits on the right of each address which constitute the page offset (the rest are the page number). The default is 12, which specifies the usual 4K page, since 2^12 = 4K.
rpt=n
In addition to reporting the statistics at the end of the trace, report the current values after each n references. This does not imply a reset; simply print the current accumulation and continue to accumulate.
dist=n
This activates the spatial locality measurement, and specifies the distance. The distance is in bytes.
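The three parameters above all have the form name=value, so one small function can handle them. This is only a sketch under my own assumptions: Options and applyArg are invented names, and rprint.cpp may parse its arguments differently, so follow its style if you prefer.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical settings bundle with the defaults described above.
struct Options {
    unsigned pagebits = 12;     // default: 4K pages
    std::uint64_t rpt = 0;      // 0 means no periodic reports
    std::uint64_t dist = 0;     // closeness distance in bytes
    bool distSet = false;       // true once dist=n has been seen
};

// Apply one "name=value" argument. Returns false if the argument is
// not one of the three parameters or its value is not a usable number.
bool applyArg(Options &opt, const std::string &arg)
{
    const std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos) return false;

    const std::string name = arg.substr(0, eq);
    const std::string text = arg.substr(eq + 1);
    if (text.empty() || text.find_first_not_of("0123456789") != std::string::npos)
        return false;               // value must be all decimal digits

    std::uint64_t value = 0;
    try {
        value = std::stoull(text);
    } catch (const std::out_of_range &) {
        return false;               // more digits than fit in 64 bits
    }

    if (name == "pagebits") {
        if (value >= 64) return false;   // no room left for a page number
        opt.pagebits = static_cast<unsigned>(value);
    } else if (name == "rpt") {
        opt.rpt = value;
    } else if (name == "dist") {
        opt.dist = value;
        opt.distSet = true;
    } else {
        return false;
    }
    return true;
}
```

You would call applyArg on each argument after the file name. Inside the read loop, a test along the lines of opt.rpt != 0 && count % opt.rpt == 0 decides when to print an intermediate report.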

Meaning of Close

If a distance is specified, you need to check whether each reference (after the first) is “close” to the previous one of the same type, and count those so you can compute the percentage. More specifically, two references are close when the address of one falls within the bytes covered by the other, extended by the distance, and this may hold in either direction. In the picture to the right, reference B is close to reference A, since it falls in the area pictured. Reference B is close above A if Address B is greater than or equal to Address A and less than Address A + Size A + distance. To test for closeness, check this in each direction.
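The test can be written directly from that definition. In this sketch, Ref is a stand-in for the Reference class in rdr.h (the field name size is my assumption), and closeAbove and isClose are made-up names. The comparison is rearranged so the unsigned subtraction only happens when it cannot wrap around.

```cpp
#include <cstdint>

// Stand-in for the Reference class in rdr.h (assumption): only the
// two fields the closeness test needs.
struct Ref {
    std::uint64_t addr;   // virtual address
    std::uint64_t size;   // size in bytes
};

// True if b is "close above" a: a.addr <= b.addr < a.addr + a.size + dist.
// Written as a difference so that a.addr + a.size + dist cannot overflow,
// and the subtraction is only evaluated once b.addr >= a.addr is known.
bool closeAbove(const Ref &a, const Ref &b, std::uint64_t dist)
{
    return b.addr >= a.addr && b.addr - a.addr < a.size + dist;
}

// Close in either direction.
bool isClose(const Ref &a, const Ref &b, std::uint64_t dist)
{
    return closeAbove(a, b, dist) || closeAbove(b, a, dist);
}
```

With a 4-byte reference at address 1000 and dist=100, a following reference at address 1103 counts as close, and one at 1104 does not.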

Where Do Traces Come From, Anyway?

They come from using the Valgrind tool with a plugin called (for some reason) Lackey. Lackey seems to be pretty simple, intended mainly as an example of writing plugins for Valgrind. Valgrind is available for Linux and Mac; there's no Windows port. If you have a supported platform, and have acquired Valgrind, you can create a trace with this command:

valgrind --tool=lackey --trace-mem=yes command 2> filename
This will run any command under the tracer, and direct the standard error output to filename. The trace is written to the error stream. The command will generally run quite slowly, and the trace file will be quite large. I've also had trouble getting any large GUI program to work; I suspect there is some issue that comes from slowing things down so much, but I don't know for certain if that's the problem.

Submission

When your program works, and is properly commented and indented, submit it over the web using this form.