CSc 422 Assignment 3

The Secret of Threads

Assigned: Oct 16
Due: Nov 4
Value: 70 pts

This project is a small threading exercise. We'll start with a simple program that guesses an encryption key by exhaustive enumeration, and convert it to use threads. This can make the search faster when multiple cores are available. We'll be using C++ 2011 standard threading. It's a nice interface that retains the pattern of the low-level operating system interface, but makes good use of C++ templates and the type system to make it much cleaner than either pthreads or the native Win32 interface.

The encryption algorithm is AES, based on the open source tiny AES project. This source is provided in the starting code download (see below), so there are no external libraries needed. Our guessing exercise will assume relatively short keys, since a long key is generally impossible to find in any practical amount of time.

Getting Started

The code is in a3start.zip. The code compiles on Linux, and has a Makefile:
bennet@bennet:~/test$ mkdir asst3
bennet@bennet:~/test$ cd asst3
bennet@bennet:~/test/asst3$ wget http://sandbox.mc.edu/~bennet/cs422b/asst/a3start.zip
Saving 'a3start.zip'
HTTP response 200 OK [http://sandbox.mc.edu/~bennet/cs422b/asst/a3start.zip]
a3start.zip 100% [=============================>] 19.86K --.-KB/s
[Files: 1 Bytes: 19.86K [2.77M]
bennet@bennet:~/test/asst3$ unzip a3start.zip
Archive: a3start.zip
  inflating: Makefile
  inflating: AES.h
...
  inflating: monoguesser.cpp
 extracting: in1.aes
 extracting: gettysburg.aes
 extracting: secret.aes
  inflating: multiguesser.cpp
bennet@bennet:~/test/asst3$ make
g++ -g -c -o monoguesser.o monoguesser.cpp
g++ -g -c -o guesser.o guesser.cpp
g++ -g -c -o generator.o generator.cpp
g++ -g -c -o AES.o AES.cpp
cc -g -c -o tiny_aes.o tiny_aes.c
g++ -g -o monoguesser monoguesser.o guesser.o generator.o AES.o tiny_aes.o
g++ -g -c -o multiguesser.o multiguesser.cpp
g++ -g -o multiguesser multiguesser.o guesser.o generator.o AES.o tiny_aes.o -lpthread
g++ -g -c -o aestest.o aestest.cpp
g++ -g -o aestest aestest.o AES.o tiny_aes.o
bennet@bennet:~/test/asst3$ ./aestest de BC in1.aes
This is a test inputfile.
This is more of it, and it does go on.
And stops.
bennet@bennet:~/test/asst3$
The code is pretty standard C and C++, so it should compile fine on any system that supports those, though if you are using a different build system, you'll have to figure it out. The above shows the needed compile commands.

Download a3start.zip, and get it to compile. If you're using Linux, the commands listed above will work fine. Of course, you can always download the zip with your browser instead of wget, and even use one of those wimpy GUI unzip programs to unpack it.

Make a directory and put all the files there. Go to it and build the files. If you have the compilers and GNU make installed, the make command will build everything. On Mac, you might need to say gmake instead of make. The build should produce several executables: aestest, monoguesser and multiguesser. The multiguesser is the program you will complete; this version is a stub that doesn't do anything. The other two are complete programs.

The aestest program encrypts and decrypts files using the AES cipher in CBC mode. The download includes a few encrypted files, and you can make additional ones. To decrypt, you can do something like:
bennet@bennet:~/test/asst3$ ./aestest de BC in1.aes
This is a test inputfile.
This is more of it, and it does go on.
And stops.
The aestest program takes either en or de for “encrypt” or “decrypt,” a password, and a file name. It decrypts the file, and prints it on the console. With en, it reads the plaintext from the console, and writes the encrypted text to the named file. You can use redirection to read plaintext from files, or write it there.
bennet@bennet:~/test/asst3$ cat foo.txt
Before a command is executed, its input and output may be redirected using a special notation interpreted by the shell. Redirection allows commands' file handles to be duplicated, opened, closed, made to refer to different files, and can change the files the command reads from and writes to.
bennet@bennet:~/test/asst3$ ./aestest en mypass foo.aes < foo.txt
bennet@bennet:~/test/asst3$ ls -l foo.txt foo.aes
-rw-r--r--. 1 tom tom 312 Nov 6 21:12 foo.aes
-rw-r--r--. 1 tom tom 293 Nov 6 21:11 foo.txt
bennet@bennet:~/test/asst3$ ./aestest de mypass foo.aes
Before a command is executed, its input and output may be redirected using a special notation interpreted by the shell. Redirection allows commands' file handles to be duplicated, opened, closed, made to refer to different files, and can change the files the command reads from and writes to.
bennet@bennet:~/test/asst3$ ./aestest de badpass foo.aes
terminate called after throwing an instance of 'std::runtime_error'
  what():  AES Decrypt stream padding error
Aborted (core dumped)
bennet@bennet:~/test/asst3$

For the present project, you need a file encrypted with a short password. You can make your own, or use one of the files in the download. As noted above, the short file in1.aes has the key BC. The file gettysburg.aes is the Gettysburg address with password AbeL, and secret.aes has a five-character alphanumeric password.

Now, try to guess a key:
bennet@bennet:~/test/asst3$ ./monoguesser in1.aes
The key is BC, 188 keys guessed.
bennet@bennet:~/test/asst3$ ./monoguesser gettysburg.aes
65536 keys; current BCQ
131072 keys; current DFh
196608 keys; current FIy
...
2818048 keys; current XFyK
2883584 keys; current ZIFL
2949120 keys; current bLWL
The key is AbeL, 2980837 keys guessed.
In addition to reporting the password, it writes the plain text in a file with the same name, adding the extension .plain.

The guesser tries passwords made of alphanumeric characters, in order of increasing length. The first password it guesses is A. The set of characters tried is given as a string near the top of monoguesser.cpp, should you wish to change it. (This could be useful for testing, but no change is needed for this project.)

The guesser has no direct way to know when the key is correct. If the decrypt object throws an exception, the key is certainly wrong, but a wrong key is not guaranteed to cause an exception. The guesser assumes the key is correct if the plaintext contains only old-fashioned 7-bit ASCII. (Under more realistic circumstances, you'd need a better way to detect success.) If you build your own test file, it's best to use an old-school editor that won't try to help out with UTF-8 or another newer encoding.

The time it takes to find a key varies with the key (particularly its length) and the speed of your computer. My little laptop can find the AbeL key in about two minutes, trying about 3 million keys. A larger server that I have access to can find it in about 15 and a half seconds.

The actual assignment is to get multiguesser to do what monoguesser does, only faster. You are given a small framework in multiguesser, which does very little. It does suggest that you take both a file name and a number of threads on the command line. Probably, the optimal number of threads to choose is the number of CPU cores on your machine. The command ./multiguesser crypted 2 will perform the guessing with two worker threads. On my laptop at home, multiguesser using three threads can find the AbeL key in about half the time of monoguesser. The above-mentioned server has eight cores, and the multiguesser with eight threads can solve the problem in about 2 and a half seconds.

What To Do

The assignment is to make multiguesser do what monoguesser does, only faster. This is to be done using the threading facility which is part of the 2011 C++ standard. You should be able to show some speedup compared to monoguesser. Do it however you like, but I will make some suggestions here.

First, simply copy most of the code from the monoguesser. Create a program that's mostly the same, except that it takes a number of threads on the command line, which it ignores (for now). Get this program to work. Second, modify the program by moving the guessing loop into a separate function and call it from main. This will become your thread function, but for now just call it normally. The main will become small, not having much else left to do. Get this second version to work also. It should work just like the monoguesser, except it takes that extra parameter on the command line and ignores it.

Now you might want to introduce a single worker thread. This won't speed anything up, but it's a useful step in the evolution. Have main start your function as a thread instead of calling it normally. (Continue ignoring the number of threads; just start one.) Also, main must join the thread, which waits for it to finish. Here's a simple example of starting and joining threads. (The example starts two.)

Now, if you didn't before, arrange the program so that the gen object is not created in the searching function. You might declare it as global, or send it from the main as a reference parameter.

Next, try to increase speed by creating the number of threads specified. Make a loop, and store the threads in some sort of list structure. Here is a page containing an example (the second code block), which creates 20 simple threads and stores them in a vector. (The & before the name of the function in the thread call is not required, and doesn't change the meaning.) Note also the second loop, which iterates through the list of threads, and performs a join on each one. (The ampersand there has a different meaning, and is needed here. Ah, C++!)

But when you create the multiple threads, you create an important shared data structure: the gen object which enumerates the password guesses. You must use a mutex object to protect this so that only one thread can call gen.next() at a time. You may add the mutex in the main file, probably as a new global variable, or you might add it to the generator class. The simple example linked earlier uses a mutex, and the more complicated one cited above also shows one a bit further on in the page. This will give you multiple threads, each repeatedly getting a key from the generator and trying it. Getting the key candidate will be done in a critical section, but checking will not, so most of the work can be done in parallel.

You will also need a way for the threads to stop. Each thread keeps looking for the key, and only one will find it. (If multiple threads find the correct key, you're not synchronizing correctly, and wasting effort.) So there needs to be a way for the lucky finding thread to notify the others. You should use some sort of shared flag, which must also be synchronized. Each thread must check it each iteration, and return if the key was found. The finder will set the flag, announce the true key, and return. Alternatively, you can have the main announce the key, but you will need some way to send the value to it.

The found flag is also a shared variable. You might protect it with the existing mutex, with its own mutex, or look into using an atomic variable.

Once you manage to get all this put together, you should have a program that will find the key, and do so faster when run with multiple working threads on a multi-core CPU.

Submission

When your program works correctly and looks nice, submit it here. Send your multiguesser.cpp, and any other file you changed or added.