This is an group homework. You may work with in a group of up to three people (or by yourself if you prefer). Only hand in one write-up per group.

  1. Stable Sorting

    For each of the following sorting algorithms, tell whether it is stable. Briefly explain why. (If the algorithm isn't stable, a good way to explain why is to give an example. If it is, I'll probably accept convincing hand-waving.)

    (a) Heap Sort
    (b) Quick Sort
    (c) Counting Sort

  2. MH2 analysis of Transpose

    Consider this simple program for forming the transpose of a LxK matrix:

         for i = 1 to L
            for j = 1 to K
               B[i][j] = A[j][i];
    

    (a) For a square matrix with N being the number of elements in the array (so N = LK = L2), what is the RAM-model complexity of this program in Big-Theta notation?

    (b) Using the model MH2 model of computation with the parameter b = M^(1/3), (where M is the size of memory needed for the problem instance) what is the complexity of the above transpose program in big-Theta notation?

    (c) Write a program that gives the same result as the simple transpose program, but which has better MH2 complexity, and give its MH2 complexity in big-Theta notation.

  3. Project: Timing Real Programs

    The purpose of this problem is to design and run experiments that will see whether the effects predicted by MH2 analysis are observed in practice.

    You should choose a pair of programs that solve the same problem. This can be any ONE of the following. [Note: they are listed roughly in order of difficulty. The more difficult problem you attack, the greater opportunity to earn "bonus points".]

    (Be sure to see the problem specific notes below.)

    You should write the two programs and time their execution on a variety (perhaps 6) of problem sizes, from small enough to fit in the smallest cache of your machine up to big enough to fill up most of main memory and perhaps so big that they spill out onto disk. But don't get so carried away that a single run takes hours.

    You should report your results both as tables of runtimes and graphs (preferably log-log graphs, since the slope of the line suggests the degree of the polynomial that approximates the data).

    Report any conclusions that you can draw about whether the data behaves as predicted by the RAM model, the MH2 model, or neither.

    Some things to keep in mind about designing your experiments:

    - You should use a language (such as C or FORTRAN) that is compiled and doesn't introduce a lot of overhead. In particular, DON'T use Java. Be sure to use compiler options that give efficient object code, but avoid loop-restructuring optimizations (since they may improve the program so much it invalidates the experiments.)

    - If possible, use a computer that isn't being used by other people at the same time so your timings aren't influenced by their activity. Use the same computer for all your experiments. It might be interesting if the computer has at least two levels of cache, though that isn't necessary. If you can find the information, describe the memory hierarchy of the computer you are using and its clock speed.

    - To make sure that your program is moderately efficient, you should do a rough estimate of how many cycles the program is taking per element. For transposing, say, a 16x16 matrix (small enough to fit in cache, large enough to amortize some of the loop overhead), the computer should probably take somewhere between 2 and 6 cycles per element. If it takes 20 or 100 cycles per element,

    - There are a lot of pitfalls to avoid when timing programs. In particular, the first time you run a problem at a given size, it may take longer than subsequent runs, since the computer needs to allocate data and move various things (including the program's code) into cache. I suggest that your program should first generate the data and put it into an input array, then have a big outer loop that executes the code perhaps 5 times. Each time through, it should start the timer (e.g. call gettimeofday), then run the program, the stop the timer (e.g. with another call to gettimeofday) and save the start and stop times (or their difference). When you're done, you'll have five runtimes. You should print out all of them, not just their average, since this will tell you how consistent the times are. If they vary wildly, you need to figure out what's wrong with the experiment. If the numbers are all about the same except for one or two that are much larger, then probably the large ones were caused by start-up costs or a deamon interrupting your program or some other cause that shouldn't be charged to your program.

    Some comments relevant to specific choices of programs: