CS 840 Assignment 1

Due March 3, 2005

Instructor: I. Munro

 

1.      Consider the idea of using a hash function in the van Emde Boas “priority queue” to reduce the space requirements. Assume that hashing a key (or any other value) to a location takes O(1) time and  space for the hash table is proportional to the size of the data entries and keys,. Take the scheme for hashing as a “black box”. How can you use such hashing to implement the van Emde Boas method efficiently?

a.       Outline your scheme, perhaps by showing how it differs from the standard version.

b.      Assuming the hashing works in constant time for an insert, delete or search, how much time is taken for each of the operations: insert, delete and successor.

c.       What is the space requirement of your method? Assume you have a universe of u elements and a word size of lg u bits. n elements are currently stored in the data structure. Clearly this should depend on u as well as n.

2.      We have discussed, in great detail, the Move to Front heuristic for improving linear search. This question deals with the very simple notion of counting accesses to elements and maintaining the list in order by Frequency Count (FC).

a.       Outline a method for keeping frequency counts and maintaining the elements of the list in decreasing order by their counts. This method should require only O(1) time to update a count and to restore and to move such an updated element to its appropriate place in the list.

b.      If elements are drawn from a fixed (and independent) distribution, it can be shown that the expected behaviour of FC is close to that of the optimal. You don’t have to prove this. We will concentrate on the amortized (worst case) behaviour, and compare FC with (i) the static optimal (Sopt) which keeps elements in decreasing order of frequency of the entire sequence and (ii) the dynamic offline optimal.

                                                                        i.      Prove that the amortized case behaviour of the FC heuristic is within a constant factor of that of Sopt. (As in class, assume FC starts with an empty list and adds an element to the end of the list when it is first accessed)

                                                                       ii.      Consider the model of Sleator and Tarjan in which a list may be rearranged by moving the requested element toward the front of the list at no charge, or swapping two adjacent elements at a cost of 1. Prove that under this model, FC is not necessarily within a constant factor of the off line optimal on an amortized basis.

3.      We briefly discussed an approximation scheme for the optimal binary search tree. The root of a tree on a given range is node xi, such that the maximum of the probability of a search in the range either before xi or after xi is minimized (break ties arbitrarily). This question deals with the time and space required to compute this tree.

a.       Sketch an efficient algorithm for performing the task. Your method should take O(n lg n) time. (Hint: computing the cumulative probabilities  Pi= SUMj<=i pi +qi in a preprocessing phase will help a lot)

b.      Improve the method to take linear time. (OK …  if you just do part b) you really don’t have to do a)) (Hint: the amount you are willing to spend on finding the root in range will have to depend on where the root actually is .. be willing to pay more for an “even split” than for a really skewed one))

c.       An easier version of b): Rather than recursively finding an element in the “middle” of each subrange we take a more global approach. The root of the entire tree is the one that straddles the cumulative .5 mark (or comes as close as possible) as before. The children of the root are now at the .25 and .75 marks respectively, and their children at .125, .375, .625 and .875. Children at lower levels are assigned in a similar manner. How can you determine this tree in linear time?