22C:030/115 Computer Science III

    Fall 2001

    Homework Assignment 5: Hashing, Heaps, and Sorting

    For this assignment, you may work in groups of size 2 or 3. Everyone in the group is expected to contribute to and understand every answer.

    DUE DATE: 12/12/01 (60 points) 


  1. Consider the problem of finding the kth largest value for a sequence of keys values stored in an array. One way to do this is to first sort the array Data[0],...,Data[n-1] and then return Data[k-1]. This does more work than is necessary. Write an algorithm using the quicksort partitioning scheme to find the kth largest value. At each step you will either find the kth value, or determine that it is in either the left or the right partitions. Subsequent calls should only focus on the part of the array in which the value is known to be. Your function should return the kth largest value.

  2.  

     

  3. Write the status of the following arrary at the end of each phase of the following algorithms;

  4.  

     

    1. insertion sort
    2. quick sort (with partitioning based on the first entry)
    3. merge sort
    4. heap sort

    5.  

       

    Data[ ] = [30, 62, 7, 35, 90, 19, 38, 79, 24, 11, 43]
     
  5. Draw a hash table of size 13. Use double hashing to insert the keys 10, 2, 46, 37, 26, 78, 49, and 59 into the table. Use the hash function "k%13" to computer the home address of a key and the function "k%11" for rehashing.
  6.  
  7. Write an efficient algorithm for deleting an item at location i from a max heap having n elements in it. Your algorithm should produce a max heap with n-1 elements. Assume the heap is stored in data[0], data[1],...,data[n-1]. You can assume the following functions are already defined and use them in your function:
  8.   shiftdown(int data[], size_t pos, size_t n)
    
         Shift the element in data[pos] down through the heap of size n
         until the heap constraint is satisfied.
    
    
      shiftup(int data[], size_t pos, size_t n)
    
         Shift the element in data[pos] up through the heap until the
         heap constraint is satisfied.
    
    
     
  9. You are hired as a consultant to advise a client on the design of software for a driving simulator. The simulator simulates traffic flow on a race track. For the purpose of this problem, assume that all vehicles travel in the same direction and that the track contents is fixed (i.e. no cars leave the track and no new cars enter the track). However, the relative order of vehicles on the track can change as cars periodically pass one another.

  10. Cars are to be stored in an vector V. To simplify queries about the spatial relationships among cars, this array must be maintained in sorted order based on car mileage (i.e. distance from the starting line.) On each step of the simulation, car mileage is updated and the array must be re-sorted. It is important that the sorting time be as small as possible.

    You must compare two sorting methods: the standard template library sort function and an insertion sort function that you must code. Your assignment is to assess run times for the two sorting methods under the condition that adjacent values change order with some probability p. The vector to be sorted, V, will contain integers. Initially, create a sorted vector by setting V[i] = i. Then jumble the vector by swapping adjacent values with probability p. You should consider three values of p ( .001, .01, and .1) and six different vector sizes (10, 50, 100, 500, 1000, 5000). A demonstration program is available here.

    The Unix operating system provides a useful means to measure execution time through the time command:

                         %time command

    This statement will print a report on the execution time and system resources used to run the command command. The exact form of the report varies across our Unix platforms. You should read the manual page on your machine for a description of the output format. You should compare user times in your assessment.

    The precision with which time is measured also depends on the the machine you use. The accuracy is no worse that 100Hz. This means that you cannot accurately measure the execution time for commands that take less that .01 seconds to run. To be sure that your error is less that 5%, the time to execute a command must be greater than .20 seconds. This means that you must repeatedly sort small vectors until the total sorting time is greater than .20 seconds. You must then divide the total time by the number of times the vector was sorted.