20091102

Greedy Algorithms

When we were studying Dynamic Programming, we observed that we could often characterize the process of finding an optimal solution as a sequence of decisions (such as "Should I put item 1 in the knapsack?").  Because we don't know what the first decision should be, we try all the possibilities.  As you know, this requires the solution of multiple subproblems, which contributes to the complexity of the algorithm.

For some problems, it is possible to determine what the first decision should be.  This makes the algorithms much simpler.  The choice is usually based on choosing an item from a set that is the "largest" or "smallest" or "most valuable" or "least expensive" in some sense, and for this reason the algorithms are called Greedy Algorithms.

For example, consider a version of the Knapsack Problem called The Fractional Knapsack Problem.  Here, we are able to carry fractions of the items.  More formally, we have a knapsack of capacity K, and a set of n items, each with a mass and a value.  Our goal is to fill the knapsack with the most valuable total load.  We can use any fraction of each item: if the value of a particular item is v and we carry fraction x of the item, then the value of that fraction is x*v.

It seems intuitive that we should carry as much as we can of the items that have the highest value/mass ratio, and in fact this idea does lead to the optimal solution.  We can state the algorithm as follows:

Greedy Fractional Knapsack:
    Sort the items in order of decreasing value/mass ratio (most valuable per unit of mass first)
    Repeat
           Take as much as possible of the next item in the sorted list
    Until the knapsack is full, or we run out of items
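
Here is a minimal sketch of this in Python, assuming each item is given as a (value, mass) pair with positive mass; the function name and return value (the best total value, rather than the list of chosen fractions) are just illustrative choices:

def fractional_knapsack(items, capacity):
    # items: list of (value, mass) pairs; capacity: total mass the knapsack holds.
    # Sort by value/mass ratio, most valuable per unit mass first.
    items = sorted(items, key=lambda vm: vm[0] / vm[1], reverse=True)
    total_value = 0.0
    remaining = capacity
    for value, mass in items:
        if remaining <= 0:
            break
        take = min(mass, remaining)            # as much of this item as fits
        total_value += value * (take / mass)   # value of the fraction we carry
        remaining -= take
    return total_value

# Example: capacity 5, items are (value, mass) pairs
# fractional_knapsack([(60, 2), (50, 5), (20, 1)], 5)  ->  100.0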

As with any algorithm, we must address two questions:
    1.  Does it always find the right answer?
    2.  What is its computational complexity?

Question 2 is the easier to answer:  sorting the items takes O(n*log n) time.  The loop executes at most n times, and each iteration takes constant time, so the loop is O(n).  Thus the whole algorithm is O(n*log n).

Question 1 is a little bit trickier, but the proof is virtually the same for every Greedy algorithm.  First, we establish that the algorithm's first choice is "safe", ie that there is some optimal solution that contains that choice.  Second, we establish that the safe first choice can be combined with an optimal solution to the reduced problem that results from the first choice, and the result will be an overall optimal solution.  (This part is virtually identical to the proof that a problem satisfies the Principle of Optimality for dynamic programming.)  We use induction to put it all together and get a proof that the algorithm always finds an optimal solution.

For the Fractional Knapsack Problem, the proof looks like this:
    Step 1 (prove that the first choice is safe).  Let x be the first item chosen by the algorithm, and let m be the fraction of x that the algorithm chooses.  Let O be an optimal solution to the problem.  If O contains fraction m of item x, we are done.  On the other hand, suppose O contains less than m of x (do you see why it cannot contain more than m of x?).  If O does not fill the knapsack we can simply add more of x without decreasing the value; otherwise O must contain some amount of another item.  If we take out some of that item and put in an equal mass of item x, we have another solution whose value must be >= the value of O (since x has the highest value/mass ratio, trading mass of any other item for mass of x cannot decrease the total value).  We can continue this until we have added as much of x as possible, without ever decreasing the value of the solution.  We end up with a solution that contains fraction m of x and has total value >= the value of O.  But O is optimal, so no solution can have value > the value of O.  Thus the new solution has value = the value of O, which means it is an optimal solution.  Thus there does exist an optimal solution that contains the same amount of x as the algorithm chooses, and thus the algorithm's first choice is safe.

    Step 2:  Consider an optimal solution O that contains the algorithm's first choice.  The rest of O (ie the amounts of the other items) is a solution for a smaller set of items, and a knapsack of a smaller size: K minus the amount of mass filled by the algorithm's first choice.  Suppose that the rest of O is not an optimal solution for this reduced problem.  Then we could substitute an optimal solution for the reduced problem, and this would increase the value of O ... but that is not possible since O is optimal.  Thus the rest of O must be an optimal solution for the reduced problem.  Thus when we combine the algorithm's first choice with an optimal solution to the reduced problem, we get an optimal solution to the original problem.

    Step 3:  Clearly the algorithm works correctly if there is only 1 item.
                  IH: Assume the algorithm works on all sets of t items, where t is >= 1.
                  Suppose we have a set of t+1 items.
                  We have shown that the algorithm's first choice is safe.  This reduces the problem to a smaller knapsack and a set of just t items.  By our inductive hypothesis (IH), we know the algorithm will find an optimal solution to this reduced problem.  We have shown that the algorithm's first choice can be combined with an optimal solution to the reduced problem to give an overall optimal solution.  Thus the algorithm finds an optimal solution for the set of t+1 items.
                   Thus the algorithm finds an optimal solution for all sets of items.


The Greedy Algorithm Paradigm

Greedy algorithms are used to solve optimization problems that involve choosing some elements of a set, subject to some constraints.  They pretty much always have the same basic design:

1.  Sort the elements according to some criterion.
2.  Repeat
        Select the next element in the sorted list, if it does not violate the constraints
     Until no more elements can be selected.

(Grok check - make sure you see how the Fractional Knapsack algorithm fits this paradigm.  What is the constraint on the selections?)
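
If it helps to see the paradigm as code, here is a rough Python rendering of the skeleton; the names key and violates are illustrative placeholders for whatever sorting criterion and constraint test a particular problem uses:

def greedy_select(elements, key, violates):
    # elements: the candidate items
    # key: the sorting criterion
    # violates(chosen, x): True if adding x to the chosen set breaks the constraints
    chosen = []
    for x in sorted(elements, key=key):
        if not violates(chosen, x):
            chosen.append(x)
    return chosen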


Minimum Spanning Trees
   
Finding minimum spanning trees is one of the most important applications of the Greedy paradigm.  Minimum spanning trees are important in communications networks.

Definitions:  

A spanning tree of a connected graph is a set of edges that connects all the vertices without containing any cycles.

If the edges of the graph have numeric weights attached to them, the weight of a spanning tree is the sum of the weights of the edges it contains.  A minimum spanning tree is a spanning tree with the smallest possible total weight.

The basic idea behind the MST algorithms we will discuss is to choose edges from the edge set of the graph so that we are always confident that the set of edges chosen so far is contained in some MST of the graph. We can formalize this idea:  Let A be a set of edges.  A is safe if there is some MST T that contains A.  An edge e that is not in A is safe with respect to A if A+e is safe.

A cut of a graph is a partition of the vertex set into two non-empty parts.  We also use the word cut to refer to the set of edges that go between the two parts (this is a very natural sense of the word - the cut goes through these edges).  A cut C respects an edge set A if none of the edges in A go between the two parts of C.

Now, suppose we have a safe set A, and a cut C that respects A.  Any MST T that contains A (and since A is safe, we know there is such a T) must contain at least one edge that crosses C - if not, the two parts of C would not be connected by T, so T would not be a spanning tree.  The goal of the MST algorithms we will look at is to choose an edge e that crosses C, so that A+e is safe.

A completely generic MST algorithm can be stated as follows:

Let A be an empty set of edges
Repeat
    add a safe edge to A
Until |A| = n-1                    // all spanning trees of a graph with n vertices must have n-1 edges


Here is where the Greedy part of the algorithm comes in.  When designing Greedy algorithms, we first ask "If I choose the smallest item that is available (or largest, if we are maximizing), can I prove that this decision is ok?"

It turns out that when finding a MST, we can simply choose the lightest edge that crosses any cut that respects A.  This is formalized in the following theorem:

Theorem:  Let G be a graph, let A be a safe set of edges of G, let C be a cut that respects A, and let e be the lightest edge that crosses C.  Then A+e is safe.

Proof:  This theorem is proved very nicely in the text.  Here I will just sketch the proof.

Suppose A+e is not safe, ie suppose that no MST contains A+e.  Since A is safe, there does exist a MST containing A - let T be such a MST.  Consider T+e.  Since T is a MST, adding e to it must create a cycle.  This cycle must contain at least one edge of T that crosses C (since e itself crosses C) and is not in A (since C respects A).  Let f be such an edge.  Consider T+e-f.  This must be a spanning tree, and since the weight of e is <= the weight of f (remember, e is the lightest edge that crosses C), the weight of T+e-f is <= the weight of T.  Since T is a MST, T+e-f is also a MST ... but T+e-f contains A+e, which contradicts our assumption.  Therefore A+e is safe.

Kruskal's MST Algorithm

Kruskal's algorithm is very straightforward:

1.    Sort the edges of G according to increasing weight.
2.    Let A be an empty set of edges
3.    Repeat
            let the next edge in the sorted list be e.
            if the ends of e are not already connected in A, A = A+e
       Until |A| = n-1

To prove that Kruskal's algorithm is correct, all we need to do is show that at each stage, the edge we choose is safe.  But this is easy.  Since the ends of e are not connected in A, there is a cut C that has the two ends of e on opposite sides of the partition.  Furthermore, e must be the lightest edge crossing C because we are considering the edges in increasing-weight-order.  If there were a lighter edge than e, we would have considered that edge before considering e.  Thus we can apply our previous theorem and conclude immediately that A+e is safe.

Implementing Kruskal's algorithm is not difficult.  The only non-trivial part is determining if the ends of e are already connected in A.  We will look at two ways of doing this.

Method 1:  We can use an n*n array R to keep track of which vertices are connected in A.  
R[i,j] = 0 if vertex i and vertex j are not connected in A, and R[i,j] = 1 if vertex i and vertex j are connected in A.
Initially the array is all 0's, but as vertices get connected by the edges the algorithm chooses, the array fills up with 1's.
To keep track of which vertices are in each connected group, we can use a linked list for each group.  Initially each vertex is in a separate group so we have n separate lists, each containing a single vertex.  We can join two groups by appending the list for one to the end of the list for the other.

Testing to see if the ends of e are already connected is now trivial - we simply refer to the appropriate element of R.  To update R when we choose an edge e, we walk through the lists representing the groups containing the two ends of e, and update all the relevant elements of R from 0 to 1.  Then we append one of the lists to the end of the other.  (There are some minor pointer operations that need to be settled but they do not affect the analysis of the algorithm.)  Each element of R gets updated exactly once, so the total number of updates to R is O(n^2).
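
Here is a rough sketch of the bookkeeping for Method 1 in Python.  Ordinary Python lists stand in for the linked lists, and updating every member's group reference is a simplification of the list-appending described above; the function names are just illustrative:

def init_connectivity(n):
    # R[i][j] = 1 if vertices i and j are connected in A, 0 otherwise.
    R = [[0] * n for _ in range(n)]
    # One list per group; initially every vertex is in its own group.
    group = [[v] for v in range(n)]
    return R, group

def connect(R, group, u, v):
    # Called when edge (u,v) is chosen: mark every vertex in u's group
    # as connected to every vertex in v's group, then merge the two lists.
    for a in group[u]:
        for b in group[v]:
            R[a][b] = R[b][a] = 1
    merged = group[u] + group[v]       # append one list to the other
    for w in merged:
        group[w] = merged              # every member now shares the merged group

The connectivity test inside Kruskal's loop is then just a lookup: the ends of e are already connected exactly when R[u][v] == 1.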

Let G have n vertices and m edges.  We can characterize the complexity of this implementation of Kruskal's algorithm as
    O(m*log m             // the sort
        + m                    // test each edge, if necessary
        + n^2                 // update R
        )

Observing that m < (n^2)/2, and therefore log m < log (n^2) = 2 log n, and that the "+ m" term is absorbed by the m*log m term, we can write the complexity as O(m*log n + n^2)

Now if the graph is dense and has O(n^2) edges, the first term dominates.  But if the graph is sparse and has only O(n) edges, the second term dominates.  If we are a bit crafty about how we do the implementation, we can avoid this n^2 term.

Suppose we represent each connected group in A by a set-tree: each vertex has a pointer to its parent, and some member vertex of the group sits at the root.  Then to see if two vertices are connected in A, we simply trace upwards from each, in its own set-tree.  If we arrive at the same root, the vertices are connected in A.  If we arrive at different roots, the vertices are not connected in A.  This takes longer than simply looking up an element of an n*n array, but if the set-trees are compact it will only take O(log n) time.  We can make sure they are compact.  When we select an edge e to add to A, the two set-trees containing e's ends need to be combined.  We add a pointer from the root of the smaller set-tree to the root of the larger set-tree, making it the root of both trees.  A vertex's depth only increases when its set-tree is hung under a larger one, and when that happens the size of the group containing it at least doubles; a group can double at most log n times, so each set-tree is O(log n) in height.  This combining operation takes constant time each time we do it.
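
Here is a sketch of this implementation in Python, assuming the graph is given as the number of vertices n (numbered 0 to n-1) and a list of (weight, u, v) edges; it uses the union-by-size set-trees just described, with no further optimizations:

def kruskal_mst(n, edges):
    parent = list(range(n))   # each vertex starts as the root of its own set-tree
    size = [1] * n            # number of vertices in each set-tree (valid at roots)

    def find(v):
        # Trace parent pointers up to the root of v's set-tree.
        while parent[v] != v:
            v = parent[v]
        return v

    mst = []
    for weight, u, v in sorted(edges):        # consider edges in increasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                          # the ends of e are not yet connected in A
            if size[ru] < size[rv]:           # hang the smaller set-tree under the larger
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            mst.append((u, v, weight))
            if len(mst) == n - 1:             # |A| = n-1: we have a spanning tree
                break
    return mst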

We can characterize the complexity of this implementation of Kruskal's algorithm as
    O(m*log n            // the sort, as before
        + m* log n        // test each edge, if necessary
        + n-1                // each time an edge is selected, we combine two set trees
       )

This of course is equivalent to O(m*log n), since m >= n-1 whenever the graph is connected.

When the graph is sparse, this implementation is asymptotically faster than the first one.  When the graph is dense, they have the same complexity.  Thus this implementation is superior to the first one.


Prim's MST Algorithm

We talked about this algorithm VERY BRIEFLY in class.  In its basic form it looks like this

1.    Let A be an empty set of edges
2.    Choose an arbitrary starting vertex v.  Let Q = {v}
3.    Repeat
            Let e be the lightest edge that joins a vertex x in Q to a vertex y not in Q.
            A = A + e
            Q = Q + y
       Until |A| = n-1
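
Here is a sketch of Prim's algorithm in Python, assuming the graph is given as an adjacency list adj where adj[v] is a list of (weight, neighbour) pairs; a heap of candidate edges stands in for "the lightest edge that joins a vertex in Q to a vertex not in Q":

import heapq

def prim_mst(adj, start=0):
    in_q = {start}                       # the set Q of vertices reached so far
    # Heap of candidate edges leaving Q, as (weight, inside_vertex, outside_vertex).
    heap = [(w, start, y) for w, y in adj[start]]
    heapq.heapify(heap)
    mst = []
    while heap and len(in_q) < len(adj):
        w, x, y = heapq.heappop(heap)    # lightest candidate edge
        if y in in_q:                    # both ends already in Q; discard it
            continue
        mst.append((x, y, w))
        in_q.add(y)
        for w2, z in adj[y]:
            if z not in in_q:
                heapq.heappush(heap, (w2, y, z))
    return mst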


To see that Prim's algorithm is correct, we observe that on each iteration of the loop we can define a cut that has all the vertices in Q in one part and all the other vertices in the other part.  This cut respects A, and we are choosing the lightest edge that crosses it.  By our earlier theorem, all of our choices are safe and so we end up with a MST.

We did not do this in class, but it turns out that the complexity of Prim's algorithm is the same as the second implementation of Kruskal's algorithm.  There may be reasons for preferring Prim's algorithm, since it gives us the freedom of choice for the starting vertex.  We could for example choose a particularly important vertex to start with, then run the algorithm for a limited number of iterations.  We would end up with a tree that did not span the whole graph, but spanned the neighbourhood of the vertex where we started.

Activity Selection

We concluded our discussion of Greedy algorithms with a short look at a simple problem (which was featured on last year's Greedy algorithms test).  Suppose we have a set of n activities, each with a fixed start and end time.  They can overlap.  We want to choose the largest possible set of non-overlapping activities.  We are trying to maximize the number of activities we choose, not the total amount of time we spend at them.  Thus we would prefer to attend two short activities over one long one.

This problem illustrates the challenge of finding a good Greedy algorithm: deciding what criterion to use to sort the set of objects we are choosing from.  Here, we could sort the activities according to their length.  This does not lead to optimal solutions - if you were not in class on November 2, make sure you can find an example in which choosing the shortest activity first is a mistake (ie it cannot lead to an optimal solution).

It turns out that if we sort the activities according to their finish time, we get a simple algorithm that is guaranteed to find an optimal solution.  The algorithm looks like this

1.    Sort the activities according to their finish time
2.    Let A be an empty set of activities
3.    Repeat
            let x be the next activity in the sorted list.  If x does not overlap with any activity in A, A = A+x
      Until there are no more activities to consider
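
A minimal sketch of this in Python, assuming each activity is a (start, finish) pair and that an activity may begin at exactly the moment the previous one ends:

def select_activities(activities):
    chosen = []
    last_finish = float("-inf")
    # Consider activities in order of increasing finish time.
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:         # does not overlap anything already chosen
            chosen.append((start, finish))
            last_finish = finish
    return chosen

# Example:
# select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (6, 10), (8, 11), (12, 16)])
# picks (1,4), (5,7), (8,11), (12,16)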

The proof of correctness is virtually identical to the proof of correctness for the Fractional Knapsack problem.  Consider an optimal solution O that does not contain the algorithm's first choice.  Then the activity in O that has the earliest end time could be replaced by the algorithm's first choice, without creating any overlaps in the chosen set (the algorithm's first choice has the earliest finish time of all activities, so it ends no later than the activity it replaces).  This set has the same size as O, so it is also an optimal solution.  Thus there does exist an optimal solution that contains the algorithm's first choice.  From that point on, the proofs are basically identical.