20091102
Greedy Algorithms
When
we were studying Dynamic Programming, we observed that we could often
characterize the process of finding an optimal solution as a sequence
of decisions (such as "Should I put item 1 in the knapsack?").
Because we don't know what the first decision should be, we try
all the possibilities. As you know, this requires the solution of
multiple subproblems, which contributes to the complexity of the
algorithm.
For some problems, it is possible
to determine what the first decision should be. This makes the
algorithms much simpler. The choice is usually based on choosing
an item from a set that is the "largest" or "smallest" or "most
valuable" or "least expensive" in some sense, and for this reason the
algorithms are called Greedy Algorithms.
For example, consider a version of the Knapsack Problem called The Fractional Knapsack Problem.
Here, we are able to carry fractions of the items. More
formally, we have a knapsack of capacity K, and a set of n items, each
with a mass and a value. Our goal is to fill the knapsack with
the most valuable total load. We can use any fraction of each
item: if the value of a particular item is v and we carry fraction x of
the item, then the value of that fraction is x*v.
It seems
intuitive that we should carry as much as we can of the items that have
the highest value/mass ratio, and in fact this idea does lead to the
optimal solution. We can state the algorithm as follows:
Greedy Fractional Knapsack:
Sort the items in order of decreasing value/mass ratio
Repeat
Take as much as possible of the next item in the sorted list
Until the knapsack is full, or we run out of items
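A Python sketch of this algorithm (the function name and the (mass, value) pair representation are just illustrative choices, not part of the algorithm itself):

def fractional_knapsack(items, capacity):
    """items: a list of (mass, value) pairs; capacity: the knapsack capacity K.
       Returns the largest total value we can carry."""
    # Sort by decreasing value/mass ratio.
    items = sorted(items, key=lambda mv: mv[1] / mv[0], reverse=True)
    total_value = 0.0
    remaining = capacity
    for mass, value in items:
        if remaining <= 0:
            break                              # the knapsack is full
        take = min(mass, remaining)            # as much as possible of this item
        total_value += value * (take / mass)   # fraction take/mass of the item's value
        remaining -= take
    return total_value

For example, fractional_knapsack([(10, 60), (20, 100), (30, 120)], 50) returns 240.0: all of the first two items, plus 20/30 of the third.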
As with any algorithm, we must address two questions:
1. Does it always find the right answer?
2. What is its computational complexity?
Question
2 is the easier to answer: sorting the items takes O(n*log n)
time. The loop executes at most n times, and each iteration takes
constant time, so the loop is O(n). Thus the whole algorithm is
O(n*log n).
Question 1 is a little bit trickier, but the proof
is virtually the same for every Greedy algorithm. First, we
establish that the algorithm's first choice is "safe", ie that there is
some optimal solution that contains that choice. Second, we
establish that the safe first choice can be combined with an optimal
solution to the reduced problem that results from the first choice, and
the result will be an overall optimal solution. (This part is
virtually identical to the proof that a problem satisfies the Principle
of Optimality for dynamic programming.) We use induction to put
it all together and get a proof that the algorithm always finds an
optimal solution.
For the Fractional Knapsack Problem, the proof looks like this:
Step 1 (prove that the first choice is safe). Let x be the first
item chosen by the algorithm, and let m be the fraction of x that the
algorithm chooses. Let O be an optimal solution to the problem.
If O contains fraction m of item x, we are done. On the
other hand, suppose O contains less than m of x (do you see why it
cannot contain more than m of x?). Then O must contain some
amount of another item. If we take out some of that item and put in more of item x, we have another solution whose value must be >= the value of O (since x has the highest value/mass ratio, the mass we add is worth at least as much as the mass we remove).
We can continue this until we have added as much of x as
possible, without ever decreasing the value of the solution. This
ends up with a solution that contains m of x, and has total value >=
the value of O. But O is optimal, so no solution can have value
> the value of O. Thus the new solution has value = value of
O, which means it is an optimal solution. Thus there does exist
an optimal solution that has the same amount of x as the algorithm
chooses .... and thus the algorithm's first choice is safe.
Step 2: Consider an optimal solution O that contains the
algorithm's first choice. The rest of O (ie the amounts of the
other items) is a solution for a smaller set of items, and a knapsack
of a smaller size: K minus the amount of mass filled by the algorithm's
first choice. Suppose that the rest of O is not an optimal
solution for this reduced problem. Then we could substitute an
optimal solution for the reduced problem, and this would increase the
value of O ... but that is not possible since O is optimal. Thus
the rest of O must be an
optimal solution for the reduced problem. Thus when we combine
the algorithm's first choice with an optimal solution to the reduced
problem, we get an optimal solution to the original problem.
Step 3: Clearly the algorithm works correctly if there is only 1 item.
IH:
Assume the algorithm works on all sets of t items, where t is >= 1.
Suppose we have a set of t+1 items.
We have
shown that the algorithm's first choice is safe. This reduces the
problem to a smaller knapsack and a set of just t items. By our
inductive hypothesis (IH), we know the algorithm will find an optimal
solution to this reduced problem. We have shown that the
algorithm's first choice can be combined with an optimal solution to
the reduced problem to give an overall optimal solution. Thus the
algorithm finds an optimal solution for the set of t+1 items.
Thus the algorithm finds the optimal solution for all sets.
The Greedy Algorithm Paradigm
Greedy
algorithms are used to solve optimization problems that involve
choosing some elements of a set, subject to some constraints.
They pretty much always have the same basic design:
1. Sort the elements according to some criterion.
2. Repeat
Select the next element in the sorted list, if it does not violate the constraints
Until no more elements can be selected.
(Grok
check - make sure you see how the Fractional Knapsack algorithm fits
this paradigm. What is the constraint on the selections?)
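A rough Python skeleton of this paradigm (the key and feasible parameters are just placeholders for a particular problem's sorting criterion and constraints):

def greedy_select(elements, key, feasible):
    """Generic greedy skeleton: sort by some criterion, then take each
       element in turn if it does not violate the constraints."""
    chosen = []
    for x in sorted(elements, key=key):   # 1. sort by the criterion
        if feasible(chosen, x):           # 2. select x if the constraints allow it
            chosen.append(x)
    return chosen

In the Fractional Knapsack algorithm the sorting criterion is the value/mass ratio, and the constraint on the selections is that the total mass carried cannot exceed K (with the twist that we may take a fraction of the element that would otherwise violate the constraint).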
Minimum Spanning Trees
Finding
minimum spanning trees is one of the most important applications of the
Greedy paradigm. Minimum spanning trees are important in
communications networks.
Definitions:
A spanning tree of a graph is a set of edges that join up all the vertices, without containing any cycles.
If the edges of the graph have numeric weights attached to them, the weight of a spanning tree is the sum of the weights of the edges it contains. A minimum spanning tree is a spanning tree with the smallest possible total weight.
The
basic idea behind the MST algorithms we will discuss is to choose edges
from the edge set of the graph so that we are always confident that the
set of edges chosen so far are contained in some MST of the
graph. We can formalize this idea: Let A be a set of edges.
A is safe if there is some MST T that contains A. An edge e that is not in A is safe with respect to A if A+e is safe.
A cut of a graph is a partition of the vertex set into two non-empty parts. We also use the word cut to refer to the set of edges that go between the two parts (this is a very natural sense of the word - the cut goes through these edges). A cut C respects an edge set A if none of the edges in A go between the two parts of C.
Now,
suppose we have a safe set A, and a cut C that respects A. Any
MST T that contains A (and since A is safe, we know there is such a T)
must contain at least one edge that crosses C - if not, the two parts
of C would not be connected by T, so T would not be a spanning tree.
The goal of the MST algorithms we will look at is to choose an
edge e that crosses C, so that A+e is safe.
A completely generic MST algorithm can be stated as follows:
Let A be an empty set of edges
Repeat
add a safe edge to A
Until |A| = n-1    // all spanning trees of a graph with n vertices must have n-1 edges
Here is where the Greedy
part of the algorithm comes in. When designing Greedy algorithms,
we first ask "If I choose the smallest item that is available (or
largest, if we are maximizing), can I prove that this decision is ok?"
It
turns out that when finding a MST, we can simply choose the lightest
edge that crosses any cut that respects A. This is formalized in
the following theorem:
Theorem:
Let G be a graph, let A be a safe set of edges of G, let C be a
cut that respects A, and let e be the lightest edge that crosses C.
Then A+e is safe.
Proof: This theorem is proved very nicely in the text. Here I will just sketch the proof.
Suppose A+e is not safe, ie suppose that no MST contains A+e. Since A is safe, there does
exist a MST containing A - let T be such a MST. Consider T+e.
Since T is a MST, adding e to it must create a cycle. This
cycle must contain at least one edge of T that crosses C (since e
itself crosses C) and is not in A (since C respects A). Let f be
such an edge. Consider T+e-f. This must be a spanning tree,
and since the weight of e is <= the weight of f (remember, e is the
lightest edge that crosses C) the weight of T+e-f <= the weight of
T. Since T is a MST, T+e-f is also a MST ... but T+e-f
contains A+e, which contradicts our assumption. Therefore
A+e is safe.
Kruskal's MST Algorithm
Kruskal's algorithm is very straightforward:
1. Sort the edges of G according to increasing weight.
2. Let A be an empty set of edges
3. Repeat
let the next edge in the sorted list be e.
if the ends of e are not already connected in A, A = A+e
Until |A| = n-1
To
prove that Kruskal's algorithm is correct, all we need to do is show
that at each stage, the edge we choose is safe. But this is easy.
Since the ends of e are not connected in A, there is a cut C that respects A and has the two ends of e on opposite sides: for example, put the vertices connected in A to one end of e on one side, and all the other vertices on the other side. Furthermore, e must be the lightest edge crossing C because we are considering the edges in increasing-weight order. Any edge crossing C joins vertices that are not connected in A, so if there were a lighter edge crossing C we would have considered it before e and added it to A, contradicting the fact that C respects A. Thus we can apply our previous theorem and
conclude immediately that A+e is safe.
Implementing Kruskal's
algorithm is not difficult. The only non-trivial part is
determining if the ends of e are already connected in A. We will
look at two ways of doing this.
Method 1: We can use an n*n array R to keep track of which vertices are connected in A.
R[i,j]
= 0 if vertex i and vertex j are not connected in A, and R[i,j] = 1 if vertex i and vertex j are connected in A.
Initially
the array is all 0's, but as vertices get connected by the edges the
algorithm chooses, the array fills up with 1's.
To
keep track of which vertices are in each connected group, we can use a
linked list for each group. Initially each vertex is in a
separate group so we have n separate lists, each containing a single
vertex. We can join two groups by appending the list for one to
the end of the list for the other.
Testing to see if the ends of
e are already connected is now trivial - we simply refer to the
appropriate element of R. To update R when we choose an edge e,
we walk through the lists representing the groups containing the two
ends of e, and update all the relevant elements of R from 0 to 1.
Then we append one of the lists to the end of the other.
(There are some minor pointer operations that need to be settled
but they do not affect the analysis of the algorithm.) Each
element of R gets updated exactly once, so the total number of updates
to R is O(n^2).
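A rough Python sketch of this bookkeeping (ordinary Python lists stand in for the linked lists, and the leader array used to find which list a vertex is in is one of the "minor pointer operations" the notes leave unspecified):

def make_groups(n):
    R = [[0] * n for _ in range(n)]    # R[i][j] = 1 iff i and j are connected in A
    groups = [[v] for v in range(n)]   # one single-vertex list per vertex initially
    leader = list(range(n))            # which group's list each vertex is in
    return R, groups, leader

def join_groups(R, groups, leader, u, v):
    """Called when edge (u, v) is chosen: update R for every newly connected
       pair of vertices, then append one group's list to the other's."""
    gu, gv = leader[u], leader[v]
    for a in groups[gu]:               # each pair of vertices is set to 1 exactly
        for b in groups[gv]:           # once, so all calls together cost O(n^2)
            R[a][b] = R[b][a] = 1
    for b in groups[gv]:
        leader[b] = gu
    groups[gu].extend(groups[gv])      # append one list to the end of the other
    groups[gv] = []

Testing whether the ends of an edge (u, v) are already connected is then just the lookup R[u][v] == 1.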
Let G have n vertices and m edges. We can characterize the complexity of this implementation of Kruskal's algorithm as
O( m*log m     // the sort
   + m         // test each edge, if necessary
   + n^2 )     // update R
Observing that m < (n^2)/2, and therefore log m < log (n^2) = 2 log n, and that the "+ m" term is dominated by the m*log m term, we can write the complexity as
O(m*log n + n^2)
Now if the graph is dense and has O(n^2) edges,
the first term dominates. But if the graph is sparse and has only
O(n) edges, the second term dominates. If we are a bit crafty
about how we do the implementation, we can avoid this n^2 term.
Suppose
we represent each connected group in A by something sort of like a
binary tree, with a randomly chosen member vertex of the group at the
root. Then to see if two vertices are connected in A, we simply
trace upwards from each, in its own set-tree. If we arrive at the
same root, the vertices are connected in A. If we arrive at
different roots, the vertices are not connected in A. This takes
longer than simply looking up an element of an n*n array, but if the
set-trees are compact it will only take O(log n) time. We can
make sure they are compact. When we select an edge e to add to A,
the two set-trees containing e's ends need to be combined. We add
a pointer from the root of the smaller set-tree to the root of the
larger set-tree, making it the root of both trees. Because we always hang the smaller set-tree under the root of the larger one, a vertex only gets deeper when its set-tree is merged into one at least as large, so each time a vertex's depth increases, the size of its set-tree at least doubles. Thus each set-tree is O(log n) in height. This combining operation takes constant time
each time we do it.
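Putting the set-trees together with the pseudocode above, a Python sketch of this implementation might look like the following (the class and function names and the (weight, u, v) edge representation are illustrative choices; the common path-compression refinement is omitted since the notes do not rely on it):

class SetTrees:
    """Disjoint sets represented as trees of parent pointers, always hanging
       the smaller set-tree under the root of the larger one."""
    def __init__(self, n):
        self.parent = list(range(n))   # every vertex starts as its own root
        self.size = [1] * n

    def root(self, v):
        while self.parent[v] != v:     # trace upward to the root: O(log n)
            v = self.parent[v]
        return v

    def connected(self, u, v):
        return self.root(u) == self.root(v)

    def combine(self, u, v):
        ru, rv = self.root(u), self.root(v)
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru            # make ru the root of the larger tree
        self.parent[rv] = ru           # the combine itself is constant time
        self.size[ru] += self.size[rv]

def kruskal_mst(n, edges):
    """n: number of vertices (numbered 0..n-1); edges: list of (weight, u, v)
       triples. Returns the chosen edge set A."""
    A = []
    sets = SetTrees(n)
    for w, u, v in sorted(edges):      # increasing weight: O(m log m) = O(m log n)
        if not sets.connected(u, v):   # are the ends of e already connected in A?
            A.append((u, v, w))
            sets.combine(u, v)
        if len(A) == n - 1:
            break
    return A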
We can characterize the complexity of this implementation of Kruskal's algorithm as
O( m*log n     // the sort, as before
   + m*log n   // test each edge, if necessary
   + n-1 )     // each time an edge is selected, we combine two set-trees
This of course is equivalent to O(m*log n).
When
the graph is sparse, this implementation is asymptotically faster than
the first one. When the graph is dense, they have the same
complexity. Thus this implementation is superior to the first one.
Prim's MST Algorithm
We talked about this algorithm VERY BRIEFLY in class. In its basic form it looks like this
1. Let A be an empty set of edges
2. Choose an arbitrary starting vertex v. Let Q = {v}
3. Repeat
Let e be the lightest edge that
joins a vertex x in Q to a vertex y not in Q.
A = A + e
Q = Q + y
Until |A| = n-1
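A Python sketch of Prim's algorithm, using a binary heap to find the lightest edge leaving Q at each step (the adjacency-list input format graph[x] = [(y, weight), ...] is an assumption about the representation, not part of the algorithm):

import heapq

def prim_mst(graph, start):
    """graph: dict mapping each vertex to a list of (neighbour, weight) pairs.
       Returns the edge set A of a minimum spanning tree, grown from start."""
    A = []                                   # chosen edges
    Q = {start}                              # vertices spanned so far
    heap = [(w, start, y) for y, w in graph[start]]
    heapq.heapify(heap)
    while heap and len(Q) < len(graph):
        w, x, y = heapq.heappop(heap)        # lightest edge seen so far that leaves Q
        if y in Q:
            continue                         # both ends already in Q; not a crossing edge
        A.append((x, y, w))
        Q.add(y)
        for z, wz in graph[y]:
            if z not in Q:
                heapq.heappush(heap, (wz, y, z))
    return A

With a binary heap each edge enters and leaves the heap at most a constant number of times, which gives the O(m*log n) complexity mentioned below.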
To
see that Prim's algorithm is correct, we observe that on each iteration of the loop we can define a cut that has all the vertices in Q in one part and all the other vertices in the other part. This
cut respects A, and we are choosing the lightest edge that crosses it.
By our earlier theorem, all of our choices are safe and so we end
up with a MST.
We did not do this in class, but it turns out
that the complexity of Prim's algorithm is the same as the second
implementation of Kruskal's algorithm. There may be reasons for
preferring Prim's algorithm, since it gives us the freedom of choice
for the starting vertex. We could for example choose a
particularly important vertex to start with, then run the algorithm
for a limited number of iterations. We would end up with a tree
that did not span the whole graph, but spanned the neighbourhood of the
vertex where we started.
Activity Selection
We
concluded our discussion of Greedy algorithms with a short look at a
simple problem (which was featured on last year's Greedy algorithms
test). Suppose we have a set of n activities, each with a fixed
start and end time. They can overlap. We want to choose the
largest possible set of non-overlapping activities. We are trying
to maximize the number of activities we choose, not the total amount of time we spend at them. Thus we would prefer to attend two short activities over one long one.
This
problem illustrates the challenge of finding a good Greedy algorithm:
deciding what criterion to use to sort the set of objects we are choosing
from. Here, we could sort the activities according to their
length. This does not lead to optimal solutions - if you were not
in class on November 2, make sure you can find an example in which
choosing the shortest activity first is a mistake (ie. it cannot lead
to an optimal solution).
It turns out that if we sort the
activities according to their finish time, we get a simple algorithm
that is guaranteed to find an optimal solution. The algorithm
looks like this
1. Sort the activities according to their finish time
2. Let A be an empty set of activities
3. Repeat
let x be the next activity in the
sorted list. If x does not overlap with any activity in A, A = A+x
Until there are no more activities to consider
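A Python sketch of this algorithm (activities are represented here as (start, finish) pairs, which is just one convenient choice; because we consider activities in increasing finish-time order, x overlaps nothing already in A exactly when x starts at or after the latest finish time in A):

def select_activities(activities):
    """activities: a list of (start, finish) pairs.
       Returns a largest set of pairwise non-overlapping activities."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):  # sort by finish time
        if start >= last_finish:     # x does not overlap any activity chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen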
The
proof of correctness is virtually identical to the proof of correctness
for the Fractional Knapsack problem. Consider an optimal solution O that does not contain the algorithm's first choice. Then the activity in O that has the earliest end time could be replaced by the algorithm's first choice, without creating any overlaps in the chosen set: the algorithm's first choice finishes no later than the activity it replaces, and every other activity in O starts at or after that activity's end time. This set has the same size as O, so it is also an optimal
solution. Thus there does exist an optimal solution that contains
the algorithm's first choice. From that point on, the proofs are
basically identical.