20091118


Approximation Algorithms

The Vertex Cover Problem

Given a graph G on n vertices and m edges, we want to find  the smallest set A of vertices such that each edge of G has at least one end in the set A.  (This has applications in network monitoring and maintenance.)

This problem is NP-Complete (to be more precise, it is the related decision problem that is NP-complete but the decision problem and the optimization problem are so closely related that we can treat them as the same problem) so we should not expect to find a polynomial-time algorithm that will correctly solve all instances.  The question to be addressed is whether we can find a polynomial-time algorithm which always gets reasonably close to the optimal answer.

We might consider approaching this problem with a greedy method, such as
    repeat
        choose a vertex with the largest number of incident edges
        add this vertext to A
        remove from G all the edges incident on the vertex just chosen
    until G has no remaining edges

Exercise 35.1-3 of the text asks you to determine if this algorithm always finds a good solution.


We will consider an algorithm which initially seems too simple-minded to be likely to give a good result:

    repeat
        arbitrarily select an edge of G
        add both ends of the selected edge to A
        remove from G all edges incident on either of the two vertices just added to A
    until G has no remaining edges

There are some obvious questions to ask about this algorithm

1.  Does it run in polynomial time?
        The repeat loop executes at most O(n^2) times, and each iteration of the loop requires no more work than a traversal of the list of the edges ... this clearly takes polynomial time.

2.  Does it find a vertex cover?
        The edges that are removed on each iteration are all covered by the selected vertices.  By repeating the loop until all edges are removed from G, we ensure that all edges of G are covered by the selected set of vertices.

3.  Does it find a good vertex cover?

          This is the interesting question, and it turns out to be pretty straightforward to answer.

Let's let C* be an optimal solution to this problem.  A is the set of vertices chosen by the algorithm.  Let B be the set of edges chosen during the execution of the algorithm.  (Note that B is not the solution returned by the algorithm.  That solution is A.  The set B is useful for our analysis of the algorithm.)

Since C* is a vertex cover, we know it covers the edges in B.  Furthermore, since none of the edges in B share endpoints, it is clear that no vertex in C* can cover more than one edge in B.  Thus for each edge in B, we can find a vertex in C* that covers that edge and no other edge in B.  This leads to the conclusion that |C*| >= |B|

The significance of this is that the size of B is a lower bound on the size of the optimal solution.  Note that because the algorithm chooses edges arbitrarily, we have no way of predicting the size of B - and it may be different if we run the algorithm twice on the same graph.  However, as we have seen, the size of B will never exceed the size of the optimal solution.

Now we will relate the size of B to the size of A.  This is very easy - for each edge in B, two vertices are added to A.  So |A| = 2|B|

Putting these two pieces together, we see that |A| <= 2|C*|

This is exactly where we want to be: we have shown that no matter how the algorithm chooses its edges, the size of A will never be more than twice the size of the optimal solution.