20091202


Approximation Algorithms

Approximating the Travelling Salesperson Problem - general case.

Let us suppose that we have a polynomial-time algorithm A that is a RHO-approximation for the general TSP, for some constant RHO >= 1.  We will show that A lets us solve an NP-Complete problem in polynomial time.  Thus A can only exist if P = NP.

Suppose we have an instance of the Hamilton Cycle problem:  Given a graph G with n vertices, does G contain a cycle that passes through each vertex exactly once?

We will convert this HC instance into an instance of TSP - and we will do it in such a way that the only way our hypothetical algorithm A can get within a factor of RHO of the optimal solution is by correctly answering the original HC question on the graph G.

To construct an instance of TSP, we need a complete graph (i.e. no missing edges) and the edges must all have weights.  We start by giving all edges of G the weight 1.  Note that if G does have a Ham Cycle, the total weight of the cycle is n (it consists of n edges, each with weight 1).

Now add in all the missing edges, and give each new edge the weight n*RHO + 1.  Note that if G does not have a Ham Cycle, then any Ham Cycle of the graph after the new edges have been added must include at least one of the new edges, so its weight is at least (n-1)*1 + (n*RHO + 1) = n*RHO + n > n*RHO.
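The construction can be sketched in a few lines of Python (the function name hc_to_tsp and the edge-list representation are just illustrative choices):

```python
import itertools

def hc_to_tsp(n, edges, rho):
    """Build the TSP instance: a complete graph on n vertices where
    edges of G get weight 1 and the missing edges get weight n*rho + 1."""
    g_edges = {frozenset(e) for e in edges}
    weight = {}
    for u, v in itertools.combinations(range(n), 2):
        weight[(u, v)] = 1 if frozenset((u, v)) in g_edges else n * rho + 1
    return weight

# Example: G is a 4-cycle (so it has a Ham Cycle), with rho = 2.
w = hc_to_tsp(4, [(0, 1), (1, 2), (2, 3), (3, 0)], 2)
print(w[(0, 1)], w[(0, 2)])  # -> 1 9  (original edge vs. new edge of weight 4*2+1)
```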

Now apply our hypothetical algorithm A to this instance of TSP.  There are only two possibilities:

1.  G has a Ham Cycle.  In this case the optimal solution to the TSP has weight n, and since A must get within a factor of RHO of the optimal solution, A cannot choose any of the new edges (doing so would result in a solution that weighs more than n*RHO).  Thus the only way A can get within a factor of RHO of the optimal solution is to actually find the Ham Cycle in G, and return a solution with total weight = n.

2.  G does not have a Ham Cycle.  In this case A's solution must contain at least one of the new edges, and so A's solution will have weight > n*RHO.

Thus, A will either return a TSP solution with weight n, which tells us that G has a Ham Cycle, or A will return a TSP solution with weight > n*RHO, which tells us that G does not have a Ham Cycle.
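Treating A as a black box, the decision rule can be sketched as follows (decide_ham_cycle is a hypothetical helper name - it simply compares the tour weight A returns against the threshold n*RHO):

```python
def decide_ham_cycle(n, rho, tour_weight):
    """Interpret the weight of the tour that the hypothetical algorithm A
    returns on the constructed TSP instance.  The weight is either exactly
    n (a Ham Cycle of G) or strictly greater than n*rho (the tour uses at
    least one new edge), so the threshold n*rho separates the two cases."""
    return tour_weight <= n * rho

print(decide_ham_cycle(4, 2, 4))   # weight n = 4: G has a Ham Cycle -> True
print(decide_ham_cycle(4, 2, 12))  # a new edge was used -> False
```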

Thus, A (together with our polynomial-time reduction) solves the Ham Cycle problem on G in polynomial time.  Since Ham Cycle is NP-Complete, this implies that P = NP.

The conclusion is that if A exists, then P=NP.  Since most researchers believe that P != NP, this is taken as strong evidence that algorithm A cannot exist.



Randomized Approximation Algorithms

Our final topic (loud cheers from long-suffering students!) looks at approximation algorithms which cannot be guaranteed to get within a factor of RHO of the optimal solution, but which contain some random choices, the result of which is a solution that we expect to be within a factor of RHO of the optimal solution.

As an example, consider the MAX-3-CNF-SAT problem: it is an optimization version of the satisfiability problem.  More formally, we can state the problem as follows: given a boolean expression in CNF over n variables with m clauses, in which each clause contains exactly 3 literals, what is the maximum number of clauses that can be simultaneously true?  If the expression is satisfiable, the answer to MAX-3-CNF-SAT is m; if the expression is not satisfiable, we are trying to determine how close to satisfiable it is.  (As motivation, we might imagine that each clause represents a specification requirement on a piece of software.  We are asking how many of these requirements can be simultaneously satisfied.)

We will put some simple restrictions on the boolean expression - these do not affect the problem, but they simplify our analysis.  We will assume
    - each clause contains distinct literals - for example, a clause like (x3 or x3 or x4) is not allowed
    - no clause contains a literal and its negation - for example, a clause like (x3 or not x3 or x4) is not allowed

Our algorithm is almost laughably simple.  We assign the value True to each variable independently with probability 1/2 (and therefore each variable is assigned False with probability 1/2 as well).  This is obviously a polynomial-time algorithm - we will show that the expected number of clauses that are satisfied under this truth assignment is within a constant factor of the maximum number of clauses that can be satisfied.
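A minimal sketch of the algorithm in Python, using a common convention for representing clauses (literal +i means variable i, -i means its negation; the function names are just illustrative):

```python
import random

def random_assignment(n):
    """Independently assign True/False to each of the n variables,
    each value with probability 1/2."""
    return [random.random() < 0.5 for _ in range(n)]

def satisfied_count(clauses, assignment):
    """Count satisfied clauses.  A clause is a list of literals:
    +i means variable i, -i means its negation (variables numbered from 1)."""
    def literal_true(lit):
        value = assignment[abs(lit) - 1]
        return value if lit > 0 else not value
    return sum(any(literal_true(l) for l in clause) for clause in clauses)

# (x1 or x2 or x3) and (not x1 or x2 or not x4)
clauses = [[1, 2, 3], [-1, 2, -4]]
# With every variable set to True, both clauses are satisfied:
print(satisfied_count(clauses, [True, True, True, True]))  # -> 2
```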

For each clause, the probability that the clause is satisfied is 7/8: the only way the clause can be unsatisfied is if all three of its literals are false, and since the three literals involve distinct variables, this happens with probability (1/2)^3 = 1/8.
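We can check the 7/8 figure by brute force: a clause over three distinct variables is false for exactly one of the 8 possible assignments to those variables.  For example, for the clause (x1 or not x2 or x3):

```python
import itertools

# Clause (x1 or not x2 or x3): false only for x1=False, x2=True, x3=False.
satisfied = sum(
    (x1 or (not x2) or x3)
    for x1, x2, x3 in itertools.product([False, True], repeat=3)
)
print(satisfied, "of 8 assignments satisfy the clause")  # -> 7 of 8 ...
```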

Let Yi be a new variable such that
        Yi = 1 if clause i is satisfied
        Yi = 0 if clause i is not satisfied

Yi is an indicator variable, so its expected value is E(Yi) = P(clause i is satisfied) = 7/8

Let Y = Y1 + Y2 + ... +Ym        

Y is the number of clauses that are satisfied.  

By the linearity of expectation, we see that E(Y) = E(Y1) + E(Y2) + ... + E(Ym) = m*7/8
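A quick Monte Carlo check of this expectation (a sketch; the clauses below are illustrative examples that obey the restrictions stated earlier):

```python
import random

def expected_satisfied(clauses, trials=100_000):
    """Estimate E(Y) by drawing many random assignments and averaging
    the number of satisfied clauses."""
    n = max(abs(lit) for clause in clauses for lit in clause)
    total = 0
    for _ in range(trials):
        a = [random.random() < 0.5 for _ in range(n)]
        total += sum(
            any((a[abs(l) - 1] if l > 0 else not a[abs(l) - 1]) for l in c)
            for c in clauses
        )
    return total / trials

# m = 3 clauses over 4 variables; linearity predicts E(Y) = 3 * 7/8 = 2.625
clauses = [[1, 2, 3], [-1, 2, -4], [1, -3, 4]]
print(expected_satisfied(clauses))  # close to 2.625
```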

Now we need an upper bound on the number of clauses satisfied in the optimal solution.  We could try for something clever, but there is no need - we can use a trivial upper bound: since the expression has m clauses, the optimal solution must satisfy <= m clauses.

Let C* be the number of clauses satisfied in the optimal solution, and let C be the number of clauses satisfied by our algorithm.  We have C* <= m, and as we have seen above, the expected value of C is E(C) = m*7/8.  Thus

        C*/E(C) <= m/(m*7/8) = 8/7

so in expectation the algorithm gets within a factor of 8/7 of the optimal solution, and the algorithm is a randomized 8/7-approximation algorithm for MAX-3-CNF-SAT.