Approximation Algorithms
Approximating the Travelling Salesperson Problem - general case.
Let us suppose that we have a polynomial-time algorithm A that is a RHO-approximation algorithm for the general TSP, for some constant RHO >= 1. We will show that A lets us solve an NP-Complete problem in polynomial time. Thus A can only exist if P = NP.
Suppose we have an instance of the
Hamilton Cycle problem: Given a graph G with n vertices, does G
contain a cycle that passes through each vertex exactly once?
We will convert this HC instance into an instance of TSP, and we will do it in such a way that the weight of the tour our hypothetical algorithm A returns tells us, with certainty, the answer to the original HC question on the graph G.
To construct an instance of TSP, we need a complete graph (i.e. no missing edges), and every edge must have a weight. We start by giving all edges of G the weight 1. Note that if G does have a Ham Cycle, the total weight of that cycle is n (it consists of n edges, each with weight 1).
Now add in all the missing edges, and give each new edge the weight n*RHO + 1. Note that if G does not have a Ham Cycle, then any Ham Cycle of the completed graph must include at least one of the new edges. Such a cycle has n edges, at least one of weight n*RHO + 1 and the rest of weight at least 1, so its total weight is at least n*RHO + n, which is > n*RHO.
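As a concrete sketch of the construction (assuming G is given as a list of vertices and a collection of undirected edges; the function name and representation here are ours, purely for illustration):

    from itertools import combinations

    def build_tsp_instance(vertices, edges, rho):
        # Complete the graph: edges of G keep weight 1, and every
        # missing edge gets the heavy weight n*rho + 1.
        n = len(vertices)
        heavy = n * rho + 1
        original = {frozenset(e) for e in edges}
        weight = {}
        for u, v in combinations(vertices, 2):
            key = frozenset((u, v))
            weight[key] = 1 if key in original else heavy
        return weight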
Now apply our hypothetical algorithm A to this instance of TSP. There are only two possibilities:
1. G has a Ham Cycle. In this case the optimal solution to the TSP has weight n, so A must return a tour of weight at most n*RHO. A's tour therefore cannot use any of the new edges (as shown above, doing so would give a tour of weight at least n*RHO + n > n*RHO). Thus the only way A can get within a factor of RHO of the optimal solution is to actually find a Ham Cycle in G, and return a solution with total weight exactly n.
2. G does not have a Ham Cycle. In this case A's solution must contain at least one of the new edges, and so A's solution will have weight > n*RHO.
Thus, A will either
return a TSP solution with weight n, which tells us that G has a Ham
Cycle, or A will return a TSP solution with weight > n*RHO, which
tells us that G does not have a Ham Cycle.
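Putting the pieces together, the whole reduction might look like the following sketch (A stands for the hypothetical RHO-approximation algorithm, assumed here to return the total weight of the tour it finds; build_tsp_instance is the construction sketched earlier):

    def has_ham_cycle(vertices, edges, rho, A):
        # Build the TSP instance and run the hypothetical
        # approximation algorithm A on it.
        n = len(vertices)
        weight = build_tsp_instance(vertices, edges, rho)
        tour_weight = A(vertices, weight)
        # If G has a Ham Cycle, A's tour weighs exactly n <= n*rho;
        # otherwise every tour weighs more than n*rho.
        return tour_weight <= n * rho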
Thus A, together with our polynomial-time construction, solves the Ham Cycle problem on G in polynomial time. Since Ham Cycle is NP-Complete, this implies that P = NP.
The
conclusion is that if A exists, then P=NP. Since most researchers
believe that P != NP, this is taken as strong evidence that algorithm A
cannot exist.
Randomized Approximation Algorithms
Our final topic (loud cheers from long-suffering students!) looks at approximation algorithms that cannot be guaranteed to get within a factor of RHO of the optimal solution, but that make some random choices, with the result that the expected value of the solution is within a factor of RHO of the optimal solution.
As an example, consider the MAX-3-CNF-SAT problem: it is an optimization version of the satisfiability problem. More formally, we can state the problem as follows: given a boolean expression over n variables with m clauses in conjunctive normal form (CNF), in which each clause contains exactly 3 literals, what is the maximum number of clauses that can be simultaneously satisfied? If the expression is satisfiable, the answer to MAX-3-CNF-SAT is m; if the expression is not satisfiable, we are trying to determine how close to satisfiable it is. (As motivation, we might imagine that each clause represents a specification requirement on a piece of software. We are asking how many of these requirements can be simultaneously satisfied.)
We will put some simple restrictions on the boolean expression - these do not affect the problem, but they simplify our analysis. We will assume
- each clause contains distinct literals - for example, a clause like (x3 or x3 or x4) is not allowed
- no clause contains a literal and its negation - for example, a clause like (x3 or not x3 or x4) is not allowed
Together, these restrictions guarantee that the three literals in each clause involve three distinct variables.
Our algorithm is almost laughably simple. We assign the value True to each variable independently with probability 1/2 (and therefore False with probability 1/2). This is obviously a polynomial-time algorithm - we will show that the expected number of clauses satisfied under this truth assignment is within a constant factor of the maximum number of clauses that can be satisfied.
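A minimal sketch of the algorithm (the clause representation is our own assumption: each clause is a tuple of three integers, where a literal k > 0 means variable k and k < 0 means its negation):

    import random

    def random_assignment_max3sat(num_vars, clauses):
        # Set each variable True/False independently with probability 1/2.
        assignment = {v: random.random() < 0.5 for v in range(1, num_vars + 1)}
        # A clause is satisfied if at least one of its literals is true.
        satisfied = sum(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        return assignment, satisfied

    # e.g. (x1 or not x2 or x3) and (not x1 or x2 or x4)
    print(random_assignment_max3sat(4, [(1, -2, 3), (-1, 2, 4)]))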
For each clause, the probability that the clause is satisfied is 7/8: by our restrictions, the clause's three literals involve three distinct variables, so they are set independently, and the only way the clause can be unsatisfied is if all three literals are false, which happens with probability (1/2)^3 = 1/8.
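This can be checked by brute force over the 2^3 = 8 joint truth settings of a clause's three independent literals:

    from itertools import product

    # A clause (l1 or l2 or l3) fails only on the all-False setting.
    settings = list(product([False, True], repeat=3))
    satisfied = sum(1 for s in settings if any(s))
    print(satisfied, "of", len(settings))   # prints: 7 of 8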
Let Yi be an indicator variable such that
Yi = 1 if clause i is satisfied
Yi = 0 if clause i is not satisfied
The expected value of Yi is E(Yi) = 1*(7/8) + 0*(1/8) = 7/8.
Let Y = Y1 + Y2 + ... + Ym.
Y is the number of clauses that are satisfied.
By the linearity of expectation (which holds even when the Yi are not independent - and here they are not, since clauses may share variables), E(Y) = E(Y1) + E(Y2) + ... + E(Ym) = (7/8)*m.
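A quick Monte Carlo check of this expectation, reusing random_assignment_max3sat from the sketch above (the instance is made up for illustration):

    import random

    random.seed(1)
    clauses = [(1, -2, 3), (-1, 2, 4), (2, 3, -4), (1, 2, 4)]   # m = 4
    trials = 100_000
    total = sum(random_assignment_max3sat(4, clauses)[1] for _ in range(trials))
    print(total / trials)   # should be close to (7/8)*4 = 3.5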
Now we need an upper bound on the number of clauses satisfied in the optimal solution. We could try for something clever, but there is no need - we can use a trivial upper bound: since the expression has m clauses, the optimal solution can satisfy at most m clauses.
Let C* be the number of clauses satisfied in the optimal solution, and let C be the number of clauses satisfied by our algorithm. Then C* <= m, and as we have seen above, E(C) = (7/8)*m. Thus the ratio of the optimal value to the expected value of our solution is
C*/E(C) <= m / ((7/8)*m) = 8/7
and the algorithm is a randomized 8/7-approximation algorithm for MAX-3-CNF-SAT: in expectation, it satisfies at least 7/8 of the maximum possible number of clauses.