20090916


To formalize the concepts of "easy" and "difficult" problems,  we need a measure of difficulty.  The measure which was eventually accepted is the computational complexity of the best algorithm to solve the problem.  This is certainly sufficient to show that a problem is easy: the existence of a low-order polynomial-time algorithm that solves a problem shows that the problem is "easy".  But to show that a problem is "difficult", it seems that we would have to prove that there cannot exist an efficient algorithm that solves it.  This is sometimes possible, but the problems for which we have been able to prove that efficient algorithms cannot exist are few in number, and generally not very interesting.  

Of course this is not merely a problem of theoretical interest.  As practicing computer scientists, we will forever be faced with new problems to solve.  What we need is a practical method for determining if a new problem is one that we can solve efficiently, or whether it belongs to the set of problems that are too hard to solve.  The first steps towards developing the tools we need were taken around 1970 by Steve Cook and Leonid Levin, working independently.  However before we can appreciate their discoveries, we need to define some terms.

First, we will restrict the types of problems under consideration.  We will focus on the class of decision problems.

Definition:  A decision problem is one in which the answer is either Yes or No, and the answer is completely determined by the input information.

For example "Given a set of integers, is the number 17 in the set?" is a decision problem.  However, "Given a coin with a head on one side and a wapiti on the other, will the coin land wapiti side up the next time it is tossed?" is not a decision problem.  The answer is either Yes or No, but the answer is not completely determined by the input.

The class P:  P is the set of all decision problems that can be solved (i.e. the complete details of the solution, if there is one, can be found) using a "real" computer in O(n^k) time, where k is constant for each problem.

For example, the problem "Given a graph on n vertices, are there three vertices that are all mutually adjacent?" can be solved by an algorithm that examines each of the (n choose 3) possible triples of vertices.  Since (n choose 3) is O(n^3) and each triple can be checked in constant time, such an algorithm runs in O(n^3) time, so this problem is in the class P.
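As an illustration (not part of the original notes), here is a minimal Python sketch of that brute-force algorithm; it assumes the graph is given as an adjacency matrix, and the function name has_triangle is my own choice:

```python
from itertools import combinations

def has_triangle(adj):
    """Decide "are there three mutually adjacent vertices?" by brute
    force: examine each of the (n choose 3) triples of vertices.
    Each triple is checked in constant time, so this is O(n^3)."""
    n = len(adj)
    for a, b, c in combinations(range(n), 3):
        if adj[a][b] and adj[b][c] and adj[a][c]:
            return True
    return False

# A 4-vertex graph with a triangle on vertices 0, 1, 2:
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
print(has_triangle(adj))  # True
```

Since the running time is polynomial in n, this decision problem belongs to P.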

The class NP:  This class is a bit more complex than P.  First, we distinguish between solving a problem - actually finding the answer - and verifying a solution - checking the details to make sure they are correct.  For example, if the problem is "Given a set of integers, is the number 17 in the set?" the answer might be "Yes, it is in position 5 in the set".  To verify this we would check the appropriate value to see if it really is 17.
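To make the solving/verifying distinction concrete, here is a small Python sketch (my own illustration, using 0-based indexing rather than the "position 5" of the example above):

```python
def solve(values, target=17):
    """Solving: search the entire set for the target; O(n) time.
    Returns the index where the target was found (the "details"
    of a Yes answer), or None for a No answer."""
    for i, v in enumerate(values):
        if v == target:
            return i
    return None

def verify(values, position, target=17):
    """Verifying: check only the claimed position; O(1) time."""
    return 0 <= position < len(values) and values[position] == target

s = [3, 8, 21, 17, 5]
pos = solve(s)          # 3: "Yes, 17 is at index 3"
print(verify(s, pos))   # True
```

The point is that verifying a claimed solution can be much cheaper than finding one: here the verifier inspects a single position instead of searching the whole set.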

Second, we imagine a type of computer, called a Non-Deterministic Turing Machine (this is where the N in NP comes from) which has the magic ability to guess the right answer to any decision problem, and which will provide us with the details if the answer is Yes.  Once the NDTM has performed this magic trick, it reverts to being a normal computer.

Now we can define NP.  NP is the set of all decision problems that can be solved by an NDTM which verifies all Yes answers in O(n^k) time, where k is constant for each problem.

In other words, NP is the set of all decision problems with the following property: if someone tells you the answer is No, you don't have to do anything; but if they tell you the answer is Yes and give you the details of that answer, you can verify its correctness in polynomial time.
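Returning to the triangle problem from before, a polynomial-time verifier of the kind this definition describes might look like the following Python sketch (my own illustration; the "details" of a Yes answer are a certificate naming three vertices):

```python
def verify_triangle(adj, certificate):
    """Verifier for "does this graph contain three mutually adjacent
    vertices?": given the adjacency matrix and a claimed Yes-answer
    (three vertex indices), confirm they are mutually adjacent.
    This takes constant time -- no searching over all triples."""
    a, b, c = certificate
    return bool(adj[a][b] and adj[b][c] and adj[a][c])

adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
print(verify_triangle(adj, (0, 1, 2)))  # True: 0, 1, 2 form a triangle
print(verify_triangle(adj, (0, 1, 3)))  # False: 0 and 3 are not adjacent
```

A No answer comes with no certificate, so the verifier never has to handle it; this matches the Yes-only verification in the definition above.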


The first response when one hears about the class NP is often "What is the point of talking about problems that are solved on imaginary computers?"

We'll answer that question in two steps.  First, it is important to see that P is contained in NP.  If we have an algorithm that correctly solves a problem and provides details of the solution, then we could use that same algorithm to verify the details of the solution "guessed" by an NDTM.

The question to be addressed is, are there any problems in NP that are not in P?  In other words, are there any problems that can be verified using an NDTM that cannot be solved using a "real" computer?  It would seem that there absolutely must be some - after all, the NDTM can magically guess right answers to everything, and all it has to do is verify the Yes answers.