20090916
To
formalize the concepts of "easy" and "difficult" problems, we
need a measure of difficulty. The measure which was eventually
accepted is the computational complexity of the best algorithm to solve
the problem. This is certainly sufficient to show that a problem
is easy: the existence of a low-order polynomial-time algorithm that
solves a problem shows that the problem is "easy". But to show
that a problem is "difficult", it seems that we would have to prove
that there cannot exist an efficient algorithm that solves it.
This is sometimes
possible, but the problems for which we have been able to prove that
efficient algorithms cannot exist are few in number, and generally not
very interesting.
Of course this is not merely a problem
of theoretical interest. As practicing computer scientists, we
will forever be faced with new problems to solve. What we need is
a practical method for determining if a new problem is one that we can
solve efficiently, or whether it belongs to the set of problems that
are too hard to solve. The first steps towards developing the
tools we need were taken around 1970 by Steve Cook and Leonid Levin,
working independently. However before we can appreciate their
discoveries, we need to define some terms.
First, we will restrict the types of problems under consideration. We will focus on the class of decision problems.
Definition:
A decision problem is one in which the answer is either Yes or
No, and the answer is completely determined by the input information.
For example, "Given a set of integers, is the number 17 in the set?" is a
decision problem. However, "Given a coin with a head on one side
and a wapiti on the other, will the coin land wapiti side up the next
time it is tossed?" is not a decision problem. The answer is
either Yes or No, but the answer is not completely determined by the
input.
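A decision problem translates directly into code: the answer is a single Yes/No value, fully determined by the input. Here is a minimal Python sketch of the set-membership example (the function name is just for illustration):

```python
def contains_17(numbers):
    # Decision problem: given a set of integers, is 17 in the set?
    # The Yes/No answer is completely determined by the input.
    return 17 in numbers
```

For any fixed input, repeated calls always return the same answer - unlike the coin toss, whose outcome is not determined by the problem's description.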
The class P: P
is the set of all decision problems that can be solved (i.e. the
complete details of the solution, if there is one, can be found) using
a "real" computer in O(n^k) time, where k is constant for each problem.
For
example, the problem "Given a graph on n vertices, are there three
vertices that are all mutually adjacent?" can be solved by an algorithm
that examines each of the (n choose 3) possible triples of vertices. Since
there are O(n^3) such triples and each can be checked in constant time,
the algorithm runs in O(n^3) time, so this problem is in the
class P.
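That brute-force algorithm can be sketched in a few lines of Python (the representation of edges as sorted vertex pairs is an assumption for illustration):

```python
from itertools import combinations

def has_triangle(n, edges):
    # edges: a set of (u, v) pairs; normalize so u < v
    e = {tuple(sorted(p)) for p in edges}
    # Examine each of the (n choose 3) triples of vertices.
    for a, b, c in combinations(range(n), 3):
        if (a, b) in e and (a, c) in e and (b, c) in e:
            return True
    return False
```

Each triple is checked with three constant-time set lookups, so the whole loop runs in O(n^3) time.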
The class NP: This class is a bit more complex than P. First, we distinguish between solving a problem - actually finding the answer - and verifying
a solution - checking the details to make sure they are correct.
For example, if the problem is "Given a set of integers, is the
number 17 in the set?" the answer might be "Yes, it is in position 5 in
the set". To verify this we would check the appropriate value to
see if it really is 17.
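The distinction between solving and verifying can be sketched in Python: solving scans the entire input, while verifying only checks the claimed detail (the list representation and 0-based positions here are assumptions for illustration):

```python
def solve(numbers):
    # Solving: search the whole input for 17; return its position, or None.
    for i, x in enumerate(numbers):
        if x == 17:
            return i
    return None

def verify(numbers, claimed_position):
    # Verifying: check the single claimed detail in constant time.
    return numbers[claimed_position] == 17
```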
Second, we imagine a type of computer, called a Non-Deterministic Turing Machine (this is where the N in NP
comes from), which has the magic ability to guess the right answer to
any decision problem, and which will provide us with the details if the
answer is Yes. Once the NDTM has performed this magic trick, it
relapses into being a normal computer.
Now we can define NP. NP
is the set of all decision problems that can be solved by an NDTM which
verifies all Yes answers in O(n^k) time, where k is constant for each
problem.
In other words, NP
is the set of all decision problems with the following property: if
someone tells you the answer is No, you don't have to do anything; but
if they tell you the answer is Yes and give you the details of the
solution, you can verify the correctness of that answer in polynomial time.
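For a richer example than set membership, consider the problem "Given a graph, are there k vertices that are all mutually adjacent?" (a generalization of the triangle problem above). Solving it seems to require searching many candidate sets, but a claimed Yes answer comes with details - the k vertices - that are quick to check. Here is a sketch of such a polynomial-time verifier in Python (the certificate format is an assumption for illustration):

```python
from itertools import combinations

def verify_clique(edges, certificate, k):
    # Verify a claimed Yes answer: 'certificate' lists k vertices
    # that are supposedly all mutually adjacent.
    e = {tuple(sorted(p)) for p in edges}
    if len(set(certificate)) != k:
        return False
    # Only (k choose 2) pairs to check: polynomial time.
    return all(tuple(sorted((u, v))) in e
               for u, v in combinations(certificate, 2))
```

Note that the verifier never searches; it only checks the details it was handed.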
A natural first response on hearing about the class NP is "What is the point of talking about problems that are solved on imaginary computers?"
We'll answer that question in two steps. First, it is important to see that P is contained in NP.
If we have an algorithm that correctly solves a problem and
provides details of the solution, then we could use that same algorithm
to verify the details of the solution "guessed" by an NDTM.
The question to be addressed is, are there any problems in NP that are not in P?
In other words, are there any problems that can be verified using
an NDTM that cannot be solved using a "real" computer? It would
seem that there absolutely must be some - after all, the NDTM can
magically guess right answers to everything, and all it has to do is
verify the Yes answers.