20090930
We
know that the Subset Sum problem is NP-Complete, and we know that the
practical implication of this is that nobody has found (and most
believe that nobody will ever find) a polynomial time algorithm to
solve this problem.
But suppose we absolutely have to solve this
problem. Let's call the original set S, and the target value k.
We could use the naive Brute Force and Ignorance (BFI) algorithm:
enumerate all the subsets of S, and check the sum of each one to see if
any of them sum to k. Since there are 2^n subsets, this algorithm
is going to take O(2^n) time.
It turns out we can do better ...
Suppose
we split the set S into two equal sized subsets A and B. If S has
a subset that sums to k, then the subset is either completely in A,
completely in B, or partly in A and partly in B.
Since we aren't using any kind of complicated selection process to split S in half, the split just take O(n) time.
Now
we compute the sum of every subset of A, and the sum of every subset of
B. We can simply use the BFI algorithm to list all these subset
sums - it takes O(2^(n/2)) time.
We can't do this any faster than O(2^(n/2)) time, since that is the number of subsets of each of A and B.
If we find any subset of A or B that sums to k, we can stop right away - we have found what we are looking for.
If
none of the subset of A or of B sums to k, we need to deal with the
possibility that S has a subset that sums to k, but that subset is
partly in A and partly in B. If that is the case, then the set of
subset sums from A will contain some value a, and the set of subset
sums from B will contain some value b, such that a+b = k. The
challenge is to find such a pair ... quickly.
The trick to doing
this is to sort the two sets of subset sums (note that we are sorting
the set of sum values for A and for B, not the sets A and B
themselves). Each of these sets has 2^(n/2) elements, so sorting
each of them takes O(n * 2^(n/2)) time.
Call the sorted set of A's sums A*, and the sorted set of B's sums B*.
Now
we can search for a winning pair (an "a" in A* and a "b" in B* such
that a+b = k) by starting with the smallest value in A* and the largest
value in B*. If the sum of these values = k, we win and we stop!
If the sum of these values is < k, then we can forget about
this value in A* - it cannot be part of a winning pair.
Similarly, if the sum of these values is > k, we can forget
about this value in B* - it cannot be part of a winning pair.
Thus after adding these numbers together, we either stop, or we
eliminate a value from A* or a value from B*. Repeating this
operation can only be done 2*2^(n/2) times - after that many iterations
we will have eliminated all values in A* and B*. Since each
iteration takes constant time, this part of the algorithm takes
O(2^(n/2)) time.
We see that the dominant term in the time
analysis is the time for the sort: O(n * 2^(n/2)), and thus this
is the complexity of the algorithm. Comparing this to the BFI
algorithm, we see that this algorithm is an enormous improvement.
It still requires exponential time, but the exponent has been cut
in half.
This algorithm illustrates the Divide and Conquer paradigm:
To solve a problem of size n (typically involving a set of n elements)
- divide the set into 2 or more disjoint subsets
- solve the problem on the subsets independently
- combine the subproblems' solutions to get the solution to the original problem
As we saw with the Subset Sum algorithm, D&C can be an effective technique for designing efficient algorithms.