Searching Arrays: Algorithms and Efficiency

Linear Versus Binary Search

Background
Linear Search
Linear Search Program
Linear Search Efficiency Analysis
Binary Search
Binary Search Program
Binary Search Efficiency Analysis
Efficiency Comparison
Key Points

1. Background

Searching a list for a particular item is a common task. In real applications, the list items typically are records (e.g. student records), and the list is implemented as an array of objects. The goal is to find a particular record, identified by name or an ID number such as a student number. Finding the matching list element provides access to target information in the record - the student's address, for example. The following discussion of search algorithms adopts a simpler model of the search problem - the lists are just arrays of integers. Clearly, the search techniques could generalize to more realistic data.

The concept of efficiency is important when comparing algorithms. For long lists and tasks, like searching, that are repeated frequently, the choice among alternative algorithms becomes important because the alternatives may differ in efficiency. To illustrate the concept of algorithm efficiency (or complexity), we consider two common algorithms for searching lists: linear search and binary search.
Back To Top

2. Linear Search

What is the simplest method for searching a list of elements? Check the first item, then the second item, and so on until you find the target item or reach the end of the list. That's linear search. Assume an array called list, filled with student ID numbers (integers), with numItems IDs in the list. The task is to search for the location of an ID that is the value of a variable target. Here's how to search for the location of target in list:

int location = -1;
int i = 0;
while ((i < numItems) && (list[i] != target))
   i++;
if (i < numItems)
   location = i;

A while loop is appropriate since the target could be anywhere in the list, or perhaps not in it at all. The program must be able to exit the search loop as soon the target is located, and to iterate over all the items in whole array, if necessary.

The initial value of the control variable i is 0 and increments by 1 each time through the loop, since valid index values range from 0 to the number of items minus one. The loop repeats as long as the target is not found and i is less than the number of items (the value of numItems). The following tests whether the value of the target variable is the same as that of the current list element:

list[i] != target

The condition that allows the loop to continue is the fact that the current element of the list is not the same as the value of target. Hence we use the "not equals" comparison: !=.

At loop exit, either target has been found, (list[i] == target), or the entire list has been checked without finding it, and i == numItems. In the latter case, location retains its initial value (-1). In the former, the value of location needs to change to the value of i:

if (i < numItems)
   location = i;

3. Linear Search Program

Here is a small Java Application Program that demonstrates linear search. It reads a list from the user, prompts for a search target, and then uses linear search to look for the target in the list. The program includes extra output statements to display steps in the search. It uses an array defined as a static variable, accessed directly by the readList and searchList methods.

The array is partially-filled. Since the integers in the list are meant to represent student ID numbers, it is reasonable to assume that they are all positive. Thus a sentinel entry of -1 can mark the end of the list.

// Program to illustrate Linear Search algorithm

public class LinearSearch{
   public static final int MAX_SIZE = 100;
   public static int[] list = new int [MAX_SIZE];
   // keep track of number of items in partially filled array
   public static int numItems;
   public static final int SENTINEL = -1;

   public static void readList( ) {
      // reads list items until sentinel, record total number read in
      int next;
      System.out.println("Enter a list of positive integers, one per line,\n ending with:" + SENTINEL);
      int i = 0;
      next = Helper.readInt( );
      while ((next != SENTINEL) && (i < MAX_SIZE)){
         list[i] = next;
         i++;
         next = Helper.readInt( );
      } // end while
      numItems = i;
   } // end readList

   public static int searchList(int target){
      // searches list of int's for target, up to position numItems,
      // returns location of target if found, otherwise -1
      int location = -1;
      int i = 0;
      while ((i < numItems) && (list[i] != target)) {
         System.out.println("Now checking position:" + i + "\t value: " + list[i]);
         i++;
      } // end while
      if (i < numItems)
         location = i;
      return location;
   } // end searchList

   public static void main (String[] args){
      int searchID;
      System.out.println("A program to illustrate the Linear Search Algorithm");
      readList( );
      System.out.print("Enter the number to search for:");
      searchID = Helper.readInt( );
      int position = searchList(searchID);
      if (position == -1)
         System.out.println("\n" + searchID + " is not in the list");
      else{
         System.out.println("\n" + searchID + " is at position:" + position + " in the list.");
      } // end else
   } // end main method
} // end LinearSearch class

4. Efficiency of Linear Search

Having implemented the Linear Search Algorithm, how would you measure its efficiency? A useful metric would be general, applicable to any (search) algorithm. Since a more efficient algorithm would take less time to execute, one approach would be to write a program for each of the algorithms to be compared, execute them, and measure the time each takes to finish. However, a more efficient metric would be one that allows algorithms to be evaluated before implementing them.

One such metric is the number of main steps the algorithm will require to finish. Of course, the exact number of steps depends on the input data. For the linear search algorithm, the number of steps depends on whether the target is in the list, and if so, where in the list, as well as on the length of the list.

For search algorithms, the main steps are the comparisons of list values with the target value. Counting these for data models representing the best case, the worst case, and the average case produces the following table. For each case, the number of steps is expressed in terms of N, the number of items in the list.

Case Comparisons for N = 100,000 Comparisons in terms of N

Best Case
(fewest comparisons possible) 1 (target is first item in list) 1

Worst Case
(most comparisons possible) 100,000 (target is last item in list) N

Average Case
(avg number of comparisons) 50,000 (target is middle item in list) N/2

The best case analysis doesn't tell us much. If the first element checked happens to be the target, any algorithm will take only one comparison. The worst and average case analyses give a better indication of algorithm efficiency.

Notice that if the list grows in size, the number of comparisons required to find a target item in both worst and average cases grows linearly. In general, for a list of length N, the worst case is N comparisons and the average case is N/2 comparisons. The algorithm is called linear search because its efficiency can be expressed as a linear function, with the number of comparisons to find a target increasing linearly as the size of the list.
Back To Top

5. Binary Search

First, note that a more efficient algorithm is only possible if the list is sorted, for instance in increasing order. Otherwise, there's no better strategy than simply checking every item in the list. For a sorted list, there is information about the location of the target even from comparisons that don't locate the target. They reveal whether the target is before or after that position in the list. That information can be used to narrow the search.

The strategy of binary search is check the middle (approximately) item in the list. If it is not the target and the target is smaller than the middle item, the target must be in the first half of the list. If the target is larger than the middle item, the target must be in the last half of the list. Thus, one comparison, reduces the number of items left to check by half!

The search continues by checking the middle item in the remaining half of the list. If it's not the target, the search narrows to the half of the remaining part of the list that the target could be in. The splitting process continues until the target is located or the remaining list consists of only one item. If that item is not the target, it's not in the list.

Here are the main steps of the binary search algorithm, expressed in pseudocode (a mixture of English and programming language syntax).

Main steps of Binary Search

1.  location = -1;
2.  while ((more than one item in list) and (haven't yet found target))
    2A.  look at the middle item
    2B.  if (middle item is target)
              have found target
         else
              2C.  if (target < middle item)
                        list = first half of list
                   else (target > middle item)
                        2D. list = last half of list
    end while
3.  if (have found target)
        location = position of target in original list
4.  return location as the result

It might appear that this algorithm changes the list. Each time through the while loop half of the list is "discarded", so that at the end the list may contain only one item. There are two reasons to avoid actually changing the list. First, it may be needed again, to search for other targets, for example. Second, it creates extra work to delete half of the items in a list, or to make a copy of a list. This could offset any gains in efficiency.

Rather than changing the list, the algorithm changes the part of the list to search. Two integer variables, first and last, record the first and last positions of the part of the list remaining to be searched. The integer variable, middle, stores the middle position between first and last. Each time through the loop, the target is compared to the middle item. If the target is less than the middle item, the next iteration searches the lower half of the remaining part of the list by setting last to the position just before middle. Thus, the next part to search is bound by positions first and middle -1. If target is greater than the middle item, the next iteration searches the upper half of the remaining part of the list by setting first to the position just after middle. Thus the next part to search is bound by middle + 1 and last.

The search ends when the target item is found or the values of first and last cross over, so that last < first, indicating that there are no list items left to check.

Here is the revised pseudocode for the algorithm:

location = -1; 
first = 0; 
last = number of items in list minus 1;
 
while ((number of items left to search >= 1) and (target not found)) 
    middle = position of middle item, halfway between first and last 
    if (item at middle position is target) 
        target found 
    else 
        if (target < middle item) 
            search lower half of list next
            last = middle - 1; 
        else 
            search upper half of list next
            first = middle + 1; 
end while
 
if (target found) (i.e., middle item == target) 
    location = position of target in list (i.e., middle)
  
return location as the result

Here is a sample of the operation of the binary search algorithm, displayed as a table. The rows of the table, starting from the top, are the array indices, the data stored at the indexed location, and the index values for first (F), last (L) and middle (M) at each iteration of the algorithm. If the target value is 52, its location is found on the 3^rd iteration.

index	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
data	23	27	29	32	34	41	46	47	49	52	55	68	71	74	77	78
1^st iteration	F							M								L
2^nd iteration									F			M				L
3^rd iteration									F	M	L

To modify the linear search program to use binary search, we only need to change the body of the method searchList.

Here is the revised searchList method, with the header line unchanged. Since the method can be updated to a more efficient algorithm without changing the header line, no program that uses the method needs to change either. This is a benefit of modularity. Using methods as modules to organize the main steps of the program isolates the algorithms used, so that changes to them can be made independently.

   public static int searchList(int target){
      // searches list of integers for target
      // up to position numItems - 1 in partially filled array
      // returns location of target if found, otherwise -1
      boolean found = false;
      int location = -1;
      int first = 0, last = numItems - 1, middle = 0;
      int midItem=0;
      while ((!found) && (last >= first)) {
         middle = (first + last)/2;
         midItem = list[middle];
         System.out.println("Checking middle position " + middle + ", value:" + midItem);
         if (midItem == target){
            found = true;
            System.out.println("Target found");
         } // end if
         else if (target < midItem){
            last = middle - 1;
            System.out.println("Search lower half, bounded by:" + first + " .. " + last);
         } // end else if
         else{
            first = middle + 1;
            System.out.println("Search upper half, bounded by:" + first + " .. " + last);
         } // end else
      } // end while
      if (midItem == target)
         location = middle;
      return location;
   } // end searchList

6. Binary Search Program

To produce a complete Application Program for demonstrating the Binary Search algorithm, substitute the revised searchList method for the one using linear search. Since the implementation assumes the list is sorted in increasing order, the readList method should be revised to prompt the user appropriately. Here is the outline of the complete Binary Search program.

// Program that illustrates the Binary Search algorithm

public class BinarySearch{
   // insert variable declarations as in LinearSearch
   
   // insert readList method with revised prompts
   
   // insert searchList method as above
   
   // insert main method as in LinearSearch
 
} // end BinarySearch class

7. Efficiency of Binary Search

Is binary search more efficient than linear search? If so, how much more efficient is it? To evaluate binary search, count the number of comparisons in the best case and worst case. This analysis omits the average case, which is a bit more difficult, and ignores any differences between algorithms in the amount of computation corresponding to each comparison.

The best case occurs if the middle item happens to be the target. Then only one comparison is needed to find it. As before, the best case analysis does not reveal much.

When does the worst case occur? If the target is not in the list then the process of dividing the list in half continues until there is only one item left to check. Here is the pattern of the number of comparisons after each division, given the simplifying assumptions of an initial list length that's an even power of 2 (1024) and exact division in half on each iteration:

Number of Items Left to Search Number of Comparisons So Far

1024 0

512 1

256 2

128 3

64 4

32 5

16 6

8 7

4 8

2 9

1 10

For a list size of 1024, there are 10 comparisons to reach a list of size one, given that there is one comparison for each division, and each division splits the list size in half.

In general, if N is the size of the list to be searched and C is the number of comparisons to do so in the worst case, C = log₂N. Thus, the efficiency of binary search can be expressed as a logarithmic function, in which the number of comparisons required to find a target increases logarithmically with the size of the list.

To summarize, here is a table showing our analysis:

Case Number of Comparisons if N = 100, 000 Number of Comparisons in terms of N

Best Case (least comparisons possible) 1 (target is middle item in list) 1

Worst Case (most comparisons possible) 16 log₂N

Case	Number of Comparisons if N = 100, 000	Number of Comparisons in terms of N
Best Case (least comparisons possible)	1 (target is middle item in list)	1
Worst Case (most comparisons possible)	16	log₂N

8. Efficiency Comparison

Note that the worst case number of comparisons is just 16 for a list with 100,000 items, versus 100,000 for linear search! Furthermore, if the list were doubled in size to 200,000, the maximum number of comparisons for binary search would only increase by 1 to 17, whereas for linear search it would double from 100,000 to 200,000.

Here is a table showing how the maximum number of comparisons increases for binary search and linear search:

Size of List Maximum Number of Comparisons

Linear Search Binary Search

100,000 100,000 16

200,000 200,000 17

400,000 400,000 18

800,000 800,000 19

1,600,000 1,600,000 20

Size of List	Maximum Number of Comparisons
Linear Search	Binary Search
100,000	100,000	16
200,000	200,000	17
400,000	400,000	18
800,000	800,000	19
1,600,000	1,600,000	20

9. Key Points

A common operation on arrays of records is to search the list to retrieve a particular record.
There are alternative algorithms for searching an array. The number of comparisons required is a useful metric for their efficiency.
Linear search examines each list item in turn until the target is located or the end of the list is reached.
Linear search efficiency for the worst case data model is a linear function of the number of items in the list.
If the list is sorted, a binary search strategy may be used.
Binary search uses the result of each comparison to eliminate half of the list from further searching.
Binary search efficiency for the worst case is a logarithmic function of list size.

Case	Comparisons for N = 100,000	Comparisons in terms of N
Best Case (fewest comparisons possible)	1 (target is first item in list)	1
Worst Case (most comparisons possible)	100,000 (target is last item in list)	N
Average Case (avg number of comparisons)	50,000 (target is middle item in list)	N/2

Number of Items Left to Search	Number of Comparisons So Far
1024	0
512	1
256	2
128	3
64	4
32	5
16	6
8	7
4	8
2	9
1	10