Learning never exhausts the mind
Home >  Coding > Software Engineering > Comparison of Search and Sort Techniques

# Comparison of Search and Sort Techniques

Published 25th October 2001 by

In 1973, Knuth said, "There are many best methods depending on what is to be sorted, on what machine, and for what purpose".

Since there are many different sort and search algorithms, it is important that software engineers have a knowledge of their basic operation, relevant efficiencies etc in order to select the best one for a particular application. One efficiency criteria that gives an idea of the relationship between processing time and the size of the array is the simple big-O (O - Order of magnitude) calculation.

The three basic types of sorting (insertion, exchange and selection) will be compared with regard to the basic differences in their operation, their relative efficiencies and how they are affected with small and large array sizes and the initial ordered state.

The different search techniques (linear and binary) will be compared and the basic operation outlined. This includes how the absence of the required item can be found and under what condition each would be used.

## big-O Notation

The big-O Notation is also sometimes called Landau's Symbol. It is a technique used to mathematically describe the asymptotical behaviour of functions. It is a guide to how fast the output of the function grows with the increase of the input. The results are relative times.

Some common big-O functions are shown below.

• `O(1)` - Constant
• `O(log(n))` - Logarithmic
• `O((log(n))c)` - Polylogarithmic
• `O(n)` - Linear
• `O(n2)` - Quadratic
• `O(nc)` - Polynomial
• `O(cn)` - Exponential

## Sorting Techniques

### Insertion Sort

The insertion sort extracts the second data element from the data. It then compares the extracted element with the next and if the next is larger, the next element is moved up an index. The extracted element then takes its place. The process is repeated until no passes are made.

If the data is almost sorted, for example the numbers (8, 1, 2, 3, 4, 5, 6, 7, 9) then the 8 is extracted, 1 is moved to the first index, 2 into the second and so on until a number greater than 8 is encountered (or end of data). The 8 is then inserted into the data.

In general the maximum number of passes made is given by the formula N - 1, where N is the number of data elements.

The relative time of the algorithm is `O(N2)` for unsorted data. As the array size increases the time the algorithm takes to execute increases.

### Exchange Sort

The Exchange (or bubble) sort is the simplest algorithm.

The algorithm works taking an item and comparing it with the next adjacent item. If the first item is greater than the next item then they are swapped. If the first item is less than or equal to the next then no change is made. There are several passes through the data.

In general the maximum number of passes made is given by the formula N - 1, where N is the number of data elements.

The relative sort time of the algorithm is `O(N2)` for unsorted data.

There is also a bi-directional exchange sort that repeatedly makes a pass downwards then upwards sorting the data.

### Selection Sort

Sorts data by repeatedly comparing remaining items to find the smallest and moving it to the end of the data. The effective size (area of data unsorted) is then reduced and the process repeated on the sub array until the effective size is one (an array of one is sorted).

In general the maximum number of passes made is given by the formula N - 1, where N is the number of data elements.

The relative sort time of the algorithm is `O(N2)` for unsorted data.

### Shell Sort

The shell sort is the optimised version of the insertion sort. It takes the smallest value and moves all values above it down an index. The smallest value is then inserted at the start of the array. The process is repeated, with the next smallest value inserted below the last smallest value.

The shell sort has an average sort time of `O(N1.25)`.

### Quick sort

The Quick Sort uses the divide and conquer paradigm to sort the data. The first element is the central one. The data on the left is the sorted in the same way, then the data on the right. The algorithm is recursive so the data is recursively broken down into half's until the data cannot be broken down further.

The quick sort has a sort time of `O(n log n)`.

As the initial ordered state becomes sorted the time increase.

### Heap Sort

The heap sort builds a heap of the data and repeatedly extracts the maximum item.

To build the heap the algorithm loops through all items in the array and places the value on the heap. The item is pushed into its position on the tree (highest at the top and lowest at the bottom).

When all of the items have been placed in the tree, the values are then extracted back into their indexes in the array.

The heap sort has an time of `O(n log n)`

### Visual Analysis

The graph below shows a comparison between the `O(N2)` un-optimised sorts and the optimised `O(n log n)` and `O(N1.25)` sorts. The graph shows the time taken to sort the data.

It is clear that for the un-optimised sorts are nearly infinitely long to execute as the data size approaches 100. This would cause the algorithm to take a very long time to sort the data. The time increases quadratically. (The value is actually about 500 seconds or 8 minutes).

On the other hand the optimised algorithms still perform well with over 500 data items.

### Summary

1. For small amounts of unsorted data the basic sort types can be used.
2. For Large amounts of unsorted data the optimised types should be used
3. As the Array size increases, the time taken increases
4. Time increases as the data becomes more sorted

## Searching Techniques

### Linear Search

A linear search starts at the first data item and moves towards the last item trying to find an item. Linear searches can be done on data that is unsorted or sorted; there is no difference in performance.

The absence of the item can only be found once all of the items have been looked at.

The time taken to perform the search is `O(N/2)`.

### Binary Search

Binary search will only work on sorted data. The technique is similar to the quick sort in that is partitions the array into progressive half's until the item is found. If the search item is less than the middle item then the array is narrowed to the lower half. If the search item is greater than the middle value then the array is narrowed to the upper half.

The absence of a data item can be found easily, as the data is partitioned. The number of partitions to find the data depends on the number of data items.

The time taken to perform the search is `O(Log N)`.

### Visual Analysis of Search Techniques

The graph below compares the search times of the two search techniques described above. It can clearly be seen that the binary search is much faster than the linear search.

Binary search should be used for sorted data and large number of items. Linear search can be used for small amounts of data that is unsorted.

• Binary Search data must be sorted.
• Linear Search data can sorted or unsorted.