Elementary Sorting Algorithms
There is a suprisingly diverse collection of algorithms that have been
developed to solve the apparently simple problem of "Sorting".
The general sorting problem is simple enough to describe: Given an
initially unordered array of N records, with one field distinguished as the
key, rearrange the records so they are sorted into increasing (or
decreasing) order, according to each record's key.
Sorting algorithms are used in all kinds of applications and are necessary,
for instance, if we plan to use efficient searching algorithms like Binary
Search or Interpolation Search, since these require their data to be
Sorting algorithms are often subdivided into "elementary" algorithms that
are simple to implement compared to more complex algorithms that, while
more efficient, are also more difficult to understand, implement, and
It is not always true that the more complex algorithms are the preferred
ones. Elementary algorithms are generally more appropriate in the following
- less than 100 values to be sorted
- the values will be sorted just once
- special cases such as:
- the input data are "almost sorted"
- there are many equal keys
In general, elementary sorting methods require O(N2) steps for N random key
values. The more complex methods can often sort the same data in just O(N
log N) steps. Although it is rather difficult to prove, it can be shown
that roughly N log N comparisons are required, in the general case.
Internal vs. External Sorting
Normally, when considering a sorting problem, we will assume that the
number of records to be sorted is small enough that we can fit the entire
data set in the computer's memory (RAM) all at once. When this is true, we
can make use of an internal sorting algorithm, which assumes that any key
or record can be accessed or moved at any time. That is, we have "random
access" to the data.
Sometimes, when sorting an extremely large data set such as Canada's Census
Data, there are simply too many records for them to all fit in memory at
once. In this case, we have to resort to external sorting algorithms that
don't assume we have random access to the data. Instead, these algorithms
assume the data is stored on magnetic tapes or disks and only portions of
the data will fit in memory. These algorithms use "sequential access" to
the data and proceed by reading in, processing, and writing out blocks of
records at a time. These partially sorted blocks need to be combined or
merged in some manner to eventually sort the entire list.
When a sorting algorithm is applied to a set of records, some of which
share the same key, there are several different orderings that are all
correctly sorted. If the ordering of records with identical keys is always
the same as in the original input, then we say that the sorting algorithm
used is "stable".
This property can be useful. For instance, consider sorting a list of
student records alphabetically by name, and then sorting the list again,
but this time by letter grade in a particular course. If the sorting
algorithm is stable, then all the students who got "A" will be listed
Stability is a difficult property to achieve if we also want our sorting
algorithm to be efficient.
One final issue to keep in mind when implementing a sorting algorithm is
the size of the records themselves. Many sorting algorithms move and
interchange records in memory several times before they are moved into
their final sorted position. For large records, this can add up to lots of
execution time spent simply copying data.
A popular solution to this problem is called "indirect sorting". The idea
is to sort the indices of the records, rather than the records themselves.