There are two main things that faster sorting algorithms tackle:
Quicksort is a recursive sorting algorithm that is "easy" to implement. It works well with a variety of input data and uses no additional array memory (just the memory for the recursive stack frames).
Quicksort is a divide and conquer algorithm.
The base case: Arrays of length 0 or 1 are trivially sorted.
The recursive step:
void quicksort(int a[], int left, int right) {
    // Base case
    if (left >= right) return;
    // Recursive step
    // This returns an index where everything to the left of
    // that index is less than it, and everything to the right
    // of it is greater than it.
    int pivot = partition(a, left, right);
    quicksort(a, left, pivot - 1);
    quicksort(a, pivot + 1, right);
}
Note how you never have to sort the pivot location itself; after partitioning, it is already in its final position. The main question in quicksort is: how do we partition?
It would be ideal if you could pivot at the middle value. The element that splits the range in half is the median, so why not just find the median and pivot from there? Because the median is expensive to compute for every partition.
Simple alternative: just pick any element. If the array is random, you can always pick the first (or last) item and it is just as likely to be near the median as anything else. A single pick isn't guaranteed to be good, but averaged over many partitions the splits are good enough to give the expected running time.
Let's walk through an example array. Say you use a simple heuristic and choose the last element as the pivot: 6.
You start from the left-hand side: as long as the element you're looking at is less than the pivot, it's already in the right place. Similarly, scanning from the right-hand side, as long as the element you're looking at is greater than the pivot, it's in the right place.
First, since 9 is left of 6 and shouldn't be there, you go from the right and find something less than your pivot. The first of these values is 5. Swap 9 and 5.
Next, the left scan stops at 7, which is greater than your pivot, and by now the two scans have met, so the only value left to swap it with is the pivot itself. So you swap 6 and 7.
Now, you just have to quicksort the subarray to the left of the pivot and the subarray to the right of it.
Once those recursive calls finish, you have a sorted array. Here is one way to write the partition:
int partition(int a[], int left, int right) {
    // Use the last element of the range as the pivot;
    // right now points just to the left of the pivot.
    int pivot = right--;
    while (true) {
        // Move the left index right past elements smaller than the pivot
        // (the pivot itself acts as a sentinel, so this can't run off the end).
        while (a[left] < a[pivot])
            left++;
        // Move the right index left past elements >= the pivot,
        // stopping if it meets the left index.
        while (a[right] >= a[pivot] && left < right)
            right--;
        // The scans have crossed: everything is on the correct side.
        if (left >= right) break;
        // Both elements are on the wrong side; exchange them.
        swap(a[left], a[right]);
    }
    // Put the pivot into its final position and report where it went.
    swap(a[left], a[pivot]);
    return left;
}
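As a quick sanity check, here is a small driver you could run against the code above. The concrete array is an illustrative choice consistent with the walkthrough (pivot 6 at the end, 9 swapped with 5, then 7 swapped with the pivot), not necessarily the original example, and swap is assumed to be something like std::swap.

#include <cstdio>
#include <utility>
using std::swap;   // provides the swap(...) used by partition above

// ... quicksort and partition as defined above ...

int main() {
    int a[] = {3, 9, 1, 7, 5, 6};   // illustrative data; last element 6 is the pivot
    int n = sizeof(a) / sizeof(a[0]);
    quicksort(a, 0, n - 1);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);        // prints: 1 3 5 6 7 9
    printf("\n");
    return 0;
}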
Since the last item was just an arbitrary pick, the median is just as likely to be at the end as in the middle or anywhere else. But maybe your data is partially sorted, so why not try the element physically closest to the middle?
int partitionMiddle(int a[], int left, int right) {
    // Find the physical middle of the range
    int pivot = (left + right) / 2;
    // Move this element to the end so it becomes the pivot
    swap(a[pivot], a[right]);
    // Go on as normal
    return partition(a, left, right);
}
Cost of partitioning n elements: Θ(n), since each element in the range is compared with the pivot a constant number of times.
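To connect this to total running time, here is the standard recurrence sketch (a summary of the usual analysis, not spelled out in the notes above): balanced splits do linear work across about log n levels, while maximally unbalanced splits pay linear work n times.

    Balanced splits:    T(n) = 2T(n/2) + cn   =>  Θ(n log n)
    Worst-case splits:  T(n) = T(n-1) + cn    =>  Θ(n^2)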
Advantages: if the data is already sorted (or nearly sorted), the physical middle of the range is at or near the true median, so the splits are good.
Disadvantages: it's still just one fixed guess; there are always inputs for which every split is bad, so the Θ(n²) worst case doesn't go away.
If a bad split gives you the worst case, then choosing a good split is really important. Any single fixed choice can always turn out to be the worst one, but it's too expensive to actually compute the best one (the median).
Rather than compute the median, sample it! Pick three elements and take their median ("median of three"). This is very likely to give you better splits. Sampling is a very powerful technique!
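A minimal sketch of this idea, reusing the partition above. The function names and the choice of first/middle/last as the three samples are illustrative assumptions, not from the notes.

int medianOfThreeIndex(int a[], int left, int right) {
    // Return the index holding the median of a[left], a[mid], a[right].
    int mid = left + (right - left) / 2;
    if ((a[left] <= a[mid] && a[mid] <= a[right]) || (a[right] <= a[mid] && a[mid] <= a[left]))
        return mid;
    if ((a[mid] <= a[left] && a[left] <= a[right]) || (a[right] <= a[left] && a[left] <= a[mid]))
        return left;
    return right;
}

int partitionMedianOfThree(int a[], int left, int right) {
    // Move the sampled median to the end so it becomes the pivot,
    // then partition as before.
    swap(a[medianOfThreeIndex(a, left, right)], a[right]);
    return partition(a, left, right);
}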
In divide and conquer, most of the recursive calls are on "little" ranges, so reduce the cost of those. Insertion sort is faster than quicksort on small arrays, so bail out of quicksort when the range size drops below some small k. Then either insertion-sort each small range as you go, or do a single (fast!) insertion-sort pass over the whole array at the end; the first option is sketched below.
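A rough sketch of the cutoff idea, assuming the partition above and a swap helper. The constant CUTOFF, the name quicksortHybrid, and the insertionSort helper are illustrative choices; the threshold should be tuned empirically.

const int CUTOFF = 16;   // assumed threshold; tune for your machine

void insertionSort(int a[], int left, int right) {
    // Standard insertion sort over a[left..right].
    for (int i = left + 1; i <= right; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= left && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

void quicksortHybrid(int a[], int left, int right) {
    // Small ranges: insertion sort beats further recursion.
    if (right - left + 1 < CUTOFF) {
        insertionSort(a, left, right);
        return;
    }
    int pivot = partition(a, left, right);
    quicksortHybrid(a, left, pivot - 1);
    quicksortHybrid(a, pivot + 1, right);
}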
What if many elements are equal? (With the partition above, elements equal to the pivot all end up on one side, so an array of all-equal values peels off one element per call and degrades to quadratic time.)
Sorting algorithms that have an O(n log n) worst-case time: merge sort and heapsort (quicksort's worst case is O(n²)).
Memory usage: merge sort needs Θ(n) auxiliary space, heapsort sorts in place, and quicksort needs only the recursion stack.