The purpose of this module is to test the performance of the different sort (and related) algorithms in Julia. See https://github.com/kmsquire/SortPerf.jl/raw/master/sortperf.pdf for an example output from Version 0.3.0-prerelease+125.
Run with:
std_sort_tests(;sort_algs=SortPerf.sort_algs, # [InsertionSort, HeapSort, MergeSort,
# QuickSort, RadixSort, TimSort]
types=SortPerf.std_types, # [Int32, Int64, Int128, Float32, Float64, String]
range=6:20, # Array size 2^6 through 2^20, by powers of 2
replicates=3, #
lt::Function=isless, # \
by::Function=identity, # | sort(...) options
rev::Bool=false, # |
order::Ordering=Forward, # /
save::Bool=false, # create and save timing tsv and pdf plot
prefix="sortperf") # prefix for saved files
You can also test individual algorithms with
sortperf(Algorithm(s), data, [size,] [replicates=xxx])
Some examples:
sortperf(QuickSort, Int, 10_000) # Test QuickSort on 10,000 random ints
sortperf(MergeSort, [Float32, String], 6:2:10) # Test MergeSort on 2^6, 2^8, and 2^10 float 32s and strings
sortperf([QuickSort, MergeSort, TimSort], # Test QuickSort, MergeSort, and TimSort on
[Int, Float32, Float64, String], # Arrays of Int, Float32, Float64, and String
6:20; # ranging from 2^6 elements to 2^20 elements, by
replicates=5) # powers of 2, and run each test 5 times
Ordering parameters accepted by sort!(...) will be passed through.
The actual tests run include sorting arrays with the following characteristics:
- random
- sorted
- reversed
- sorted, but with 3 random exchanges
- sorted, with 10 random values appended
- 4 unique values
- all equal
- quicksort median killer: first half descending, second half ascending
The tests were inspired by similar tests used by sortperf in Python. See http://svn.python.org/projects/python/trunk/Objects/listsort.txt for more details.
Here is a table and some notes on the Julia implementations of the
various algorithms. The table indicates the recommended sort
algorithm for the given size (small < ~2^12 (=8,192) items < large)
and type (string, floating point, or integer) of data.
- Random means that the data is permuted randomly.
- Structured here means that the data contains partially sorted runs (such as when adding random data to an already sorted array).
- Few unique indicates that the data only contains a few unique values.
| (Un)stable (small) | Stable (small) | (Un)stable (large) | Stable (large) | In-place (large) | |
|---|---|---|---|---|---|
| Strings | |||||
| - Random | M | M | M | M | Q |
| - Structured | M | M | T | T | Q |
| - Few Unique | Q | M | Q | M | Q |
| Float64 | |||||
| - Random | Q | M | R | R | Q |
| - Structured | M | M | T | T | Q |
| - Few Unique | Q | M | Q | R | Q |
| Int64 | |||||
| - Random | Q | M | R | R | Q |
| - Structured | Q | M | uT | R/T | Q |
| - Few Unique | Q | M | R | R | Q |
Key:
| Symbol | Algorithm |
|---|---|
| H | HeapSort |
| I | InsertionSort |
| M | MergeSort |
| Q | QuickSort |
| T | TimSort |
| uT | TimSortUnstable |
| R | RadixSort |
-
Except for pathological cases, small arrays are sorted best with
QuickSort(unstable) or `MergeSort`` (stable) -
When sorting large arrays with sections of already-sorted data, use
TimSort. The only structured case it does not handle well is reverse-sorted data with large numbers of repeat elements. An unstable version ofTimSort(to be contributed to Julia soon) will handle this case -
For numerical data (Ints or Floats) without structure,
RadixSortis the best choice, except for 1) 128-bit values, or 2) 64-bit integers which span the full range of values. -
When memory is tight,
QuickSortis the best in-place algorithm. If there is concern about pathological cases, useHeapSort. All stable algorithms use additional memory, butTimSortis (probably) the most frugal. -
Composite types may behave differently. If sorting is important to your application, you should test the different algorithms on your own data. This package facilitates that.