The bottom of this article links to an article by Michael Herf (of f.lux fame, but who also developed Picasa) on his radix sort optimizations. I considered these two articles absolutely critical to my understanding of algorithms 20 years ago, when I was finally transitioning to "more complex" development while working with/for Jake Cannell (who had worked under Herf and was paying close attention to Pierre). I came to believe that sorts that don't at least try to use radix somewhere are somehow "missing the point" of algorithmic design.
A decade later, while working on Cydia--which has the complex problem of trying to provide efficient local operation of a federated "app store" on mobile devices, needed to sort tens of thousands of packages, and had been very, very slow, with users getting increasingly angry at me for refusing to centralize the service--I determined my biggest issue was the sorting process and turned to radix sort, using some experimentally-chosen offsets for the radix passes to "mostly sort" the data before switching to an insertion sort "fixup" pass (which could then use the fully-Unicode-aware, number-parsing, magical comparator people are used to).
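To make the idea concrete, here's a hedged sketch (not the actual Cydia code) of that two-phase scheme: a couple of stable LSD counting passes over the first few bytes of each name "mostly sort" the vector, then an insertion-sort fixup with the real comparator finishes the job. `PREFIX` and plain `operator<` (standing in for a locale/number-aware comparator) are placeholders I've chosen for illustration. Insertion sort costs O(n + inversions), so the expensive comparator runs only rarely on the nearly-sorted input.

```cpp
#include <array>
#include <string>
#include <vector>

constexpr size_t PREFIX = 2;  // how many leading bytes the radix passes cover (tunable)

static unsigned char byte_at(const std::string& s, size_t i) {
    return i < s.size() ? (unsigned char)s[i] : 0;  // pad short strings with 0
}

void mostly_radix_sort(std::vector<std::string>& v) {
    std::vector<std::string> aux(v.size());
    for (size_t p = PREFIX; p-- > 0;) {            // stable LSD passes, last byte first
        std::array<size_t, 257> count{};
        for (const auto& s : v) count[byte_at(s, p) + 1]++;
        for (size_t k = 0; k < 256; ++k) count[k + 1] += count[k];  // prefix sums
        for (auto& s : v) aux[count[byte_at(s, p)]++] = std::move(s);
        v.swap(aux);
    }
    for (size_t i = 1; i < v.size(); ++i) {        // insertion-sort "fixup" pass
        std::string x = std::move(v[i]);
        size_t j = i;
        for (; j > 0 && x < v[j - 1]; --j) v[j] = std::move(v[j - 1]);
        v[j] = std::move(x);
    }
}
```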
> In every decent programmer’s toolbox lies a strange weapon called a Radix Sort. Where does it come from ? Who invented it ? I don’t know.
Bit of a tangent, but at first this kind of caught me off guard: in this day and age it's easy to search for deeper info on established concepts, and most articles would briefly give this answer (if the author cared to mention the history at all). But then I had to remind myself the post was created in 2000, before Wikipedia (Radix Sort was added to Wikipedia in 2001), explanatory YouTube videos, and tons of writeups made us into instant "experts" in every field.
An invaluable source for information like this (that did indeed exist in 2001) is The Art of Computer Programming by Donald Knuth. Knuth is extremely diligent in documenting the history of all algorithms he describes (arguably, TAoCP is the best source-book for the history of all computer science) and he has the following to say on the topic of Radix sort:
> The sorting method just described is not immediately obvious, and it isn't clear who first discovered the fact that it works so conveniently. A 19-page pamphlet entitled "The Inventory Simplified," published by the Tabulating Machines Company division of IBM in 1923, presented an interesting Digit Plan method for forming sums of products in their Electric Sorting Machine ... This punched-card tabulating method leads naturally to the discovery of least-significant-digit-first radix sorting, so it probably became known to the machine operators. The first published reference to this principle for sorting appears in L. J. Comrie's early discussion of punched-card equipment [Transactions of the Office Machinery Users' Assoc., Ltd (1929)]
> ...most people rejected the idea of radix sorting within a computer, until H.H. Seward['s master thesis in 1954] pointed out that... [<technical details omitted>]
Presumably, this is a method that has been discovered and rediscovered many times by many programmers and punch-card operators, but H. H. Seward made it practical for digital computers in 1954.
PS. TAoCP is awesome. Even if you can't follow the more advanced math, just having this book on the shelf to look up stuff like this is lovely if you do this kind of thing professionally. Knuth is just a fantastic scholar and teacher. Just flipping through the things is a joy.
Interestingly, radix sort can also lexicographically sort a collection of strings over a fixed alphabet in time O(sum of sizes of strings). It's not necessary to assume all the strings have the same length.
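For anyone curious what that looks like in practice, here's a sketch of an MSD (most-significant-digit-first) radix sort over byte strings; the names `msd_sort` and `radix_sort_strings` are my own, not from the article. Each character of each string is examined at most once, so total work is O(sum of string lengths). Bucket 0 holds strings that have already ended, which is how variable lengths fall out naturally (shorter strings sort first).

```cpp
#include <array>
#include <string>
#include <vector>

// Sort a[lo, hi) on the byte at position d. Key 0 = "string ended", k+1 = byte k.
static void msd_sort(std::vector<std::string>& a, size_t lo, size_t hi, size_t d) {
    if (hi - lo <= 1) return;
    auto key = [d](const std::string& s) -> size_t {
        return d < s.size() ? size_t((unsigned char)s[d]) + 1 : 0;
    };
    std::array<size_t, 258> count{};
    for (size_t i = lo; i < hi; ++i) count[key(a[i]) + 1]++;
    for (size_t k = 0; k < 257; ++k) count[k + 1] += count[k];  // prefix sums
    std::array<size_t, 257> start;              // bucket boundaries, saved before moves
    for (size_t k = 0; k < 257; ++k) start[k] = count[k];
    std::vector<std::string> aux(hi - lo);
    for (size_t i = lo; i < hi; ++i) aux[count[key(a[i])]++] = std::move(a[i]);
    for (size_t i = lo; i < hi; ++i) a[i] = std::move(aux[i - lo]);
    for (size_t k = 1; k < 257; ++k)            // recurse into each byte bucket
        msd_sort(a, lo + start[k],
                 lo + (k + 1 < 257 ? start[k + 1] : hi - lo), d + 1);
}

void radix_sort_strings(std::vector<std::string>& a) { msd_sort(a, 0, a.size(), 0); }
```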
Exactly. Radix sort isn't really linear. It is O(N * M) where N is the size of the collection and M is the length of the keys (the number of digits or characters). If the key length is fixed (as with fixed-size integers) then it is effectively linear. And you can do the same with strings.
You can also say comparison sorts are O(N * ln N * M) where N is the size of the collection & M is the cost of comparison (eg string length for comparing strings)
But instead we say it requires O(N * ln N) comparisons
If you read the actual proofs, they count comparisons, not operations, when they arrive at O(n log n). More advanced material analyses algorithms on random-access machines with some fixed word size for cases where you can't assume that comparisons are constant-time operations, or specializes for the data type, e.g. numbers, where you sometimes don't need O(n log n) comparisons at all (as with radix sort).
Somehow the article posits that you need to check whether the list is already sorted to support multiple passes, but this version of radix sort is designed to be stable, which is all you need.
And technically, checking if the list is already sorted is mostly faster because of caching; in theory, radix sort could read each byte exactly once.
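That "read each byte exactly once" point can be made concrete for the counting phase: instead of re-reading the array once per pass, you can build the histograms for all four byte-wide passes of a 32-bit LSD radix sort in a single sweep. This is a sketch of that one idea (the placement passes still have to write the data), with names of my own choosing:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One pass over the keys fills all four per-byte histograms at once:
// hist[p][b] = number of keys whose p-th byte (little-endian) equals b.
void build_histograms(const std::vector<uint32_t>& keys, size_t hist[4][256]) {
    for (int p = 0; p < 4; ++p)
        for (int b = 0; b < 256; ++b) hist[p][b] = 0;
    for (uint32_t k : keys)
        for (int p = 0; p < 4; ++p)        // one load, four histogram updates
            hist[p][(k >> (8 * p)) & 0xFF]++;
}
```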
A radix is just a digit in a decimal number. For example, the number «42» has two digits, or two radices, which are 4 and 2. In hexadecimal the radix is 4 bits wide. For example, the hexadecimal number 0xAB has two radices, A and B.
I would say the author is also confusing the words "digit" and "radix".
From The Free On-line Dictionary of Computing (30 December 2018) [foldoc]:
radix
<mathematics> The ratio, R, between the weights of adjacent
digits in {positional representation} of numbers. The
right-most digit has weight one, the digit to its left has
weight R, the next R^2, R^3, etc. The radix also determines
the set of digits which is zero to R-1. E.g. decimal (radix
ten) uses 0-9 and each digit is worth ten times as much as you
move left along the number.
(2006-11-10)
Admittedly, "digit distribution sort" might be a more descriptive name for the algorithm.
The author recommends using bucket sort or a traditional sort for floats, but you can actually apply radix sort to floats as well and get the guaranteed "linear" time.
Counting sort is similar and also a bit better, at O(k + n) rather than O(kn).
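For comparison, counting sort in its simplest form (keys drawn from a known range [0, k)) is a counting pass plus a write pass, hence O(n + k). This is a generic sketch, not code from the article:

```cpp
#include <cstddef>
#include <vector>

// Counting sort over integer keys in [0, k): O(n + k) time, O(k) extra space.
// Only worthwhile when k is not much larger than n.
std::vector<int> counting_sort(const std::vector<int>& a, int k) {
    std::vector<size_t> count(k, 0);
    for (int x : a) count[x]++;                    // tally each key
    std::vector<int> out;
    out.reserve(a.size());
    for (int v = 0; v < k; ++v)                    // emit keys in order
        out.insert(out.end(), count[v], v);
    return out;
}
```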
There's C++ code included – great! But no actual timings :(
I'd be curious to see a deeper discussion of practical run-times, in actual seconds, relative to other popular sorting algorithms. Especially if driven by a real task with well-motivated problem constraints. (As opposed to "1000 random integers!" micro-benchmarks, or "N goes to infinity" theoretical big-Ohs.)
I personally find practical benchmarks the most exciting. They also help me identify different niches for different algos (caching, locality, array sizes, parallelization…).
I needed to sort an incoming stream of events by time, and after some benchmarking I ended up using a modified version of counting sort, which had the same runtime complexity as radix sort but was 2 times faster for my use case.