The bottom of this article links to an article by Michael Herf (of f.lux fame, but who also developed Picasa) on his radix sort optimizations. I considered these two articles absolutely critical to my understanding of algorithms 20 years ago, when I was finally transitioning to "more complex" development while working with/for Jake Cannell (who had worked under Herf and was paying close attention to Pierre). I came to believe that sorts that don't at least try to use radix somewhere are somehow "missing the point" of algorithmic design.
A decade later, while working on Cydia--which has the complex problem of trying to provide efficient local operation of a federated "app store" on mobile devices, needed to sort tens of thousands of packages, and had been very, very slow, with users getting increasingly angry at me for refusing to centralize the service--I determined my biggest issue was the sorting process and turned to radix sort, using some experimentally-chosen offsets for the radix passes to "mostly sort" the data before switching to an insertion sort "fixup" pass (which could then use the fully-Unicode-aware, number-parsing, magical comparator people are used to).
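To make the idea concrete, here's a hedged sketch (not the actual Cydia code) of that two-phase scheme: a couple of stable LSD counting passes over the first few bytes of each name "mostly sort" the vector, then an insertion-sort fixup with the real comparator finishes the job. `PREFIX` and plain `operator<` (standing in for a locale/number-aware comparator) are placeholders I've chosen for illustration. Insertion sort costs O(n + inversions), so the expensive comparator runs only rarely on the nearly-sorted input.

```cpp
#include <array>
#include <string>
#include <vector>

constexpr size_t PREFIX = 2;  // how many leading bytes the radix passes cover (tunable)

static unsigned char byte_at(const std::string& s, size_t i) {
    return i < s.size() ? (unsigned char)s[i] : 0;  // pad short strings with 0
}

void mostly_radix_sort(std::vector<std::string>& v) {
    std::vector<std::string> aux(v.size());
    for (size_t p = PREFIX; p-- > 0;) {            // stable LSD passes, last byte first
        std::array<size_t, 257> count{};
        for (const auto& s : v) count[byte_at(s, p) + 1]++;
        for (size_t k = 0; k < 256; ++k) count[k + 1] += count[k];  // prefix sums
        for (auto& s : v) aux[count[byte_at(s, p)]++] = std::move(s);
        v.swap(aux);
    }
    for (size_t i = 1; i < v.size(); ++i) {        // insertion-sort "fixup" pass
        std::string x = std::move(v[i]);
        size_t j = i;
        for (; j > 0 && x < v[j - 1]; --j) v[j] = std::move(v[j - 1]);
        v[j] = std::move(x);
    }
}
```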
> In every decent programmer’s toolbox lies a strange weapon called a Radix Sort. Where does it come from ? Who invented it ? I don’t know.
Bit of a tangent, but at first this kind of caught me off guard: in this day and age it's easy to search for deeper info on established concepts, and most articles would briefly give this answer (if the author cared to mention the history at all). But then I had to remind myself the post was created in 2000, before Wikipedia (Radix Sort was added to Wikipedia in 2001), explanatory YouTube videos, and tons of writeups made us into instant "experts" in every field.
An invaluable source for information like this (that did indeed exist in 2001) is The Art of Computer Programming by Donald Knuth. Knuth is extremely diligent in documenting the history of all algorithms he describes (arguably, TAoCP is the best source-book for the history of all computer science) and he has the following to say on the topic of Radix sort:
> The sorting method just described is not immediately obvious, and it isn't clear who first discovered the fact that it works so conveniently. A 19-page pamphlet entitled "The Inventory Simplified," published by the Tabulating Machines Company division of IBM in 1923, presented an interesting Digit Plan method for forming sums of products in their Electric Sorting Machine ... This punched-card tabulating method leads naturally to the discovery of least-significant-digit-first radix sorting, so it probably became known to the machine operators. The first published reference to this principle for sorting appears in L. J. Comrie's early discussion of punched-card equipment [Transactions of the Office Machinery Users' Assoc., Ltd (1929)]
> ...most people rejected the idea of radix sorting within a computer, until H.H. Seward['s master thesis in 1954] pointed out that... [<technical details omitted>]
Presumably, this is a method that has been discovered and rediscovered many times by many programmers and punch-card operators, but H. H. Seward made it practical for digital computers in 1954.
PS. TAoCP is awesome. Even if you can't follow the more advanced math, just having this book on the shelf to look up stuff like this is lovely if you do this kind of thing professionally. Knuth is just a fantastic scholar and teacher. Just flipping through the things is a joy.
Interestingly, radix sort can also lexicographically sort a collection of strings over a fixed alphabet in time O(sum of sizes of strings). It's not necessary to assume all the strings have the same length.
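For anyone curious what that looks like in practice, here's a sketch of an MSD (most-significant-digit-first) radix sort over byte strings; the names `msd_sort` and `radix_sort_strings` are my own, not from the article. Each character of each string is examined at most once, so total work is O(sum of string lengths). Bucket 0 holds strings that have already ended, which is how variable lengths fall out naturally (shorter strings sort first).

```cpp
#include <array>
#include <string>
#include <vector>

// Sort a[lo, hi) on the byte at position d. Key 0 = "string ended", k+1 = byte k.
static void msd_sort(std::vector<std::string>& a, size_t lo, size_t hi, size_t d) {
    if (hi - lo <= 1) return;
    auto key = [d](const std::string& s) -> size_t {
        return d < s.size() ? size_t((unsigned char)s[d]) + 1 : 0;
    };
    std::array<size_t, 258> count{};
    for (size_t i = lo; i < hi; ++i) count[key(a[i]) + 1]++;
    for (size_t k = 0; k < 257; ++k) count[k + 1] += count[k];  // prefix sums
    std::array<size_t, 257> start;              // bucket boundaries, saved before moves
    for (size_t k = 0; k < 257; ++k) start[k] = count[k];
    std::vector<std::string> aux(hi - lo);
    for (size_t i = lo; i < hi; ++i) aux[count[key(a[i])]++] = std::move(a[i]);
    for (size_t i = lo; i < hi; ++i) a[i] = std::move(aux[i - lo]);
    for (size_t k = 1; k < 257; ++k)            // recurse into each byte bucket
        msd_sort(a, lo + start[k],
                 lo + (k + 1 < 257 ? start[k + 1] : hi - lo), d + 1);
}

void radix_sort_strings(std::vector<std::string>& a) { msd_sort(a, 0, a.size(), 0); }
```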
Exactly. Radix sort isn't really linear. It is O(N * M) where N is the size of the collection and M is the length of the keys (the number of digits or characters). If the key length is fixed (as with fixed-size integers) then it is effectively linear. And you can do the same with strings.
You can also say comparison sorts are O(N * ln N * M) where N is the size of the collection & M is the cost of comparison (eg string length for comparing strings)
But instead we say it requires O(N * ln N) comparisons
If you read the actual proofs, they count comparisons, not operations, when they arrive at O(n log n). More advanced material analyses algorithms on random-access machines with some fixed word size for cases where you can't assume that comparisons are constant-time operations, or specializes for the data type, e.g. numbers, where you sometimes don't need O(n log n) comparisons at all (as with radix sort).
Somehow the article posits that you need to check whether the list is already sorted to support multiple passes, but this version of radix sort is designed to be stable, which is all you need.
And technically, checking if the list is already sorted is mostly faster because of caching; in theory, radix sort could read each byte exactly once.
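That "read each byte exactly once" point can be made concrete for the counting phase: instead of re-reading the array once per pass, you can build the histograms for all four byte-wide passes of a 32-bit LSD radix sort in a single sweep. This is a sketch of that one idea (the placement passes still have to write the data), with names of my own choosing:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One pass over the keys fills all four per-byte histograms at once:
// hist[p][b] = number of keys whose p-th byte (little-endian) equals b.
void build_histograms(const std::vector<uint32_t>& keys, size_t hist[4][256]) {
    for (int p = 0; p < 4; ++p)
        for (int b = 0; b < 256; ++b) hist[p][b] = 0;
    for (uint32_t k : keys)
        for (int p = 0; p < 4; ++p)        // one load, four histogram updates
            hist[p][(k >> (8 * p)) & 0xFF]++;
}
```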
A radix is just a digit in a decimal number. For example, the number «42» has two digits, or two radices, which are 4 and 2. In hexadecimal the radix is 4 bits wide. For example, the hexadecimal number 0xAB has two radices, A and B.
I would say the author is also confusing the words "digit" and "radix".
From The Free On-line Dictionary of Computing (30 December 2018) [foldoc]:
radix
<mathematics> The ratio, R, between the weights of adjacent
digits in {positional representation} of numbers. The
right-most digit has weight one, the digit to its left has
weight R, the next R^2, R^3, etc. The radix also determines
the set of digits which is zero to R-1. E.g. decimal (radix
ten) uses 0-9 and each digit is worth ten times as much as you
move left along the number.
(2006-11-10)
Admittedly, "digit distribution sort" might be a more descriptive name for the algorithm.
The author recommends using bucket sort or a traditional sort for floats, but you can actually apply radix sort to floats as well and get the guaranteed "linear" time.
Counting sort is similar and also a bit better, at O(k + n) rather than O(kn).
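For comparison, counting sort in its simplest form (keys drawn from a known range [0, k)) is a counting pass plus a write pass, hence O(n + k). This is a generic sketch, not code from the article:

```cpp
#include <cstddef>
#include <vector>

// Counting sort over integer keys in [0, k): O(n + k) time, O(k) extra space.
// Only worthwhile when k is not much larger than n.
std::vector<int> counting_sort(const std::vector<int>& a, int k) {
    std::vector<size_t> count(k, 0);
    for (int x : a) count[x]++;                    // tally each key
    std::vector<int> out;
    out.reserve(a.size());
    for (int v = 0; v < k; ++v)                    // emit keys in order
        out.insert(out.end(), count[v], v);
    return out;
}
```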
There's C++ code included – great! But no actual timings :(
I'd be curious to see a deeper discussion of practical run-times, in actual seconds, relative to other popular sorting algorithms. Especially if driven by a real task with well-motivated problem constraints. (As opposed to "1000 random integers!" micro-benchmarks, or "N goes to infinity" theoretical big-Ohs.)
I personally find practical benchmarks the most exciting. They also help me identify different niches for different algos (caching, locality, array sizes, parallelization…).
I needed to sort an incoming stream of events by time, and after some benchmarking I ended up using a modified version of counting sort, which had the same runtime complexity as radix sort but was 2 times faster for my use case.