I think repeated sequences in data only became relevant when there was large ima...

beagle3 · on Jan 18, 2021

Not at all. Shannon's work is incredibly profound, and deals with distortion of analog signals, channel capacity under different constraints, and many other things. He was very much aware of, and discussing, repeated sequences in the 1952 class in which Huffman constructed his code and proved it optimal. (According to lore, Shannon said "hey, Fano and I have this Shannon-Fano code, which we know is close but not optimal. Improve on it and you can skip the final.")
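To make the "close but not optimal" part concrete, here is a minimal sketch comparing code lengths from Huffman's bottom-up merging with the top-down splitting usually attributed to Fano. The function names and the weights (35, 17, 17, 16, 15) are my own choices for illustration, not anything from the 1952 paper, and the splitting rule used here (cut where the two halves are closest in weight) is one common textbook formulation.

```python
import heapq
from math import log2


def huffman_lengths(weights):
    """Code length per symbol from Huffman's bottom-up merging."""
    # Heap entries: (subtree weight, tie-breaker, symbol indices in subtree).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    tie = len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node gets one more bit.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


def fano_lengths(weights):
    """Code length per symbol from top-down splitting.

    Assumes weights are already sorted in descending order.
    """
    lengths = [0] * len(weights)

    def split(lo, hi):  # half-open range [lo, hi)
        if hi - lo < 2:
            return
        total = sum(weights[lo:hi])
        running = 0
        best_cut, best_diff = lo + 1, None
        for cut in range(lo + 1, hi):
            running += weights[cut - 1]
            diff = abs(total - 2 * running)  # |left half - right half|
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = cut, diff
        for i in range(lo, hi):
            lengths[i] += 1
        split(lo, best_cut)
        split(best_cut, hi)

    split(0, len(weights))
    return lengths


def average_length(weights, lengths):
    """Expected bits per symbol when weights are treated as frequencies."""
    return sum(w * n for w, n in zip(weights, lengths)) / sum(weights)


def entropy(weights):
    """Shannon entropy in bits per symbol."""
    total = sum(weights)
    return -sum(w / total * log2(w / total) for w in weights)


weights = [35, 17, 17, 16, 15]  # illustrative frequencies, sorted descending
huff = huffman_lengths(weights)
fano = fano_lengths(weights)
print("entropy:", entropy(weights))
print("Huffman:", huff, average_length(weights, huff))
print("Fano:   ", fano, average_length(weights, fano))
```

On these weights the splitting method lands slightly above Huffman in average bits per symbol, and both sit above the entropy, which is the lower bound for any symbol-by-symbol prefix code.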

Shannon's original paper, "A Mathematical Theory of Communication", is very readable (especially the discrete sources and channels; you need some calculus for the continuous sources and channels), and yet it touches on many things and has many insights. At least as late as 2000, one very-well-published prof at the uni I was attending told me that whenever he wants to break out in a new direction, he rereads it 5-10 times and notices yet another thing that Shannon considered trivial and that is still unexplored.

Also, there are basically no repeated sequences in large image data. JPEG, H.264 and all their friends are all about compressible approximations that are still close enough, and the data they work on rarely if ever repeats exactly.