With this quantity of data, you wouldn't be able to do this manually.
If you hope to avoid being caught this way, I'm going to assume you'd have noticed this without the benefit of hindsight and would correct all out-of-place Unicode characters automatically. How will you avoid over-correcting?
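As a rough sketch of the tension here (the character table and function names are hypothetical, not anything Genius has published): you can mechanically flag Unicode look-alikes such as curly apostrophes, but a blanket normalization pass is exactly the over-correction trap, because curly quotes are also legitimate typography.

```python
import unicodedata

# Hypothetical homoglyph table: Unicode characters with an obvious
# ASCII look-alike, e.g. U+2019 RIGHT SINGLE QUOTATION MARK vs "'".
HOMOGLYPHS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}

def flag_suspect_chars(text):
    """Return (index, char, Unicode name) for each look-alike found."""
    return [
        (i, ch, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in HOMOGLYPHS
    ]

def naive_normalize(text):
    """Blindly map every look-alike to ASCII -- the over-correcting trap:
    this also destroys apostrophes that were curly on purpose."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
```

Flagging for human review is cheap; deciding which flagged characters are fingerprints and which are ordinary typesetting is the hard part.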
There's also no reason to believe this is the only fingerprinting Genius has done (they only need to publish the most obvious failure). For example, I could use the same fingerprinting technique but switch between American and British spellings.
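To illustrate why that variant would be just as detectable, here's a hypothetical sketch (the word list is a made-up sample): a copy whose British/American spelling mix flips relative to the prevailing version stands out the same way an odd apostrophe does.

```python
# Hypothetical spelling-variant signature: count British vs American
# spellings so copies of the "same" lyrics can be compared.
VARIANTS = {
    "colour": "color",
    "favourite": "favorite",
    "realise": "realize",
}

def spelling_signature(text):
    """Return (british_count, american_count) for the sample word list."""
    words = text.lower().split()
    british = sum(words.count(b) for b in VARIANTS)
    american = sum(words.count(a) for a in VARIANTS.values())
    return british, american
```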
I think if you had multiple corpora of lyrics, you could cross-check for anomalies of any variety: odd quoting, switching between American and British English, etc.
The fingerprinting likely isn't applied to every song, to avoid obvious detection. If you went through multiple databases, you might see N prevailing copies of a song's lyrics and one that seemed different. The one that's different carries the anomaly.
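The cross-check described above can be sketched as a simple majority vote (a toy model; real lyric copies would need normalization of whitespace, casing, and line breaks before comparing):

```python
from collections import Counter

def find_outliers(copies):
    """Given several scraped copies of the same song's lyrics,
    return the indices of copies that differ from the most common
    version -- the likely carriers of a fingerprint."""
    majority, _ = Counter(copies).most_common(1)[0]
    return [i for i, text in enumerate(copies) if text != majority]
```

With three databases agreeing and one substituting a curly apostrophe, only the odd one out is flagged.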
This is not a straightforward problem.