i.e. you know what the code is supposed to do, then you test that it really does it.
Maybe if your code is "Hello World", but what if your code is a motion search, or a facial recognition algorithm, or a band-pass filter, or any other sort of real-world code whose operation can't be summed up exactly in two lines of code?
You write tests for it. Really, the complexity of the whole doesn't necessitate complexity of the parts. Hell, I'll go as far as to say that a facial recognition algorithm written as a single monolithic piece of complex code is also a horribly bad piece of code. And yes, if you are writing that sort of code, it will be hard to test. Because it's bad code.
> or any other sort of real-world code whose operation can't be summed up exactly in two lines of code?
By this, I assume you mean real-world code whose operations consist of 10,000 lines of un-modularized spaghetti code.
>>By this, I assume you mean real-world code whose operations consist of 10,000 lines of un-modularized spaghetti code.
That is the same thing that came to my mind. Indeed, if you write code like that, it will be really hard to write automated test cases, debug, maintain, and so on.
>>Maybe if your code is "Hello World", but what if your code is a motion search, or a facial recognition algorithm, or a band-pass filter, or any other sort of real-world code whose operation can't be summed up exactly in two lines of code?
Then you break the problem down as much as possible so that you know exactly what the expected output should be for a given input. You keep breaking the problem down until it becomes obvious how to automate the tests. Otherwise it'll be really hard to write good, clean, modular code.
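To illustrate the point about breaking things down: rather than testing a whole band-pass filter end to end, you test one small piece whose expected output you can compute by hand. This is just a sketch; `moving_average` is a hypothetical helper, not code from any real filter library.

```python
def moving_average(samples, window):
    """One small, hand-checkable piece of a larger filter chain."""
    return [sum(samples[i:i + window]) / window
            for i in range(len(samples) - window + 1)]

# For a constant input, the expected output is trivially known.
assert moving_average([2.0, 2.0, 2.0, 2.0], 2) == [2.0, 2.0, 2.0]

# For a tiny hand-computed case: (1+3)/2 = 2, (3+5)/2 = 4.
assert moving_average([1.0, 3.0, 5.0], 2) == [2.0, 4.0]
```

Each piece tested this way gives you a known-good building block, even if the composed system is too complex to specify exactly.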
Then you run some samples through it and make sure it works correctly.
What if the exact meaning of "correctly" isn't defined?
Here's what happens when you blindly apply this approach to real-world applications: you end up with hardcoded "correct" md5sums that are "whatever the function happened to output last time, and we checked and thought it was right". I've seen this happen, repeatedly, with real-world apps, and it is useless at best and usually harmful.
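A minimal sketch of why those hardcoded "correct" md5sums are so brittle: the hash is tied to the exact bytes of the output, so a purely cosmetic change fails the test even when the data is identical. The `render_report` functions here are hypothetical stand-ins.

```python
import hashlib

def render_report(values):
    # Hypothetical report renderer; note the ", " separator.
    return ", ".join(f"{v:.2f}" for v in values)

def render_report_v2(values):
    # Same data, cosmetically different separator (no space).
    return ",".join(f"{v:.2f}" for v in values)

old = hashlib.md5(render_report([1.0, 2.5]).encode()).hexdigest()
new = hashlib.md5(render_report_v2([1.0, 2.5]).encode()).hexdigest()

# A golden-hash test blessed against `old` now fails, even though
# nothing meaningful about the output changed.
assert old != new
```

The hash tells you *something* changed, but nothing about whether the change matters, which is exactly the "whatever it output last time" problem.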
For a huge span of real-world applications, there is NO RIGHT ANSWER. You can't blindly apply unit-testing to that in the same way you would for a web framework.
So I guess the question is, if the meaning of "correct" isn't defined, how did you arrive at code to do that thing? Are we talking neural networks here or what? Even those get tested, maybe not in an automated way, but you could at least make sure the results are better than your last iteration.
Take the facial recognition example. Give the whole thing a few images it should fail on and some it should pass on. As you learn more about the system and where things fail or just don't add up, add tests there. It's likelier that those parts of the code are bad too, so refactoring isn't unheard of.
To a degree I completely agree with you that some things don't have an easily known answer in the real world. But adding tests for known things like regressions just seems like the bare minimum of testing. I can't count the real-world apps I've seen without a single "proof" that they don't regress as things change. And they do, often and repeatedly. But dealing with external products and teams can be frustrating.
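The fixture-driven approach described above can be sketched in a few lines. Here `recognize` is a trivial stand-in stub (a real system would load and score actual images); the point is the shape of the test, not the recognizer.

```python
# Hypothetical stub standing in for a real recognizer.
def recognize(image_id):
    return "face" in image_id

# Curated fixtures: cases the system should match, and known hard
# negatives it should reject. Grow these lists as failures are found.
SHOULD_MATCH = ["face_front", "face_profile"]
SHOULD_REJECT = ["cat", "teapot"]

for img in SHOULD_MATCH:
    assert recognize(img), f"expected match: {img}"
for img in SHOULD_REJECT:
    assert not recognize(img), f"expected no match: {img}"
```

Even without a precise definition of "correct", this pins down the cases you *do* know about, so regressions on them can't slip through silently.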
It seems to me that if there's no right answer, then whatever you do you can't possibly be wrong. That's pretty easy code to write (and make really fast at the same time).
I'm guessing there is a right answer, it just might be poorly specified. And automated or unit testing doesn't guarantee that even simple things are correct. It's just a trade-off of how much time you put into the automation versus the time you spend chasing bugs you could have caught with automation (which isn't all of them). I'd think in most real world code (for all the usual real world reasons) that people err on the too-little automated-testing side of things.
What do you see as the problems with checking the md5sums? I'm guessing they're brittle if the answer isn't supposed to be exactly the same, but that reduces to the same thing as checking that a number is 5 when it might be between 4.8 and 5.2. It's not rocket science, and getting it wrong doesn't invalidate all automated testing. At the very least, the first time it throws an error the developer who wrote the bad test is going to learn something new about his system, and you might sensibly use such cheaply written tests to locate errors when running code on a different OS (or version), or after minor cosmetic changes where you would expect exactly the same result.
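The "5 when it might be between 4.8 and 5.2" check mentioned above is a one-liner with a tolerance comparison; `math.isclose` with an absolute tolerance does exactly this.

```python
import math

def check_result(value, expected=5.0, tol=0.2):
    """Accept any value within +/- tol of expected, not just exactly 5."""
    return math.isclose(value, expected, abs_tol=tol)

assert check_result(5.0)        # exact answer passes
assert check_result(4.85)       # within the 4.8..5.2 band passes
assert not check_result(4.5)    # outside the band fails
```

The same idea generalizes to golden outputs: compare within a tolerance (or compare parsed values rather than raw bytes) instead of demanding bit-exact equality.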