Contagion is a really great term. I've seen my poor abstractions be replicated by others on my team, to my horror -- "don't they see why I did that in this particular case, and not in this other case?" Of course, that's entirely, 100% my fault. I picked a poor abstraction, I put it in the code, I didn't document it well enough, and of COURSE other programmers are going to look to it when solving similar problems. They should!
That said... Sometimes I spend a bunch of time finding the right abstraction for a feature that we end up not expanding. And then it feels bad that I spent all this extra time coming up with the "right" solution, instead of just hacking out something that works. Hmm...
One team I was part of kept a separate backlog of technical debt and experiments. It was nice to have a place to say, "in 30 days, look at this hacky thing and see if it's worth making better". Or, "I noticed this is a mess, here's how I might clean it up." We'd occasionally talk over the backlog and prioritize it, which helped communicate both the general make-things-better spirit and specific issues like you mention. I really liked it.
One thing that made it work is that we worked on it in small slices all the time, without involving the product manager. It was still visible, so there'd be the occasional question, but as long as we kept delivering user value, nobody worried too much about our mysterious code concerns.
> I'm really pissed that technical debt is considered as "Hey the dev guys are complaining again".
That's because it's opaque to anyone other than the engineers working on a project.
I've had a limited amount of success by making this more transparent. Every time a feature was going to take longer because of a piece of technical debt the team wanted to fix, we signaled it, and that got the fix prioritized before we implemented the 4th and 5th affected features.
Don't the bean counters at Ford Motor Company (for example) nark on the assembly line workers and industrial engineers and QA/QC folks when they let work pile up, with broken machines lying around and uncleaned trash?
It's risk/reward to the people who want to decide how their money is spent, isn't it?
In your example, the worst-case scenario is that someone could die, and that tends to spur on investors to discover the probity within themselves to spend some money avoiding an expensive lawsuit.
But when the devs are complaining about the old code being terrible and making their lives hard, it never seems to hinder them that much to management. They keep banging out new features and fixing bugs, and nothing bad seems to happen. But the drip-drip-drip of bugs keeps increasing, and the new features take a little longer each time, and nobody dies at least, but the thing becomes a haunted moneypit that nobody wants to touch, and you're stuck with it now unless you rewrite it all at huge expense, etc., etc.
Maybe everyone should just treat a piece of software as they would a life. I bet we've all seen some codebases where if it were a friend, you probably would have staged an intervention by now. Your software baby needs absolute care from the get-go until the very end, or it will get sick and probably die, and most likely in a very prolonged and painful way.
The place I used to work in has been hiring (junior) people like crazy. Part of the reason they need so many is the crushing foundational technical debt at the core. When they hired someone capable of improving that, they were unable to merge the changes due to fear, and the management couldn't see the business value of doing so. They've had a few nasty outages recently too. I believe the insides of the Atlassian kit are similarly riddled with technical debt.
An important difference being that in your Ford example, you can just throw new people at the problem while in software it generally needs to be handled in the responsible team.
I’ve found it helps to measure “how long does it take to get a thing of x size done” - if you can measure the results of your improvement (like how long it takes to get a new design implemented) it’s an easier sell. Eventually, it becomes known throughout the company that things are going faster, regardless of the metric results. Of course if you go around making high risk changes for low reward they’ll see the artifacts of increased bugs and less system reliability.
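To make the measurement idea above concrete, here is a minimal sketch of comparing median cycle time for similarly sized tasks before and after a cleanup. The task list, the dates, and the names `median_cycle_days` and `CLEANUP_DONE` are all made up for illustration; a real team would pull this from its own tracker.

```python
from datetime import date
from statistics import median

# Hypothetical data: (started, finished) for tasks the team sized as "medium".
TASKS = [
    (date(2024, 1, 2), date(2024, 1, 12)),
    (date(2024, 1, 15), date(2024, 1, 23)),
    (date(2024, 2, 1), date(2024, 2, 13)),
    (date(2024, 3, 4), date(2024, 3, 9)),
    (date(2024, 3, 11), date(2024, 3, 15)),
    (date(2024, 3, 18), date(2024, 3, 24)),
]

# Hypothetical date on which the refactoring work landed.
CLEANUP_DONE = date(2024, 3, 1)


def median_cycle_days(tasks):
    """Median number of calendar days between start and finish."""
    return median((finished - started).days for started, finished in tasks)


# Split tasks by whether they started before or after the cleanup.
before = [t for t in TASKS if t[0] < CLEANUP_DONE]
after = [t for t in TASKS if t[0] >= CLEANUP_DONE]

print("median days before cleanup:", median_cycle_days(before))
print("median days after cleanup:", median_cycle_days(after))
```

The median is used rather than the mean so one unusually slow task doesn't swamp the comparison. With only a handful of tasks this is an anecdote rather than proof, but it gives management a number to look at.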
I've had some success building technical debt into my estimates. While I'm working on a new feature or a bug, I'll tidy up in the area around it - the tidying up is just part of the work necessary to complete the task.
The really cool thing is that eventually you're able to deliver large, complex tasks in very brief times and then spring on the PM/management that you're able to do this _because_ you've been refactoring. That's made a believer out of at least one of my PMs.
Obviously this doesn't work in all circumstances - it's not always feasible to get the really systemic, contagious debt cleaned up as part of feature work, and if the PM catches on then it makes this tactic difficult to continue.
The bigger obstacle I've had, though, is other developers who haven't fully bought into a culture of continuous improvement. Fear of breakages causes refactor paralysis, which makes it easier to break things when working on them, which increases fear, and so forth. I'm not really sure of the best way to deal with that aside from adding a bunch of unit tests (which I still sometimes get pushback on).
In this case, a JavaScript front-end that had no unit tests previously. I also wasn't able to get NPM through the firewall, so I used Jasmine standalone and kept its files and copies of our third-party framework files in a "Frameworks" folder within a separate "Test" folder.
The pushback I received was that keeping the framework code in Source Control would result in it being caught in the JS build/minification script, as well as my spec files. The individual that pushed back was also concerned about JS exceptions since we were up against a release, which speaks to a need for training about how unit test files work. Ultimately I .gitignored the framework folder but wouldn't budge on leaving the test files in, since .gitignoring unit tests defeats the purpose. Then I learned that the build script wouldn't grab those files anyway. :)
My boss at my last job had the mindset of "refactoring only makes it different, not better". I asked him if I could spend some time refactoring our build system. He said no. I eventually did it anyway a few months later, spotted a bug due to the changes, and all of a sudden, build times were cut in half, or in many instances to a tenth.
Same story for a pretty nasty hunk of code we had for handling sparse arrays. Asked if I could refactor, got told no, did it anyway a while later, and all of a sudden a problem that had been considered borderline infeasible takes like 1 day of work.
Refactoring isn't always the right decision; a good boss/lead needs to carefully weigh the pros & cons.
There is always some risk that refactoring makes code not only different, but worse. Corner-cases are often there for a reason, and refactoring sometimes misses them, especially when there isn't complete unit test coverage. Since it's often easier to get the core logic right, this likely leads to issues that are discovered in production.
There is no "right" thing. There may be an optimal thing from the development perspective and an optimal thing from the business perspective. Since the two pieces cannot exist without each other, both parties have to communicate effectively and trust each other to find the optimal decision for the combined problem space which may be sub-optimal when considered separately.
> If boss can't trust the minions to do the right thing, someone's got the wrong job.
There are many people who have wrong jobs.
More importantly, there are many people who are good, but not perfect. They do some aspects of their work very well, other aspects less well. A good boss has some idea about that and is able to work with people who are not super great.
Last but not least, even very good people often disagree about many things, including whether refactoring is needed or not or what kind of refactoring to do. Even if the boss trusted everyone and listened to everyone, he would still be told plenty of contradictory opinions.
The only time my improvements have even been noticed is when the pointy-haired boss said "Well, you should have thought of that sooner. What am I paying you for?"
One of the things that I almost always insist on in a dev/PM feedback cycle is the concept of "chores." The Devs (usually via eng lead) get to schedule chores in the backlog, full stop. PM can have a convo with eng lead to say "hey, will this chore take a super long time? Can you possibly reschedule it?" but if it's work that is a pure refactor (no product implications) the PM doesn't get to block it, period.
Of course, this only works well on teams where your PM and eng lead don't have a fundamentally adversarial relationship. I like to think this is most teams, but it does take some getting used to in terms of eng lead and PM communicating priorities and needs, between product momentum and code quality.
That's how I do it. It took some time to build up the trust relationship, but most of the time, our stakeholders and I can keep a good balance of maintenance and features. And this balance doesn't have to be rigid. I want my maintenance tasks done, but it's fine to prioritize deliverables for a sprint or two - we'll have a sprint or two of maintenance then. And that might be fine, or even beneficial, because then you have a bigger block of time to do some bigger cleanup tasks.
On most of my projects we have enlightened PMs who make allowances for paying down tech debt. For example, on my most recent rotation (an RoR app front-end to manage cloud orchestration software), the PM and tech lead worked out an arrangement where, for the four weeks following "feature freeze", half the dev time was spent paying down tech debt and other chores (the other half was spent fixing bugs).
Yeah, trusting developers to use their time wisely given a high-level alignment on the big goals can be very powerful. One of our struggles on the individual level is the uncertainty of "is this the little feature that will take the champion from good to great?" that leads to slow and steady feature creep. It's tough to weigh those against tech debt cleanup even though we have the autonomy to work on "mysterious code concerns" when we choose to.
I would like to suggest that there is a fourth dimension that might be called 'interest' as we are using a debt analogy - the tendency for the cost to increase over the time elapsed since the debt was incurred.
When an item of debt is first created, the people making it are often well aware of what they have done and are therefore in a relatively good position to fix it, but that knowledge quickly dissipates, to the point where it is often forgotten that there is a specific issue there. Furthermore, there is a tendency for it to be made less obvious as further changes are layered on top and around (this is distinct from contagion, as it can occur if the later changes are themselves debt-free, or at least independent of the decisions that created the debt and their consequences.)
One place I worked addressed this by having a mandatory post-deploy monitoring / patch day. We’d all do a deploy and keep an ear to support / logs while going ahead and improving things we knew needed a little clean up. If we saw anything come in from the release, we fixed it immediately.
An entire day is excessive in a CD setup, but for a two week release cycle it worked well. Kept the rough edges out of customer view very well.
The top comment under the article uses the height of the interest rate to describe the level of contagion http://disq.us/p/1ros2o9
'tl;dr "contagion" is the most important attribute because its properties are similar to interest rates. Having a small loan (small impact/fix cost) but high interest rate (high contagion) can quickly dwarf large loan small interest rate.'
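As a rough illustration of that loan analogy, here is a small sketch that treats contagion as a compound interest rate. The principals, rates, and the function names `debt_cost` and `first_period_small_exceeds_large` are invented for the example; nobody in the thread or the article gives actual figures.

```python
def debt_cost(principal: float, rate: float, periods: int) -> float:
    """Cost of an unpaid debt after `periods`, compounding at `rate` per period."""
    return principal * (1 + rate) ** periods


def first_period_small_exceeds_large(small, large, max_periods=100):
    """Return the first period in which the `small` loan costs more than the
    `large` one, or None if that never happens within `max_periods`.

    Each loan is a (principal, rate) pair.
    """
    for n in range(max_periods + 1):
        if debt_cost(*small, n) > debt_cost(*large, n):
            return n
    return None


# Hypothetical numbers: a small but highly contagious debt versus a
# large but well-contained one.
small_contagious = (1.0, 0.50)   # cost 1, grows 50% per period
large_contained = (10.0, 0.05)   # cost 10, grows 5% per period

crossover = first_period_small_exceeds_large(small_contagious, large_contained)
print("small debt overtakes large debt in period:", crossover)
```

Under these assumed numbers the debt that started at a tenth of the size becomes the more expensive one within a handful of periods, which is the point the quoted comment is making.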
I found contagion to be a great clarifying concept too; it's something that I've been looking at in my codebase as the team expands.
My gut feel is that it's not necessarily about what you write in the first place, but what you refactor -- sometimes you can get away with a gradual replacement strategy (like std::string => AString from the article), but if the original pattern is contagious and bad, then you might have to take a more aggressive one-shot refactoring approach.
I've definitely seen this where a localized refactor is made to try to find a better way of doing something, we decide that we like the new way, and then don't find the time to replace the rest of the usages, resulting in a confusing state of affairs where you need to know which is the "blessed"/"correct" way of doing things.
I think that "contagion" is a good lens to use when assessing what the refactoring strategy should be for a given change to the codebase.
I really enjoyed the Lava Layer antipattern for incremental refactors that never complete. Having learned to recognize it, I think I'm more aware of the cost/benefit of introducing a new pattern, even if it's better in some way.
That article has changed my behavior in some places as well. Sometimes it's indeed better to sit down and replace the entire old solution, instead of going incrementally. It's a bigger immediate pain, but less pain afterwards.
I've also seen bad pattern replication, and had a difficult time explaining to other teams why it was a problem.
I used to write a lot of app-wide JavaScript at a previous job that would get consumed by multiple teams. If I didn't encapsulate something well enough or if I left a private open, I'd later find a code review with someone exploiting it.
The worst offender was a team that once used the prototype of a shared class as a mixin, duplicated/mocked just enough of my implementation logic to get three or four methods working, and then left it at that. Of course, the next time I changed any of my code, even in the constructor, their page broke.
My experience has been that when other teams see these patterns, they see a single page or feature that's working at the moment and assume "this must be fine." They don't see the three or four frantic show-stopping bugs that got logged last month.
When I would confront teams about this, often the response that I would get was "Well, if it's good enough as a quick fix for them, why can't we do the same thing? Why are we the only team that has to fix this?"
Of course, when teams don't want to be the first one to break from a bad pattern, the end result is that nobody changes anything.
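For anyone who hasn't run into the borrowed-methods problem described above, here is a rough Python analogue (the original was JavaScript prototypes; every class and function name below is hypothetical). A page borrows a method from a shared class without inheriting from it, and fakes just enough private state to make that method work. An internal rename in the shared class then breaks the page even though the public API never changed.

```python
class SharedWidget:
    """Stands in for the shared class owned by another team."""

    def __init__(self, name):
        self._name = name  # private implementation detail

    def label(self):
        return f"[{self._name}]"


class SharedWidgetV2:
    """Same public API, but the private field was renamed internally."""

    def __init__(self, name):
        self._title = name

    def label(self):
        return f"[{self._title}]"


def make_borrowing_page(shared_cls):
    """Build a page that borrows `label` instead of subclassing `shared_cls`."""

    class Page:
        # Borrow the method directly, mixin-style, skipping the constructor.
        label = shared_cls.label

        def __init__(self):
            # Duplicate "just enough" of the owner's private state.
            self._name = "page"

    return Page()


def page_still_works(shared_cls):
    """True if the borrowing page can still call the borrowed method."""
    try:
        make_borrowing_page(shared_cls).label()
        return True
    except AttributeError:
        return False


print(page_still_works(SharedWidget))    # borrowing happens to work today
print(page_still_works(SharedWidgetV2))  # breaks after an internal rename
```

Anyone looking only at the first case sees a working page and assumes the pattern is fine, which is how it spreads.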
I have found the closer I am to the product and the clients that will be affected, and the more thoroughly I understand the usecase from the client’s perspective, the better I am at understanding how much effort to spend on “getting it right” in this way. Still wrong sometimes though!
Contagion is why I want a VCS tool that allows me to keep code review comments with the code. Just because someone senior did something bad two years ago doesn’t mean you have carte blanche to make new code that behaves the same way!
Interesting how you point to a slightly different kind of contagion in replicating code patterns, while the article seems to discuss the kind that is inevitably forced on whoever depends on the code.
I have implemented a bunch of things that, while helpful short term, had clunky hacks to make up for either lack of tooling or time constraints. And then the solutions get replicated verbatim, because "they work". The more time passes, the worse they become.