If you read my articles here on ITworld, you’ve probably noticed that I like to write about academic research relating to the work of software developers. I’ll often share the results of empirical studies, surveys, and experiments that I feel our readers who work as developers would find interesting, such as studies that examined why software builds fail, that found your coding style is as unique as a fingerprint, and that found happy programmers are better programmers. To that end, last week, I wrote about the results of a recently published experiment that concluded that refactoring doesn’t improve the quality of source code.
I thought (and still believe) that the study was worth writing about, being published research that would be of interest to developers. However, many readers (the ones who chose to comment, at least) begged to differ. Strongly. Commenters shared many thoughtful reasons to question the research, so I thought I would take the time to share their main points of contention.
Many found the study, which was based on 20 students evaluating the effect of refactoring 4,500 lines of C# code, simply too small, both in terms of the number of participants and the code size, to draw any general conclusions.
“This is one study about one small C# application. Further studies are required before publishing ‘results.’” Decibel Places
“Four and a half thousand lines!? I've seen while loops that are longer than that.” Sarah Phillips
“There's nothing like using a very small sample to make a accurately sweeping generalization.” David Mott
This, of course, is an argument that can be made against many studies: it’s only one data point, the sample size is too small, and so on. As with any research based on sampling, whether the results can be extrapolated to the general population is up for debate. The authors acknowledged the small sample size as an argument against their findings.
Some people questioned the abilities (or motives) of those doing the refactoring and the evaluations.
“A small sample space (one small application), lack of details about the refactoring done (experimenter bias if done by the study authors), and inexperienced evaluators (college students) makes this study completely worthless.” Jim Balter
“Refactoring is only effective when accomplished by a competent developer.” Slynk Adink
“Are we supposed to believe that the ENTIRE industry is wrong, based solely on a single data point, evaluated by *students* with zero practical real world experience?” GNU Guy
The researchers don’t make clear who actually did the refactoring (the student participants, the authors, or someone else), so it’s hard to simply assume that whoever did it didn’t know what they were doing. The authors defend their choice of using students to do the evaluations by writing that existing research found that students have “comparable assessment ability” to professional developers. In my opinion, questioning whether students can evaluate code quality as well as professionals seems fair, but it doesn’t seem fair to dismiss their evaluations solely because they’re students.
Some commenters felt that the researchers’ approach of first choosing the refactoring methods to use, then looking for places to apply them, is the opposite of the way refactoring should work in the real world.
“Limiting which refactorings to apply makes no sense at all - good software development is a craft and documented refactorings are tools that developers can use - ‘Yes I know you'd normally use a wood chisel to make that joint, but we're only giving you a flat-head screwdriver - it's sharp though!’.” Kevin Roche
“These researchers worked back to front. They identified ten high impact refactorings, then looked for smells where the refactoring could be applied, then looked to see whether the refactoring made the code easier to understand and maintain. More experienced practitioners usually start by trying to understand code because they have to change it and use smells to identify why the code is hard to understand, or else they notice smells as they perform maintenance. Then, and only then, they refactor.” Sarah Phillips
This is a strong argument against the study, as it indicates the methodology went against standard practice, which should call the results into question.
Many commenters felt that the study authors misunderstood the whole point of refactoring in the first place, asked the wrong questions and applied the wrong metrics to measure its effectiveness.
“Refactoring doesn't fix problems with the code, it makes it more easy to comprehend as a whole and to increase confidence in changes you intend to make.” Siderite Zackwehdex
“First and foremost refactoring is NOT optimization! In fact in many cases they can end up being polar opposites. The point is to make it less likely for modifications to the code to *introduce* bugs. Properly refactored code should have fewer dependencies and side-effects. *That* is the point of refactoring.” Logan Murray
“Optimizations especially often obfuscate the code and make code harder to understand, the opposite of what a refactoring (in the best of worlds) does.” Antero Kärki
This argument, that the researchers’ hypotheses were invalid and that they didn’t use the correct metrics, is also a good case against the study, although no commenter proposed specific alternative metrics.
For the most part, I think that you commenters brought up valid reasons to question the study’s conclusion that refactoring doesn’t improve code quality. In my opinion, though, the best reason offered to dispute that notion was simply that many of you had personally experienced the benefits of refactoring.
“I have 20 years of empirical evidence that proves refactoring's benefits. Like any good thing, too much of a good thing can be bad, and when used inappropriately it can be bad.” Joseph Anthony
“I daily see how code that has been refactor[ed] is so much easier to deal with than code written ‘a-la-carte’.” Frank Cedeno
“By experienced in over 20 years I know for sure that refactoring always gives better quality code, provided that it's correctly done.” Sudarshana Gurusinghe