Refactoring: Where Do I Start?
Unfortunately, I have no video of this old presentation, but I can offer the presentation notes that I’d distributed at the time.
Better Software Agile Development Practices 2007
Since Martin Fowler completed his now-classic work Refactoring: Improving the Design of Existing Code, few programming practices have been more effective—and more controversial—than refactoring. Refactoring is effective when you study and practice it diligently. It remains controversial because many development managers think developers should be adding features, not reworking old code. In spite of this, refactoring is a key skill to develop as rewriting costs more and is tolerated less. This article, presented as answers to common questions, accompanies the class taught at Better Software’s Agile Development Practices 2007 conference.
What is refactoring?
The act of refactoring is improving the design of existing code. It is meant as an alternative to rewriting code, and there are a number of important differences between the two activities.
- When refactoring, we maintain the behavior of the current system; whereas when rewriting, we might decide to change its behavior deliberately; or worse, we might accidentally change its behavior, introducing either incompatible changes or defects, both of which annoy users.
- When refactoring, we begin with the existing code and change it over time as needed; whereas when rewriting, we sometimes start from scratch, replacing large sections of code as needed.
- When refactoring, we can steadily improve existing code over time without disrupting the release schedule. We can release code, even in the middle of a refactoring. When rewriting, we often replace large sections of code at once, and we usually cannot release any part of a rewrite, because it has to replace an entire section of existing code at once.
- When refactoring, we can start with less up-front work, change direction when needed and stop part-way through if that’s appropriate. When rewriting, we usually have to plan the entire rewrite in advance, we need it to proceed as expected or return from coding to planning, and we can’t stop until it’s complete.
- When refactoring, we often support parts of the old and new system at the same time; whereas, when rewriting, we usually flip a switch to move from the old system to the new. This makes us less aggressive in putting the new system in place, so that we must live with the old system (and its problems) in its entirety for longer.
Martin Fowler described refactoring as changing the structure of code without changing its behavior; but I believe that definition lacks a key element: improvement. I describe refactoring as the act of identifying a specific problem with code, envisioning how to improve it, then improving it in small, reversible steps. We use the metaphor of code smells to describe problems in code structure, and so refactoring consists of identifying code smells, deciding how to remove those smells, then systematically transforming the code so that it no longer smells.
We also use the term refactoring as a noun, to refer to a particular code transformation. For example, we might improve code by moving some repeated code into a method. We apply a transformation called “Extract Method”, in which we move the code into a new method, then invoke that method from the places that used to contain the repeated code. We call this transformation “a refactoring”, and so you’ll hear us talk about the “Extract Method refactoring” or the “Introduce Delegate refactoring”. These refactorings are usually reversible, so we can undo them if we need to, but they don’t always improve code. When I describe (the act of) refactoring, I could say that it consists of identifying code smells, then applying refactorings to improve the code.
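As a minimal sketch of Extract Method, assume two hypothetical functions that each repeat the same money-formatting code. All the names here are invented for illustration and do not come from Fowler's catalog.

```python
# Before: the same formatting code is repeated in two places.
def invoice_line_before(name, cents):
    return name + ": $" + str(cents // 100) + "." + str(cents % 100).zfill(2)


def receipt_total_before(cents):
    return "Total: $" + str(cents // 100) + "." + str(cents % 100).zfill(2)


# After: Extract Method moves the repeated code into a new function,
# and the places that used to contain it now invoke that function.
def format_dollars(cents):
    return "$" + str(cents // 100) + "." + str(cents % 100).zfill(2)


def invoice_line(name, cents):
    return name + ": " + format_dollars(cents)


def receipt_total(cents):
    return "Total: " + format_dollars(cents)
```

The behavior stays the same for every caller, and the step is reversible: inlining format_dollars back into its two callers restores the original code.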
How do I become good at refactoring?
To refactor effectively, you must learn three things:
- How to identify code smells
- How to apply refactorings
- Which refactorings eliminate which code smells
The easiest, by far, is learning how to apply refactorings. Fowler’s book Refactoring includes a catalog of step-by-step recipes for each refactoring. You will typically learn each refactoring as you need to apply it, and IDEs can apply many of them automatically. Identifying code smells and choosing the appropriate refactorings are not techniques to learn once, but skills to develop over time and through practice. You have abundant opportunities to learn to identify code smells, as smelly code abounds. Also, as you practice refactoring, you learn to sense bigger and more complex code smells. You will even start to anticipate sequences of refactorings to apply, when one refactoring fixes this code smell, but leaves behind that other one. I started to learn by refactoring a few dozen classes to within an inch of their lives, and within six months, I had already developed quite an advanced “sense of smell”.
Why do I need to refactor?
If you could design entire systems perfectly the first time, you wouldn’t need to refactor. I don’t know how to do this, so I refactor. Even if you designed systems perfectly, no design can anticipate all future change. Even if your system meets all your users’ current needs, their needs will change and make parts of your system obsolete, incomplete or ineffective. I refactor systems to make room for new features, rather than let changing needs slowly consign systems to the scrap-heap.
Code rots. Refactoring refreshes code. The longer you wait to start refactoring your code, the more effort you will need to spend to maintain it over time, whether to fix defects, add new features or replace it with newer technology.
How do I refactor code I don’t know?
I get to know code by applying the simplest and least risky refactoring: renaming. The most common obstacle to getting to know code is reading it: classes, methods and variables have confusing or misleading names. The simple act of renaming them provides a window into the code, making it easier to understand. When presented with “unreadable code”, I can often understand it within minutes, merely by renaming a handful of variables. This simple act begins to give me confidence in making greater changes. After improving names, further changes carry the risk of changing behavior in unpredictable ways. At this point, without tests, further refactoring could do more harm than good; so I look for a set of tests I can write that I believe will cover the code I intend to change. In the process, one of two things tends to happen: either I gain confidence in changing the code, then start refactoring it further; or I reach a point where I can’t see the impact of a change and can’t tell where I need more tests. In the first case, I forge ahead and refactor; but in the second, I need to add some protective layering to increase the safety of the changes I intend to make.
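Here is a small sketch of those two steps: renaming first, then writing tests that pin down what the code does today. The function and its discount rule are hypothetical, invented only to illustrate the idea.

```python
# Before renaming: the names tell the reader nothing.
def calc(a, b, c):
    x = a * b
    if x > c:
        x = x - x // 10
    return x


# After renaming only: the structure and the behavior are untouched.
def discounted_total(unit_price, quantity, discount_threshold):
    total = unit_price * quantity
    if total > discount_threshold:
        total = total - total // 10
    return total


# Tests that record the current behavior before any riskier change.
def check_behavior_is_pinned(fn):
    assert fn(10, 5, 100) == 50    # below the threshold: no discount
    assert fn(10, 10, 100) == 100  # exactly at the threshold: no discount
    assert fn(10, 20, 100) == 180  # above the threshold: one tenth off


check_behavior_is_pinned(calc)
check_behavior_is_pinned(discounted_total)
```

Running the same checks against both versions gives some evidence that the rename changed nothing, and the checks remain in place for the structural changes that follow.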
When I don’t know the impact of a change on the rest of the system, I look for ways to make smaller, less risky changes that create a warm, dry place where I can work with more confidence. This often consists of introducing interfaces at the boundary of the code I want to change. Extracting an interface is among the safest structural changes one can make to a code base, so I tend to apply that refactoring aggressively. By doing this, I clearly identify the boundaries around the code I want to change, and that alone helps me understand the impact of any change I might make. Often I can then narrow the scope of the change, which reduces the likelihood of creating wide-ranging problems. Michael Feathers’ book Working Effectively with Legacy Code details techniques for rescuing code that your team or company fears changing. Most of his techniques describe how to get to know code better.
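One way to sketch extracting an interface at a boundary, assuming Python's typing.Protocol as the interface mechanism. The classes RateSource, LegacyRateTable and FixedRate and the function convert are all hypothetical.

```python
from typing import Protocol


class RateSource(Protocol):
    """Extracted interface: the single operation that callers rely on."""

    def rate_for(self, currency: str) -> float: ...


class LegacyRateTable:
    """Stands in for existing code that we hesitate to change."""

    def __init__(self):
        self._rates = {"USD": 1.0, "EUR": 2.0}

    def rate_for(self, currency):
        return self._rates[currency]

    def reload_from_disk(self):
        """A detail that callers never needed to see."""


def convert(amount: float, currency: str, rates: RateSource) -> float:
    # The caller now depends on the boundary, not on the legacy class.
    return amount * rates.rate_for(currency)


class FixedRate:
    """A simple stand-in that satisfies the interface in tests."""

    def __init__(self, rate):
        self._rate = rate

    def rate_for(self, currency):
        return self._rate
```

The legacy class does not change at all, which is part of what makes this step low-risk. The interface also documents exactly how much of the legacy class the calling code uses.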
Why should I refactor when I can just use design patterns?
I find design patterns immensely useful in my work; however, they often seduce programmers into thinking that if they could simply choose the right patterns, then they wouldn’t have to think about design. When I discovered that design patterns don’t solve my design problems for me, I learned how to integrate my knowledge of design patterns with my growing knowledge of refactoring. Joshua Kerievsky’s book Refactoring to Patterns describes how to refactor towards, away from and around common design patterns, all based around real-life examples. This book exemplifies a healthy relationship between design patterns and refactoring: rather than seeing these two as an either/or proposition, it shows how an understanding of design patterns can both interfere with and guide refactoring efforts. Once you have mastered the basics of refactoring, you will begin to develop skills related to judging when to move towards a pattern, when to move away from a pattern and when to steer clear of a pattern in the first place.
How do I know I’m improving the design?
I like to categorize ideas simply to make it easier to remember them. When I read Kent Beck’s rules of simple design, I immediately took to them and have successfully used them as the basis of all my decisions about design. When I refactor, I use these rules to guide my judgment about what constitutes good design. A simple design exhibits these properties:
- It passes all the tests
- It minimizes duplication in all its forms
- It expresses its intent clearly to the reader
- It removes unnecessary elements
I apply these rules in order, from top to bottom. This means that I add code if it expresses intent more clearly or removes duplication. It further means that if the code does not pass its tests, then discussions of good or poor design mean little. I believe we can reduce all our ideas about good design to following these four rules, and so I call them the elements of simple design.
These elements present interesting questions. For example, what if there are no tests? I treat untested code as expendable, removed as needed. This does not mean that I would remove entire systems because they are untested, but rather that I do not hesitate to test-drive replacement code if that provides more value than rescuing the existing untested code. Even when refactoring, I do rewrite code, but in much smaller sections at a time, so as to limit the risks of rewriting code. If it only takes 30 minutes to rewrite an untested method, and I feel confident the next release is more than a few hours away, then I will often test-drive a drop-in replacement for the method. We call this the “Replace algorithm” refactoring, and done at a small scale, it is effective.
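A small-scale sketch of replacing an algorithm with a test-driven drop-in might look like this. Both functions are hypothetical, and the checks would be written first, from what callers expect of the old method.

```python
from collections import Counter


# Existing, untested method: correct, but slow to read and to trust.
def most_common_word_old(words):
    best = ""
    best_n = 0
    for i in range(len(words)):
        n = 0
        for j in range(len(words)):
            if words[j] == words[i]:
                n = n + 1
        if n > best_n:
            best_n = n
            best = words[i]
    return best


# Test-driven replacement with the same signature and results.
def most_common_word(words):
    if not words:
        return ""  # the old method returned "" for empty input
    # most_common lists equal counts in first-seen order, which matches
    # the old method's tie-breaking.
    return Counter(words).most_common(1)[0][0]


def check(fn):
    assert fn([]) == ""
    assert fn(["a", "b", "b"]) == "b"
    assert fn(["a", "b"]) == "a"  # tie: the first word seen wins


check(most_common_word_old)
check(most_common_word)
```

Running the same checks against the old method guards against misreading what it did. The empty-input and tie cases are exactly the kind of behavior a quick rewrite can change by accident.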
I encourage you to reduce all your design thinking to the four elements of simple design and observe how it changes the way you think about design. I found it improved my design sense, and believe it will do the same for you.
What can go wrong when I refactor?
I refactor to reduce risk, but when I refactor poorly, I sometimes increase risk instead. This means I need to remain aware of what can go wrong when I refactor.
- I might not have enough tests, in which case I might introduce defects
- I might apply narrow-minded refactorings that I need to undo later
- I might try to refactor before I understand what the code needs to do
- I might see unrelated problems while refactoring, then start fixing those, rather than dealing with the problem at hand
- I might stumble upon a defect and fix it
- I might stumble upon a small, missing feature and add it
- I might crawl down a rat-hole and not see how to back out of it
- I might refactor more than needed for the current task
I mitigate some of these risks through techniques or rules I follow. When refactoring, I do not allow myself to add features or fix defects. When I notice myself doing this, I undo changes until I return to where I stopped refactoring, then make a note about the feature or defect to schedule later. If I notice a defect while refactoring, and can fix the defect in only a few minutes, I reach a stopping point in my refactoring, check in my changes, then test-drive a fix for the defect, check in those changes, then return to refactoring. My reason is simple: refactoring and writing new code demand different approaches, so I prefer not to mix the two. Although I might switch back and forth quickly, I no longer allow myself to do both at the same time. This causes me fewer problems by distracting me less often and leading me down fewer rat-holes. Rather than detail examples for each problem, I invite you to keep these potential pitfalls in mind while you learn to refactor and note when you encounter them. What solutions will you devise?
There exist other, team-based problems you might encounter while refactoring.
- You and another person might engage in a refactoring tug of war: you apply a refactoring, then someone else applies the inverse refactoring, and so on
- You might find yourself the only person refactoring
- You might find others perceive you as going more slowly because you refactor
- You might make others look bad because refactoring helps you complete tasks more quickly than they do
- Another person might destroy a design you have painstakingly refactored
- You might find others perceive you as a careless designer because you refactor, rather than get it “right” the first time
I have not been able to solve these problems with techniques or rules. People problems rarely yield to technical solutions. Instead, I recommend you learn more about the people around you, including their needs and motivations. You won’t alleviate all their fears, but you can probably resolve some of the simpler ones, by which you can gain trust to deal with the more complex ones.
When should I not refactor?
I like to split this question in two: when should I rewrite instead of refactor? and when should I just leave this code alone?
To answer the first question, I refactor unless I am quite certain that I can rewrite the code safely (which means quickly enough and with an acceptable level of correctness), feel confident that clients of the code can handle the change, and believe that rewriting would take less effort or time than refactoring. My choice to rewrite has always depended on the testability and maintainability of the existing code, and never on the magnitude of the change or the complexity of the intended behavior. To refactor, I must have tests, and if test-driving code from scratch seems less expensive than adding the necessary tests to start refactoring, I often start by test-driving small sections of replacement code. After replacing a few sections of code, I often reach the point where I can begin refactoring. I am much less likely to misunderstand what code does when I write tests for it, compared to test-driving replacement code. This explains why I refactor except when rewriting appears far better.
Sometimes, you just need to leave code alone. When I teach refactoring, I commonly find people who want to refactor their entire code base to “get it right” before adding more features. I have never seen this approach work, simply because I have not seen a project that can afford to spend months without shipping features. Not only that, refactoring without a purpose does not necessarily improve code: without a specific goal in mind, how can you know whether a change improves the situation? I advise these people not to refactor without a purpose. More generally, only change code when you have a reason to do it. This advice does not mean “If it ain’t broke, don’t fix it.” On the contrary, your design is broke and you must fix it. Still, if you don’t have a specific reason to change a section of code, then why change it? If you don’t need to fix a defect or add a new feature in the area, then don’t refactor that code. If you need to fix a defect, sometimes refactoring the code in question makes the offending lines of code obvious. Sometimes, refactoring can remove the place where a defect exists! This explains why I routinely refactor code that exhibits a defect. If you need to add a feature, you can refactor code in the neighborhood to make room for the new feature. Done well, the new feature simply clicks into place. If you want to spend time just practicing your refactoring skills, then by all means, refactor some code. You might not check in the corresponding changes, but even spending two hours per week practicing can help you develop your refactoring skills much sooner. Some teams even practice refactoring together every Wednesday afternoon. Without a specific goal for what to learn, add or improve, don’t refactor.
I’d like to refactor, but my boss thinks it’s a waste of time. What should I do?
Simply put, if your boss writes code with you, then you need to respect her opinion about how to write code; but if she doesn’t write code with you, then feel free to ignore it. If you have a good relationship with your boss, then consider politely pointing this out: we write the code, so we should decide how to do it, and we strongly believe that without refactoring, we will simply slow down more and more over time until eventually we stop producing new features. In most cases, inattention to design has contributed significantly to your current schedule problems.
Pay strict attention to how much you refactor, and ensure that you only refactor with a purpose. This will reduce the amount of time you appear to waste refactoring. Also, focus your attention on reducing direct dependencies on your code. The less the outside world depends on the details of your design and implementation, the more freely (and inexpensively) you can refactor. Finally, measure the difference. If your boss believes that refactoring slows you down, measure your progress as you refactor. Measure such things as defect rates, cycle time (the time from when a feature request arrives to when you are ready to ship it) and feature implementation time (the time from when you start a feature to when you are ready to move it along the conveyor belt, such as to the testing team). If these measures improve, then you can provide compelling evidence to your boss that refactoring is a good idea and should continue. If these measures don’t improve, then you have identified areas you need to improve. In either case, you receive valuable feedback.
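To make the two time measures concrete, here is a rough sketch that computes them from per-feature dates. The record layout and the sample dates are invented for illustration; real figures would come from whatever tracking system your team uses.

```python
from datetime import date
from statistics import mean
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical record of the key dates for one feature."""

    requested: date      # the feature request arrives
    started: date        # work on the feature begins
    handed_off: date     # ready to move along, such as to the testing team
    ready_to_ship: date  # ready to ship


features = [
    Feature(date(2007, 1, 2), date(2007, 1, 10),
            date(2007, 1, 16), date(2007, 1, 20)),
    Feature(date(2007, 2, 1), date(2007, 2, 5),
            date(2007, 2, 9), date(2007, 2, 11)),
]


def cycle_time_days(feature):
    # Request arrival to ready-to-ship.
    return (feature.ready_to_ship - feature.requested).days


def implementation_time_days(feature):
    # Start of work to hand-off.
    return (feature.handed_off - feature.started).days


average_cycle_time = mean(cycle_time_days(f) for f in features)
average_implementation_time = mean(
    implementation_time_days(f) for f in features
)
```

Comparing these averages for features completed before and after the team began refactoring regularly gives a simple trend to show your boss.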
If, in spite of evidence that refactoring helps your team, your boss insists that you stop, then you might consider alternate employment opportunities.
Reading List
Martin Fowler, Refactoring: Improving the Design of Existing Code. (Addison-Wesley Professional, 1999; ISBN 0201485672; ISBN–13 978–0201485677.)
Joshua Kerievsky, Refactoring to Patterns. (Addison-Wesley Professional, 2004; ISBN 0321213351; ISBN–13 978–0321213358.)
William Wake, Refactoring Workbook. (Addison-Wesley Professional, 2003; ISBN 0321109295; ISBN–13 978–0321109293.)
Patrick Lencioni, The Five Dysfunctions of a Team. (Jossey-Bass, 2002; ISBN 0787960756; ISBN–13 978–0787960759.)
Scott Ambler and Pramod Sadalage, Refactoring Databases. (Addison-Wesley Professional, 2006; ISBN 0321293533; ISBN–13 978–0321293534.)
Ron Jeffries, “Emergent Design”. Includes a statement of Kent Beck’s rules of simple design and an alternative set of rules from Alan Shalloway.