Behavioral code analysis is a young discipline.
One of its sweet spots is to detect, prioritize, and manage technical debt based on data from how developers interact with the code.
In this article we put the technique to use on the React JavaScript library from Facebook to detect technical debt and suggest possible actions.
React is a popular JavaScript library for building user interfaces, and we are using it ourselves at Empear. As such, I’m familiar with React as a user and I like the programming model. React is an important step forward. However, I never looked under the hood at the actual code. Normally, this would be a daunting task given the ~130,000 lines of code in React. Not so with a behavioral code analysis; by building a hotspot map, we get a quick overview of the codebase and immediately know where the most interesting modules are. Let’s have a look at React using the CodeScene tool to automate the analyses. You can follow along in the interactive analysis.
Prioritized hotspots have high development activity and declining code health.
The visualization shows that there is a cluster of hotspots in the react-reconciler package. Zooming in on that package reveals a prioritized hotspot, ReactFiberBeginWork.js, where most of the development activity has taken place over the past year. Development activity is the dimension we use to prioritize: if a piece of code is worked on a lot, it’s likely to be important and have a high business impact. It’s a development hotspot.
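To make the development-activity dimension concrete, here is a minimal Python sketch that counts how many commits touched each file, one simple way to approximate change frequency. It assumes the input is the text printed by git log --since="12 months ago" --format=format: --name-only, run inside the repository. The function name change_frequencies and the sample paths are hypothetical, and CodeScene’s own hotspot calculation may weigh activity differently.

```python
from collections import Counter


def change_frequencies(log_text: str) -> Counter:
    """Count how many commits touched each file.

    Assumes the output of:
        git log --since="12 months ago" --format=format: --name-only
    With --name-only, each commit lists every file it touched once,
    and commits are separated by blank lines.
    """
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:  # skip the blank lines that separate commits
            counts[path] += 1
    return counts


# Usage with a tiny, made-up log: two commits, one of which touched two files.
sample_log = "src/a.js\nsrc/b.js\n\nsrc/a.js\n"
print(change_frequencies(sample_log).most_common())
```

Sorting by most_common() gives a crude hotspot ranking; a hotspot map adds code size and visual grouping on top of this count.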
However, just because something is a hotspot doesn’t mean it’s a problem. To dig deeper, CodeScene presents a Code Health score. The Code Health score is an aggregated metric that aims to identify code that is expensive to maintain and at risk for defects. The Code Health scale goes from 10 (best) down to 1 (code that’s likely to have severe maintenance issues).
When analyzing code, I tend to emphasize trends over absolute values. As such, I’m more interested in the direction a hotspot evolves – is it getting better or worse – than in any specific metric in isolation. Along that line of reasoning, I would conclude from what we have seen so far that ReactFiberBeginWork.js is likely to be a good refactoring candidate because:
- ReactFiberBeginWork.js is a hotspot, which means the organization works with the code regularly, and
- as ReactFiberBeginWork.js is worked on, its code health declines. This is evident from the historic code health score in the preceding figure, which shows that over the past year the code health has declined from 4 down to 1.
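The two criteria above, frequent change plus a downward trend, can be combined into a simple rule. The Python sketch below is an illustration under stated assumptions, not CodeScene’s algorithm: it takes a file’s commit count and a series of code health scores from equally spaced snapshots (oldest first), fits a least-squares slope to the scores, and flags the file when it is changed often and the slope is negative. The function names and the min_changes threshold of 20 are hypothetical.

```python
from typing import Sequence


def health_trend(scores: Sequence[float]) -> float:
    """Least-squares slope of code health across equally spaced snapshots.

    A negative slope means code health is declining over time.
    """
    n = len(scores)
    if n < 2:
        return 0.0  # a single snapshot has no trend
    mean_x = (n - 1) / 2  # mean of the snapshot indices 0..n-1
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def is_refactoring_candidate(
    change_count: int, health_scores: Sequence[float], min_changes: int = 20
) -> bool:
    """Flag code that is worked on often AND whose code health trends downward."""
    return change_count >= min_changes and health_trend(health_scores) < 0


# Illustrative numbers only: a frequently changed file sliding from 4 to 1.
print(is_refactoring_candidate(150, [4, 3, 2, 1]))
```

Fitting a slope rather than comparing only the first and last snapshots makes the rule less sensitive to a single noisy measurement, which is in the spirit of emphasizing trends over absolute values.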
Now, before we dig deeper, I’d like to share a related discovery. codescene.io has the React codebase as a public showcase analysis. That showcase analysis is based on a historic snapshot of React as it looked in 2018. This makes for an interesting comparison, so let’s look at the historic findings on the ReactFiberBeginWork.js hotspot:
Historic analysis results for the current React hotspot.
Fascinating! The code seems to have been much simpler back in 2018, but the warning signs of a growing hotspot were all there. There’s not much we can do about that now, except follow the old proverb: The best time to refactor a hotspot was last year. The next best time is today.
Now, let’s get back to the present day and see how to make that task actionable.
A hotspot map helps you build a mental model of what a codebase looks like. Within minutes, you have a visual representation of the system that provides context and guides explorations. A hotspot analysis helps you focus on the weak spots in the system that you are likely to end up in sooner or later during coding.
A behavioral code analysis also provides social data – such as knowledge maps – which makes it easy to identify the main developers behind a module. Use that data to find the right people for specific questions.
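As an illustration of the social data, the Python sketch below picks a “main developer” per file using one simple heuristic: the author who added the most lines. That heuristic is an assumption for this example; CodeScene’s knowledge maps may use a different measure. The input is assumed to come from git log --format=@%an --numstat, where each commit prints an @-prefixed author name followed by tab-separated “added, deleted, path” lines (binary files report “-” for the counts). The function name main_developers is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def main_developers(log_text: str) -> Dict[str, Tuple[str, float]]:
    """Map each file to (main author, that author's share of added lines).

    Assumes the output of: git log --format=@%an --numstat
    """
    added: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):  # an author line emitted by --format=@%an
            author = line[1:].strip()
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank separator or unexpected line
        lines_added, _lines_deleted, path = parts
        if not lines_added.isdigit():  # binary files report "-"
            continue
        added[path][author] += int(lines_added)

    result = {}
    for path, by_author in added.items():
        total = sum(by_author.values())
        top_author, top_lines = max(by_author.items(), key=lambda kv: kv[1])
        result[path] = (top_author, top_lines / total if total else 0.0)
    return result


# Usage with a made-up log of two commits.
sample = "@Alice\n10\t2\tsrc/a.js\n\n@Bob\n3\t0\tsrc/a.js\n-\t-\timg.png\n"
print(main_developers(sample))
```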
Based on the analysis so far, it looks like ReactFiberBeginWork.js could benefit from refactorings in the short term. A quick look at the code reveals that ReactFiberBeginWork.js contains over 3,000 lines of code. So who would like to have that refactoring task? No one? Good, because large-scale refactorings are rarely a good idea. Due to the high development activity, a refactoring on this scale would likely sit on a branch for months and eventually die unmerged. We need to do better.
To prioritize specific starting points for a refactoring, I use the X-Ray technique. An X-Ray analysis simply parses the hotspot into its functions and looks at where each commit hits over time. Sum them up, and you get hotspots on a function level. Here’s what the X-Ray of ReactFiberBeginWork.js looks like:
The X-Ray analysis calculates hotspots on a function level inside the large React file-level hotspot.
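The core counting step of an X-Ray can be sketched in a few lines of Python. This is a simplified illustration, not CodeScene’s implementation: it assumes each revision of the file has already been parsed into function spans (the language-specific parsing is not shown) and that each commit’s changed line numbers in that revision are known, for example from the “+start,count” ranges in the hunk headers printed by git diff -U0. A function scores one hit per commit that touches at least one of its lines. FunctionSpan, xray, and the sample names and line numbers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Set, Tuple


class FunctionSpan(NamedTuple):
    name: str
    start: int  # first line of the function, 1-based, inclusive
    end: int    # last line of the function, inclusive


def xray(commits: Iterable[Tuple[List[FunctionSpan], Set[int]]]) -> Counter:
    """Count, per function, how many commits changed at least one of its lines.

    Each commit is a pair: (function spans in that revision, changed line numbers).
    """
    hits = Counter()
    for functions, changed_lines in commits:
        for fn in functions:
            if any(fn.start <= line <= fn.end for line in changed_lines):
                hits[fn.name] += 1  # at most one hit per function per commit
    return hits


# Usage with made-up spans that stay the same across three commits.
revision = [FunctionSpan("beginWork", 10, 50), FunctionSpan("someHelper", 60, 80)]
history = [(revision, {12, 15}), (revision, {70}), (revision, {20, 65})]
print(xray(history).most_common())
```

Passing the spans per commit, rather than once, reflects the idea that functions grow, shrink, and move as the file evolves.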
The X-Ray results give us a starting point for strategic refactorings. In this case, the number one candidate seems to be the beginWork function since 1) it’s overly complex, 2) we work on it a lot, and 3) as we do, the function gets even more complex. These are the same criteria we used to select file-level refactoring candidates.
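The “complex and getting more complex” criterion needs some complexity measure per revision. The Python sketch below swaps in indentation as a language-neutral complexity proxy (a well-known, rough technique, not necessarily what CodeScene measures): it sums the logical indentation levels of a function’s non-blank lines, then ranks functions whose complexity grew between their first and last measured revisions, ordered by change count. The function names, the 2-space default indent, and the sample data are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def indentation_complexity(source: str, indent_width: int = 2) -> int:
    """Sum of logical indentation levels over non-blank lines (a rough proxy)."""
    total = 0
    for line in source.expandtabs(indent_width).splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines carry no complexity
        total += (len(line) - len(stripped)) // indent_width
    return total


def rank_candidates(functions: Dict[str, Tuple[int, Sequence[int]]]) -> List[str]:
    """Rank functions whose complexity grew, most frequently changed first.

    functions maps name -> (change count, complexity per revision, oldest first).
    """
    growing = {
        name: (changes, series)
        for name, (changes, series) in functions.items()
        if len(series) >= 2 and series[-1] > series[0]
    }
    return sorted(
        growing, key=lambda n: (growing[n][0], growing[n][1][-1]), reverse=True
    )


# Usage with made-up numbers: "stable" is changed most but is not getting worse.
print(rank_candidates({
    "beginWork": (30, [100, 140]),
    "stable": (50, [80, 80]),
    "small": (5, [10, 20]),
}))
```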
At this point in the analysis, we need to dig into the code, both to confirm the analysis findings and to spot potential refactoring opportunities. The main advantage is that we are now on a level where we can act upon the findings and do a focused refactoring based on data from how we – as an organization – actually work with the code.
An example of the code and potential design issues inside the beginWork function in React.
Now I would be ready to dig into the code myself, or hand this refactoring task over to the team.
Prioritizing technical debt on a function level is a big win. It’s what makes the analyses actionable. Unfortunately, the function level isn’t always good enough, and React provides an excellent example. The JavaScript of today is no longer a simple language (if it ever was; I’m really not sure). Given the flexibility and wide array of design options, different pieces of JavaScript can look like completely different languages. There’s no common style, although some idioms tend to be more popular. One example is to use an enclosing function as a module system: a wrapper function whose body contains the local functions it needs. Let’s look at an example by X-Raying another React module, ReactChildFiber.js:
X-Ray can prioritize local, nested JavaScript functions.
In this case, CodeScene goes the extra mile and identifies local functions nested inside another function so that it can highlight them in the X-Ray. And this is what I find so fascinating about behavioral code analysis: we can pick up a large codebase and prioritize the findings all the way down to small chunks that serve as specific refactoring targets. Since the data is based on how the organization actually works with the code, we are almost guaranteed to get a payoff when we improve our hotspots.
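Handling nested functions comes down to span containment. The Python sketch below is an assumption-laden illustration, not CodeScene’s parser: given function spans from one revision, it maps each function to its innermost enclosing function and attributes a changed line to the smallest span that contains it. With the wrapper idiom, this keeps a change inside a local function from being credited only to the large wrapper. FunctionSpan, nesting, innermost_function, and the sample names and line numbers are hypothetical, and function names are assumed unique.

```python
from typing import Dict, List, NamedTuple, Optional


class FunctionSpan(NamedTuple):
    name: str
    start: int  # first line, 1-based, inclusive
    end: int    # last line, inclusive


def nesting(functions: List[FunctionSpan]) -> Dict[str, Optional[str]]:
    """Map each function to its innermost enclosing function (None if top level)."""
    parents: Dict[str, Optional[str]] = {}
    for fn in functions:
        enclosing = [
            other for other in functions
            if other is not fn and other.start <= fn.start and fn.end <= other.end
        ]
        parents[fn.name] = (
            min(enclosing, key=lambda o: o.end - o.start).name if enclosing else None
        )
    return parents


def innermost_function(functions: List[FunctionSpan], line: int) -> Optional[str]:
    """Return the name of the smallest function span containing the line."""
    containing = [fn for fn in functions if fn.start <= line <= fn.end]
    if not containing:
        return None
    return min(containing, key=lambda fn: fn.end - fn.start).name


# Usage with a made-up wrapper that encloses two local functions.
spans = [
    FunctionSpan("outer", 1, 100),
    FunctionSpan("innerA", 5, 20),
    FunctionSpan("innerB", 30, 90),
]
print(nesting(spans))
print(innermost_function(spans, 10), innermost_function(spans, 25))
```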
Many organizations require an assessment of open source software. These assessments typically focus on the license and ease of use, but rarely look under the hood. Is the code easy to maintain so that we can fix any bugs we detect? Are the main contributors still active so that we can expect new versions, or will we have to be prepared to step up and contribute ourselves? Aspects like that could have a large business impact, so why leave them to chance?
The code health decline that we identified in the top hotspot is a common finding; once a piece of code has grown more complex, the bar for refactoring it gets higher. Partly this is due to the frequent lack of visibility into code and, specifically, into technical debt. A behavioral code analysis provides that much-needed visibility. Using tools like CodeScene, you can even create a safety net for supervising hotspots in a CI/CD pipeline or as part of pull requests.
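To show the shape of such a safety net, here is a hypothetical Python quality gate; it does not use CodeScene’s actual CI/CD integration, whose interface isn’t described here. It assumes you already have code health scores for the files before and after a pull request, plus the set of files you treat as hotspots, and it returns a non-zero exit code when any hotspot’s score drops by more than a tolerance. All names, the tolerance parameter, and the sample scores are illustrative assumptions.

```python
from typing import Dict, List, Set


def declined_hotspots(
    before: Dict[str, float],
    after: Dict[str, float],
    hotspots: Set[str],
    tolerance: float = 0.0,
) -> List[str]:
    """Return hotspot files whose code health dropped by more than the tolerance."""
    return sorted(
        path for path in hotspots
        if path in before and path in after and after[path] < before[path] - tolerance
    )


def gate(
    before: Dict[str, float],
    after: Dict[str, float],
    hotspots: Set[str],
    tolerance: float = 0.0,
) -> int:
    """Report declines and return a process exit code: 1 fails the build, 0 passes."""
    failures = declined_hotspots(before, after, hotspots, tolerance)
    for path in failures:
        print(f"Code health declined in hotspot {path}: {before[path]} -> {after[path]}")
    return 1 if failures else 0


# Usage with made-up scores; a CI script would pass the result to sys.exit().
exit_code = gate({"src/a.js": 4.0, "src/b.js": 9.0}, {"src/a.js": 3.0, "src/b.js": 9.5}, {"src/a.js"})
```

Restricting the gate to hotspots keeps it focused on the code the organization actually works with, rather than failing builds over rarely touched files.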
The preceding analysis was done with the on-prem version of CodeScene; the tool is also available as a free service at codescene.io.