Pursuing Great Code Quality
Things I wished I knew when our tech organization scaled out
2017. Night. It was raining hard outside.
Okay, the night and the raining don’t really add anything to this story. But at that time, I was part of a team assigned to start a new experimental business model. The CTO wanted our team’s architecture to be independent from the main stacks, because one day our team will be separated into a spin-off company.
The rain continued. It was close to midnight. Bats, ghosts, and all the dark creatures of the night were unleashed.
We started a Ruby on Rails monolith. There were no rules. Everyone pushed directly to the main branch. No domain-driven design. No code reviews. Bugs on production were tolerable, and expected. There was only one goal: to ship fast, in order not to lose market momentum.
Two years later, the team grew five times bigger and splitted into six smaller teams. It’s getting harder and harder to add more features. We started pushing new features into new microservices, and strangler-figging some domains from the monolith into separate services. But it’s too late. Our software entropy is just too high.
Over the years, I’ve been thinking about this. As I learned more, I think these are the steps and principles that should be in place to ensure great code quality starting day zero.
1. Pull Request, or Pair-Programming
Either enforcing pull request (PR, or merge request/MR) system with mandatory reviewer approval and definitely disabling push-to-main-branch directly; or implementing pair-programming. Because pairing means there’s always a second set of eyes who reviews the code constantly.
But I think in a full-remote work setting, pair-programming means opening a persistent Zoom call all day long, which could be tiring for some people. So nowadays, I’m inclined towards the asynchronous PR system.
2. Automated Check
All the mechanical checks should be performed automatically using various tools through Git repository CI/CD pipelines, thus freeing the mental burden for the code reviewers, and also preventing human error, while ensuring standards and compliances.
All the tools mentioned below are from my past experiences, not an exhaustive list, and some functionalities might overlap with each other.
To get PRs that fit our organization culture, a tool like Danger provides customizable ways to enforce team conventions:
- Enforcing sufficient PR title and description. E.g. Imposing minimum characters or words. Naked PR is hard to understand. And more and more engineering challenges stem from communication problems like this.
- Forcing small-sized PR. E.g. Limiting changed files to 7 at max. Because small PR, with single responsibility (i.e. single change) is easier and faster to review.
- Checking whether a JIRA ticket key is added to the PR title. LOL, but it helps in tracking past changes.
Syntax and typo errors are easy to catch in compiled languages, but not so in interpreted ones. So, here comes linter: hadolint, golangci-lint (double protection), eslint, and my favorite one Rubocop, to enforce code correctness.
This is a comprehensive list of all the good linters: https://github.com/caramelomartins/awesome-linters.
2.3 Code Quality Metrics
Some linters like Rubocop go further to check code quality metrics (and best practice advisors). We can limit things like function, module, and class sizes, line width, parameters length, and many more to enforce modularity. It also offers opinionated advice like guard-clause, constant naming, memoization, and many more where they’re applicable.
Code Climate and SonarQube provide duplication rate, cyclomatic and cognitive complexity metrics, which should push further towards better maintainability. As a bonus, they provide aerial view dashboards to show the overall code qualities across our multiple services.
Some of Code Climate and SonarQube plugins are covering security vulnerabilities checks as well, doing all the SAST and DAST analysis, enabling what people call DevSecOps.
2.5 Unit Test Code Coverage
SonarQube and SimpleCov + CodeCov track unit test coverage in terms of percentage. They also show which lines of codes are not being tested, which points out where we should add more tests — this is also achievable by using some IDE plugins though.
The idea is, a single system repository should be covered by 80%, err I mean 100% unit tests, and a new PR should never reduce that coverage.
2.6 Build Breaker
If any of the automated checks above fails, then the PR should be blocked from merging to the main branch. There’s a GitHub and GitLab config for that.
2.7 Regression Test vs. Service Isolation Test
Some people put regression tests as part of CI/CD build breakers, but I’m not a proponent of this. This kind of test is performed on integration (some called it staging/testing/pre-production) environment which mimics the production environment, and performed against the entire live dependencies. When we’re regressing our microservice, it will hit other running services which are maintained by other teams.
The problem is, integration environment breaks a lot. There’s always some engineers in some other teams who push a bad commit to their service. While our system depends on them, our tests fail multiple times every day. Putting regression tests as part of build breaker will frustrate our team a lot, because the problems mostly originated from the outside.
As long as it’s not a build breaker, regression tests should be performed every time there’s a push to integration environment. This will alert the whole organization when someone or something breaks the business flows.
A better approach, if available, is to put blackbox API tests, or service isolation tests. While unit tests check on components from the inside, and regression tests check on the entire organization flows, service isolation tests check a single service behavior only, from an outsider perspective. For an API service, this means running the service, hitting the API and asserting the results, while mocking everything else (i.e. mocking all REST API calls) including database, cache/memory store, search engine, logging/instrumentation, and other dependencies.
2.8 CI/CD Runners on Auto-Scale
I also heard these complaints a lot: our pipeline runners are slowing down, and some jobs are failing at certain hours. That’s what happens when we provision a fixed amount of machines to serve as runners. So, better put them on a Kubernetes HPA (horizontal pod autoscaling), or Docker Machine Executor set for autoscale.
3. Human Checks
When all the automated checks above are passed, then comes the human reviewers to handle what the machine couldn’t accomplish (yet).
3.1 A Chance for Mentoring
Most of the time, code review is a chance for a mentoring session between senior engineers to their juniors. When a PR is complex enough and worthy of a dedicated discussion, we encourage people to set up a 1-on-1 schedule. Thirty minutes should be more than enough.
3.2 Written-Communication Review
The main responsibility of an engineer is not to write codes that machine can understand — because that’s easy — but to write codes that the other engineers can understand. More on this also applies in the Clean Code section below.
But first, to make our changes easily understandable by other people, we need to write a compact sentence in the PR title that explains the whole change. Another nicely-written short paragraph in the PR description should add a little bit more details, but not too much. Then below that there should be a list of references to other documents that should explain our changes in detail.
While a tool like Danger can enforce a minimum chars/words limit in the PR title and description, a human should review how it’s written. Having readable PR titles and descriptions show maturities in a tech organization.
3.3 Functional + Non-Functional Requirement Check
Since a PR change purpose is written in its description, or in another attached PRD (product requirement document), RFC (request for comments), or ADR (architecture decision records), then a code reviewer should help verify whether the intended feature requirement is met in this PR.
Other than that, these are some questions to ponder:
- Will there be a performance issue caused by this PR? Or concurrency problem?
- Will the change increase traffic load to this service? Or to the other services? Do we need to scale up or scale out certain stacks to cope with the increased load?
- Do we need to add more logging + metrics to measure the impact of this PR?
- Do we need to put this change behind a release or feature toggle? Do we need a percentage-based rollout for this?
3.4 Test Path Coverage
While the test coverage reporter in automated CI/CD jobs can calculate the percentage of tested code and tell us which lines are untested, up until now I couldn’t find a tool to verify the test completeness against all possible paths.
This means for every combination of function arguments, conditionals, side effects, and logic branches, a human should verify whether all possible code paths are covered in the test scenarios. This applies for all types of tests: unit, regression, and service isolation.
Please tell me if you know of such a tool’s existence.
3.5 Clean Code + SOLID Review
I’m a follower of Clean Code, and I believe this is the clearest set of principles to follow to achieve great code quality. These fundamentals are what I usually seek:
- Intent-communicating but concise object namings.
- Well-scoped variables.
- Well-sized-and-structured functions, classes, modules, and other programming objects.
- High-cohesion and loose-couplings —this means good placement and separation of concerns.
- The S + D of SOLID. S because it helps with the sizing of the above programming objects, and to get better maintainability/changeability. And D because it helps with the independent developability + deployability.
- And the I of SOLID when we’re using interface-heavy language like Go-Lang and Kotlin.
In short, everything to make our codes easy to understand and easy to change by other people.
3.6 Design Patterns Review
When assessing code design and structure, sometimes there are opportunities to apply a certain design pattern. Be it from Gang of Four (aka classic 23 object-oriented patterns), concurrency patterns, microservices patterns, enterprise integration patterns, or from any other references.
Although, one sign of mediocrity is to force-impose design patterns on everything we create. So, we use them only when they help, while keeping our codebase simple and not over-complicated.
3.7 Refactoring Principles Review
When improving existing codebase, especially legacy codes, I’m a follower of Martin Fowler’s Refactoring principles. This Refactoring.Guru explains these principles with easy explanations and beautiful visuals.
I haven’t really read this book: Working Effectively with Legacy Code, but this is one of the mandatory references on this topic that we should look into. On the other hand, I believe we should always pay respect to and not despise “legacy codes” because they were the ones who served our users for years and contributed a big amount of revenue already. I’m saying, legacy codes pay my salary.
But aren’t all these steps only bogging down our velocity? Slowing down the entire software development process?
Yes, I think it’s inevitable.
As our tech organization grows, and more people join and contribute their codes, and our codebases are getting bigger with more complexities, we no longer talk about delivering fast, disrupting the market, and breaking things. The emphasis is now on how to do things correctly, how to make team communication better, how to share organizational knowledge, and how to not repeat the same mistakes made by other teams. I believe that’s how a tech organization becomes more mature.
So yeah, there’s an expensive price for great code quality — but not too pricey.