From Automated to Continuous
Due to circumstances I cannot recall, there was a specific Friday when we did not manage to deploy. We also did not deploy on the Sunday evening, as had been the deplorable practice on occasion. Instead, we deployed early Monday morning. And what do you know, we suddenly remembered that the reason we started deploying on Fridays in the first place was that deployments took so long and were volatile. The Monday morning deployment went well, and we were able to sort out system problems during work hours. Novel thought, that…
However, the problem of system destabilisation because of deployed changes was still a reality. Why? Because we were deploying a week’s worth of programming, and refactoring, and DB schema and data changes at once. And I do not know about you, but our team consists of 8 developers, and I can honestly say we produce a very significant amount of new code, and a significant transformation of existing code, in the period of a week. In short, there is a massive delta going into production at deployment time. How does one overcome that? By deploying smaller deltas. And the only way to do that is to deploy more often.
Now here I get to the very essence of why Continuous Delivery results in such huge benefits, but it is also the part that hits traditionalists right between the eyes and makes them balk.
“You want to do what?” they ask. “Deploy every day? But the system always gives problems after you deploy, and once a week is already too much! And what about QA and sign-off?”
It seems counterintuitive, but deploying more often makes the system more stable. Currently we are in a rhythm of deploying the entire system every morning, with multiple subsystem deployments throughout the day. Whichever team member gets into the office first, usually at around 6:45 AM, clicks the Deploy button in TeamCity, our build server. About 7 minutes later, the entire distributed system is rolled out to production.
How does it become more stable with more frequent deployments? It comes down to two things: smaller deltas, as I have already mentioned, and permanently integrated code.
Deploying only a day’s worth, or less, of coding changes means if something goes wrong, the changes that were made are still fresh in your mind, and the context of the problem is easily understood and usually resolved within a few minutes. Often via another snappy deployment. Conversely, the bigger the delta you deploy at one time, the more places there are that could be causing the problem, the more unforeseen side effects can result, and the longer it takes to track it down and resolve it.
Permanently integrated code
This means that all code, or 95% of all code, gets committed into our master branch daily. It also means that all code committed into master must be production ready, with automated tests written and first-level manual testing performed. Essentially this comes back down to small deltas again. If you have long-lived feature branches in your source control system, then merging one into master brings in a huge delta that is going to be deployed into production for the first time.
However, some features are just too big to finish in a day, and you can only go live with them when they are complete and fully tested. So how do you commit such partial feature code into master? By putting those features behind various levels of settings and permissions. Books have been written about this topic, but the skinny is that every day, your small code changes to the unused feature are deployed into production, and even though the feature is not turned on yet, it is fully integrated into the production source code. It runs through the test suite on the CI server a couple of hundred times before it goes live.
And then, once the feature is done and QA has signed off on it in the staging environment, as we recently had with a high-impact financial feature, all you have to do is flip the setting in production to turn the feature on. In many cases there is no need even for a new deployment, because the feature has already been incrementally and silently growing to maturity in the production environment.
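To make the idea of hiding unfinished work behind settings and permissions a little more concrete, here is a minimal sketch in Python. It is not our actual implementation; the names `FeatureToggles`, `calculate_fee`, `new_fee_model` and `qa_tester`, as well as the fee percentages, are all made up for illustration. It only shows the shape of the technique: the new code path ships with every deployment, but stays dormant until a setting is flipped.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureToggles:
    """Hypothetical in-memory toggle store.

    A real system would more likely read these values from configuration
    or a database, so they can change without a new deployment.
    """
    # Features that are switched on for everyone.
    enabled: set[str] = field(default_factory=set)
    # Per-feature allow-list, so a feature can be on for selected users only.
    allowed_users: dict[str, set[str]] = field(default_factory=dict)

    def is_on(self, feature: str, user: Optional[str] = None) -> bool:
        if feature in self.enabled:
            return True
        return user is not None and user in self.allowed_users.get(feature, set())


def calculate_fee(amount_cents: int, toggles: FeatureToggles,
                  user: Optional[str] = None) -> int:
    """Made-up business function with an old and a new code path."""
    if toggles.is_on("new_fee_model", user):
        # Unfinished feature: deployed daily, but invisible to most users.
        return amount_cents * 2 // 100
    # Existing production behaviour.
    return amount_cents * 3 // 100


if __name__ == "__main__":
    toggles = FeatureToggles(allowed_users={"new_fee_model": {"qa_tester"}})
    print(calculate_fee(10_000, toggles))               # old path
    print(calculate_fee(10_000, toggles, "qa_tester"))  # new path, tester only
    toggles.enabled.add("new_fee_model")                # flip the setting
    print(calculate_fee(10_000, toggles))               # new path for everyone
```

The permission-style allow-list is what lets a tester exercise the new path in a live environment while everyone else still gets the old behaviour, and the global switch is the part that can be flipped once sign-off is given.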
What has CD done for us?
- Friday night deployments are a thing of the past.
- (I can now write blog posts like this one on a Friday night instead.)
- The team is much less stressed.
- The system is much more stable.
- The team’s level of confidence to support the system has skyrocketed.
- Business’s confidence in the system has skyrocketed.
- We deliver new features and value to business and customers every single day.
- We keep feedback cycles really short, which keeps us really agile and responsive.
- We are able to quickly start recording new metrics into our real-time monitoring system.
This post is called “A real world journey of fighting toward Continuous Delivery” because it was indeed both a journey and a fight to get where we are. Has it been worth it? Absolutely, without a doubt. The rewards have far outstripped the cost. Have we arrived? Absolutely not. There are still many, many things we can improve. And we are working toward deployments every hour, or every 30 minutes, or every 10 minutes. Whatever turns out to be the optimal cadence for our context.
Not all systems are suited to CD. Not all minds are open to CD. But, if you are in an environment that would benefit, start making the little adjustments toward a better deployment pipeline. Don’t ask for permission, but responsibly make judgment calls in your area of expertise. Incrementally set the stage for blowing the minds of the skeptics with proof. And enjoy the ride!