DevOps, Technical Principle or Institutional Discipline?

Work In Progress

DevOps as a term is generally seen to have emerged after the 2009 Velocity conference, where John Allspaw and Paul Hammond gave their “10 Deploys per day: Dev and Ops Cooperation at Flickr” presentation, during which they described how they created shared goals between Dev and Ops to make deployment a part of everybody’s day job.

Later that year Patrick Debois, excited by the idea, went on to create the first DevOpsDays in Belgium, where the term “DevOps” was born.

Source: “The DevOps Handbook”, p. 5

The DevOps and I

“To know where we’re going, we need to know from where we came”

I have no idea when I first came across the empty vessel widely referred to as devops, but it was likely over at “el Reg”, a couple of years after its inception as a thing. No doubt like many people from a technical operations and production support background, I looked at it with something akin to dejection. “Another word to add to the lingo bingo of tech talk,” I thought. On the surface, the language used to describe the devops function tends to revolve around the web, greenfield deployments and startup ventures, with nothing much to offer the enterprise. Tens (at the time) of deployments a day, globally distributed presence at the flick of a switch, self-service for developers and so on and so forth. Wicked, but can you get Jenny in systems her new laptop on time, and will the BACS system be up for the payroll run? Can you tell me anything about how we get this monolithic Java enterprise app running? No? Ah well.

Another thing about these early years of my, and no doubt others’, exposure to the concept of devops was that a lot of it sounded like things we already did: using scripts to provision resources, having monitoring systems perform self-healing, trying to help developers with their local environment builds. Nothing I’d heard coming out of the devops space seemed like it wanted to solve these problems in an enterprise or legacy setting; often it was hand waving and “let devs run wild on production”, which would make most operations people’s blood run cold. Now many of us are no fools. We know that if we give the devs some level of read access, or a way to aggregate logs to them, it’ll speed up resolution of issues, but actual production access is concerning on a number of levels, particularly if you’ve been in the game for a number of decades. We’ve all had conversations similar to the one below:

Dev "It works on my local system"
Ops "But you don't run firewalls and everything is on the same system"
Dev "Sure it's quicker that way and it works"
Ops "You're supposed to use the test environment"
Dev "That thing's always broken, I raised a ticket a month ago to get it fixed"

As the years went by, nothing particularly insightful came out on the overall principles around devops, but a number of really useful tools and concepts started to emerge, namely central configuration management tools like Puppet and Chef, allowing for “Infrastructure as code”, and the idea of Test Driven Infrastructure. From an operations point of view, these technologies and ideas helped solve real problems and were a successor to the system administrator’s toolbox of Bash, Perl and PowerShell scripts that operations had relied on since the Unix days.

Roll forward again and everyone is talking about bots. I get interested, so I go off to do a quick Microsoft academy session on bots, as the framework had just come out, and they recommended two books, “The Lean Enterprise” and “The DevOps Handbook”. Okay, I’ll bite. It may have been a session about Azure container services actually. Anyway, if you’re bored I’d suggest going to have a look at the Microsoft academy; it’s pretty good.

And Lo, There Were Principles!

Starting on page XIV of the DevOps Handbook preface, two and a half pages lay out the devops myths that permeated most of the blogs, consultants, management speak and readily available online discussion I had encountered to that point.

“DevOps is only for startups”
“DevOps is incompatible with ITIL”
“DevOps is incompatible with security and regulatory compliance”
“DevOps means eliminating IT Operations”
“DevOps is just infrastructure as code”

There were two others, but these were my main areas of interest. The book then moves in quick succession onto its introduction, “Imagine a World Where Dev and Ops Become DevOps”, the very first paragraph asking you to imagine a world where all the various main roles in the delivery of IT services to the customer work together, not just to help each other out, but towards overarching organisational goals! “Holy Mother, I’m holding dynamite,” I thought. It then goes on to explain the IT Death Spiral (I’ll dive into this later) and I’m just going “Yep, I know this, I’ve seen this, are they in my office?” Of course, they weren’t. It’s simply that the same story is being played out in organisations around the world all the time; there are likely hundreds of thousands of organisations spiralling down the drain as I write. I didn’t get much further into the book at that point, but I came away with two things.
Number One, for the first time in, well, I can’t really say how long, it felt like I had had an epiphany. This could really change things, for everyone, forever!
Number Two, I had to read more. I immediately picked up The Phoenix Project (a novel about a fictional organisation that finds DevOps) and Toyota Kata. Why Toyota Kata? Well, because in Toyota lived the principles that the DevOps Handbook and The Phoenix Project alluded to.

For me, devops had become DevOps.

The Friction at the Heart of IT Services

I’ll use IT Services as a catch-all for the all-encompassing mass of an organisation’s various technology obligations including but not limited to security, projects, quality, testing, development, servicing systems that customers directly use, core and remote infrastructure, servicing systems required by end users to do their day job and generic support.

At the core of a non-DevOps organisation, there is an overwhelming friction between “development” and “operations”, illustrated below.

With these two major arms of the IT Service in a state of constant conflict, it’s not entirely surprising that heated confrontation is often routine and we end up in a state of cold war with the two sides often passively undermining one another.

Often a release into production becomes akin to throwing a ball over a wall… though as a friend in QA management and I often say, it’s more like they polish up a turd and then throw that over the wall. But let us pretend it’s a ball.

"Development are idiots, they've shipped buggy code again!" says Ops girl.
"Operations are idiots, they've deployed the wrong tarball again!" says Dev boy.

As a result of every failed release, Operations add a new administrative step or an additional manual test to stop the problem from happening again, while Development gets dragged into trying to fix a problem they can only see in the rear-view mirror, which with every new step gets smaller and smaller. At the same time, the time that is supposed to be used to develop the next feature is shrinking, which leads to less testing time and increases the chance of new problems in production. Everything just keeps getting worse as pressure at both ends of the supply chain gets higher: downtime windows get longer, the frequency of releases drops, fear builds with each new deployment, fixes upon fixes, patches upon patches, bodges upon bodges, heroic firefighting from all sides. We are going down the drain faster and faster. Before you know it, you’re down to two releases a year and months of planning before each one, invariably followed by months of firefighting and emergency out-of-order patching.

This is the conflict at the heart of most organisations’ IT Services, and it is the source of the IT Death Spiral.

You know you’re in the death spiral if, tick as applicable:-

  • Every release is met with dread.
  • Development is producing dozens of pages of documentation detailing every tiny thing.
  • Release meetings take hours and are spread over months.
  • Everybody and their grandmothers are in those meetings but rarely anyone from development or operations.
  • Operations spend weeks preparing their documentation and doing test runs.
  • Everyone is trying not to be part of the deployment team.
  • You need a deployment team!
  • Everything is done on site.
  • You do the release over a weekend (or even better, bank holiday weekend!)
  • You’ve got a change process even your auditors think is thorough.
  • The feature set of the release is likely a tenth of what it was supposed to be and you already know there are bugs in it.
  • Nobody really knows why you’re putting the release in.
  • Nobody in the business is excited or expects it to work.
  • The development cycle is shot to pieces.
  • The release window is guaranteed to overrun.

Now I’ve fallen into the old trap of only talking about releases, but I am attempting to highlight the core conflict. I shall get onto the mundane but critical items later on.

Lean and DevOps, Here to Help

Jeez, we having nightmares yet? Well, all hope is not lost!

“Start by getting the boring stuff right”

So let’s cut to the chase: in my opinion, DevOps is no more a technical skill set than the Toyota Production System is. In The Phoenix Project, the first time we see the fruits of everyone’s labour is when Bill gets his laptop delivered to him. He can’t believe that his laptop could possibly have been replaced, so he goes off to complain to Patty, who points out that yes, they’ve done all the laptops and that, amazingly, they can now accurately predict when they will get future laptops to users! What has this got to do with cloud deployment skills, continuous deployment or automated environment management? Nothing, of course.

So what is DevOps? What do we need to reach DevOps and break the self-destructive IT death spiral?

One thing we need to do is determine where we are and where we want to be. High-level organisational goals should be well understood by everyone, and every project should be aligned with those goals; this is part of understanding the current state. Once you have goals, you can go through everything and check whether it will affect those goals. If it doesn’t, then it should be considered for dumping. Having a high-level ephemeral goal is also important for an organisation overall, as it gives you a True North to align yourself towards when you’re wandering through the unknown territory that is the now.

Communication is key. Break down barriers that inhibit the accurate transfer of knowledge and make sure everyone is on the same page and aligned with the organisational goals. It’s a lot easier to push a boulder if you’re all pushing from the same side.