I have multiple lingering tasks for improving monitoring for for our applications. I believe this is a very important step we need to take to assess the quality of our applications and measure the value that we are delivering to customers. If I had my way, I would hire another me just so I can concentrate on this.
We need to monitor usage to better understand how our customer actually use the application in production. This will allow us to make better product design decisions and optimizations, prioritize testing effort in terms of regression coverage, and provide a signal for potential issues when trends are off.
We need a better way to monitor and analyze errors. We currently get an email when certain exceptions occur. We also log exceptions to a database. What we don’t have is a way to analyze exceptions. How often do they occur, what is the most thrown type of exception, what was system health when the exception was thrown.
We need a way to monitor and be alerted of health issues (e.g. current utilization of memory, cpu, diskspace; open sessions; processing throughput…). Ops has a good handle on monitoring, but we need to be able to surface more health data and make it available outside of the private Ops monitoring systems. It’s the old “it takes a village to raise an app” thing being touted by the DevOps movement.
Everyone on the delivery team needs some access to a dashboard where they can see the usability, exceptions, health of the app and create and subscribe to alerts for various condition thresholds that interest them. This should be even shared with certain people outside of delivery just to keep things transparent.
This can all be started in preproduction and once we are comfortable with it pushed to production. The point of having it is that QA is a responsibility of the entire team. Having these types of insight into production is necessary to insure that our customers are getting the quality they signed up for. When the entire team can monitoring production it allows us to extend QA because we can be proactive and not just reactive to issues in production. Monitoring production gives us the ammo we need to take preemptive action to avert issues in production while giving us the data we need to improve the application.
I am not a DevOps guru. I have been learning DevOps and Continuous Improvement for about 6 years now. I wanted to blog about some of what I have learned because I see companies doing it wrong. I wanted to start internalizing some of the lessons I have learned and the grand thoughts I have had just in case someone asks me about DevOps one day.
DevOps is a Religion
I’m not going to define DevOps because there is enough of that going on (https://en.wikipedia.org/wiki/DevOps). I will say that you can’t hire your way to DevOps because it isn’t a job title. You can’t have one team named DevOps and declare you are doing DevOps. Everyone on your application delivery teams have to convert to DevOps. When you only have one team enabling some DevOps practices through tools and infrastructure you are only getting a piece of the DevOps pie. Until you have broken down the silos and increased communication you haven’t realized DevOps.
Do not focus on implementing DevOps by creating another silo in a “DevOps” team. You can create an implementation team that focuses on DevOps processes, tools, and infrastructure, but if this will be a long lived team call them a Delivery Systems team or Delivery Acceleration team and make sure they are embedded in sprint teams and not off in some room guarded by a ticket system. As with some religions, you have to congregate. Your delivery team has to communicate with each other outside of tickets and email.
When you name the team DevOps it pushes responsibility for DevOps to that team, but the byproduct of DevOps is the responsibility of the entire delivery team. This is the same problem with a QA team, your QA team is not responsible for quality, the entire delivery team is responsible for quality. When you have silos like these, it is hard to get a “One Delivery Team” mindset. Find ways to break down silos, then you won’t be one of those companies that missed the DevOps boat because you couldn’t get your new silo’d DevOps team to delivery on the promises of DevOps.
Fast Feedback is a Main By Product
One of the main benefits of doing continuous anything (DevOps includes continuous improvement processes), is you get fast feedback. The tighter, faster your feedback loops the faster you can iterate. Take a small step, get feedback, adjust based on the feedback, and iterate. It’s not rocket science, its simplification. Work in smaller batches, talk about how to make the next batch better; watch your automation pipelines and KPIs, talk about how to make your pipelines and KPIs better… TALK.
Collaboration is the Key that Unlocks the Good Stuff
Having the entire delivery team involved and talking is key. The Business, QA, Security, IT, Operations, Development… everyone must communicate to insure the team delivers the value that end users are looking for. Give end users value, they give the business value, loop. Having a delivery team that huddles in their silos with minimum communication with other teams is a good way to short circuit this loop. DevOps is a way of breaking down the silos and improving collaboration. DevOps is not the best name to convey what it can deliver. Just remember that the DevOps way should extend beyond the development and operations team.
Automation is the Glue that Binds Everything
Having an automated delivery pipeline from source check-in to production enables you to have a repeatable delivery process that is capable of automatically providing fast feedback. It gives the entire team a way to start and stop the pipeline and monitor the pipeline to adjust based on feedback from the pipeline. It also aides in collaboration by providing dashboards and communication mechanisms accessible by the entire delivery team.
If you have no automation, start with automating your build on each check-in. Then automate running of unit tests, then deployment to a test environment, running automated functional tests, deploy to the next environment. Don’t forget virtualization. Figure out how you can virtualize your environments and automate the provisioning of an environment to run your apps in. Start where you are and focus on adding the next piece until you can automatically build once and deploy and test all the way to production. Iterate your way continuous delivery.
Virtualization is Magic Pixie Dust
Many people I have asked think of the DevOps as virtualization and automated server configuration and provisioning. Even though this isn’t everything in DevOps, it’s a big part of it. Being able to spin up a virtual environment to run a test removes environments as a hindrance to more testing. Being able to spin up a virtualized mock environment for a third party service that is not ready allows us to test in spite of the missing dependency. Virtualization in production allows us to hot swap the current environment with a new one when we are ready for the next release or when production nodes are being hammered or being otherwise unruly. Codifying all of this virtualization allows us to treat our infrastructure just like we do product code. We can manage changes in a source control repository and automatically run the infrastructure code as part of our delivery process.
Quality, Security and Health Come First
Before one line of code is written on a change, an analysis of the desired change must be done before delivering it. I’m not saying a large planning document has to be done. The team has to talk through the potential effect on quality, security and health (QSH) and it makes sense to record these discussions somewhere to be used during the iteration. You can create a doc or record it in a ticket, but QSH must be discussed and addressed during the iteration.
QSH is not something that happens after development has declared code complete. It should happen in parallel with development. There should be automated unit, integration and end-to-end checks. There should be automated static analysis and security checks. A load test and analysis of health monitors should be measuring how the applications is responding to changes. This all should happen during development iterations or as close to development as possible.
On a side note, in Health I am lumping performance, scale, stress and any type of test where a simulated load is tested against the application. This could be spinning up a new virtualized environment, running automated tests then turning off the database or a service to see what happens. Health is attempting to introduce scenarios that will give insight into how the application will respond to changes. It may take a lot to get to the level of Netflix and its chaos monkey in production, but having infrastructure and tests in preproduction to measure health will give you something instead of being totally blind to health issues.
I know there is no real meat here or guidance on how to do these things, but that’s what Google is for or read Gene Kim’s the Phoenix Project. Anyway, I may be a little naive on a few points, but the gist is DevOps is more than a job or team title, its more than development and operations signing a peace treaty, more than automated server configuration. Think of it as another step in improving your continuous improvement process with a focus on cross team collaboration where you break down the silos separating all of the teams that deliver your application.
We had to do a restore of a production database and it made me remember a book I read that you may enjoy. It’s called The Phoenix Project. It’s not the best written book, a little unbelievable and hokey at times, but I honestly couldn’t put it down (I’m a tech geek). I have never read a book like it.
It’s about DevOps yet, it’s a novel not a tech book. It leans more on the Ops perspective, is a retelling of The Goal, and is rooted in Kanban and Manufacturing Operations. Yet, it’s something I believe anyone working in Enterprise IT can relate to.
The Kindle version is only $10. If you read it let me know what you think as DevOps is something I am passionate about and I’d like to hear your perspective.
http://itrevolution.com/ – Author’s website
Imagine if you would, a modified game of Jeopardy hosted by Alex Trebec of course. This game is all about getting changes to production and a smart Dev is in the hot seat.
Dev: Can I have Daily Production Pushes for $200?
Alex: You just deployed a new code change to production, it is causing major issues with performance and you need to roll it back. How do you roll back the troublesome change without having to roll back the entire release? (Jeopardy theme music playing)…
Dev: What is a Feature Toggle.
Dev: Can I have Daily Production Pushes for $500?
Alex: You just started work on a new feature that is no where near ready and you check in your changes to source control. The powers that be want to push to production the branch you’ve checked in the new feature… and they want to do it NOW! How do you push the branch and not expose the unfinished feature changes without having to revert the feature and other work mixed in between your feature changes?
Dev: What is a Feature Flag.
Dev: Can I have Daily Production Pushes for $1,000?
Alex: Daily Double
Alex: Marketing wants to be able to turn a feature on and off for customers depending on their transaction volume. How do you accomplish it?
Dev: What is a Ticket Flag.
Alex: You are our Production Jeopardy Champion?