Optimizing the Software Delivery Pipeline: Deployment Metrics
Currently, I have no way of easily determining what build version is deployed to an environment. This made me take more interest in metrics about deployments, we basically have none. I can look at the the CD (continuous deployment) server and see what time a deployment was done and I can look at the builds on the server and sort of deduce which build was deployed, but I have to manually check the server to verify my assumptions. I wondered what else I am missing. Am I flying blind, should I know more?
Metrics in the Software Delivery Pipeline
I am part of a work group that is exploring software quality metrics. So, my first instinct was to think about deployment quality metrics. After some soul searching, I decided what would be most helpful to me is to know where our bottle necks are. We have an assembly line or pipeline that consists of various stages our software goes through as it makes its way to public consumption. Develop, build, deploy, test, and release are the major phases of our software delivery pipeline (I am not including planning or analysis right now as that is another animal).
I believe that metrics that focus on reducing time in our software delivery pipeline will be more effective than just focusing on reducing defects or increasing quality. If we can reduce defects or increase quality in faster delivery iterations, the effect of defects and poor quality will have less of an impact. This is the point of quality metrics in the first place, reducing the effects of poor quality on our customers and the business. Focusing on reducing time in the pipeline also supports our quality initiatives as the tools to reduce time, like automated CI and testing, not only reduce iteration time, but improve quality. Faster release iterations will allow us to address quality issues quicker. This is not to say that other metrics should be ignored. I just think that since we have no real metrics at the moment starting with metrics that support speeding up the pipe is a worthy first step.
Back to the point. What metrics should I capture for deployments. If my goal is to increase throughput in the pipeline, I need to identify bottlenecks. So, I need some timing data.
- How long does deployment take?
- How long do the individual deployment steps take?
- How do we report this over time so we can identify issues?
This is pretty simple and I can extract it from the deployment log on the server. Reporting would be just a matter of querying this data and displaying deployment time totals over time.
Additional Deployment Metrics
In addition to the timing data it may be worthwhile to capture additional metrics like the size of deployment. Deploying involves pushing packages across wires and the size of the packages can have an effect on deployment time. Issues with individual servers can affect deployment time so, knowing the servers being deployed to can help identify server issues. With the timing data, we can also capture
- The version of the build being deployed
- The environment being deployed to
- The individual servers being deployed to
- The size and version of the packages being deployed to a server
So, my first iteration of metrics center around timing, but would also have other data to give a more robust picture of deployments. This is a naive first draft of what the data schema could look like. I would suspect that this can all be captured on most CI/CD servers and augmented with data generated by the reporting tool:
- Deployment Id – a unique identifier for the deployment, generated by the reporting tool
- Environment Id – a unique identifier for the environment deployed to, generated by the reporting tool
- Build Version – build version should be the version captured on the server
- Timestamp – timestamp is the date/time the deployment record was created
- Start – the date/time the deployment started
- End – the date/time the deployment completed
- Tasks – tasks are the individual steps taken by the deployment script; it is possible that there is only one step, it all depends on how deployment is scripted
- Deployment Task Id – a unique identifier for the task, generated by the reporting tool
- Server Id – a unique identifier for the physical server deployed to, generated by the reporting tool
- Packages – packages represent the group of files pushed to the server, this is normally a zip or NuGet package in my scenarios
- Package Version – the version of the package being pushed, this may be different than the software version and is generated outside of the reporting tool
- Package Size – the physical size of the package in KB or MB (not sure which is better)
- Start – the date/time the deployment to the server started
- End – the date/time the deployment to the server ended
Imagine the above as some beautiful XML, JSON, or ProtoBuf, because I am too lazy to write it.
If my goal is to increase throughput in the pipe I should probably think about a higher level of abstraction in the hierarchy so that I can relate metrics from other parts of the pipeline. For now I will focus on this as a first step to prove that this is doable and provides some value.
All I need to do is a create data parsing tool that can be called by the deployment server once a deployment is done. The tool will receive the server log and store it, parse the log and generate a data structure similar to above, then store the data in a database. Then I have to create a reporting tool that can present graphs and charts of the data for easy analysis. Lastly, create an API that will allow other tools to consume the data. This maybe a job for CQRS and event sourcing. Easy right :). I know there is a tool for that, but I am a sucker for punishment.
This post will take more time than I thought so I will make this a series. I will cover my thoughts on metrics for development, build, test, and release in upcoming posts (if I can remember). Then possibly some posts on my thoughts on how the metrics and tools can be used to optimize the pipeline. Pretty ambitious, but sounds like fun to me.