I have multiple lingering tasks for improving monitoring for for our applications. I believe this is a very important step we need to take to assess the quality of our applications and measure the value that we are delivering to customers. If I had my way, I would hire another me just so I can concentrate on this.
We need to monitor usage to better understand how our customer actually use the application in production. This will allow us to make better product design decisions and optimizations, prioritize testing effort in terms of regression coverage, and provide a signal for potential issues when trends are off.
We need a better way to monitor and analyze errors. We currently get an email when certain exceptions occur. We also log exceptions to a database. What we don’t have is a way to analyze exceptions. How often do they occur, what is the most thrown type of exception, what was system health when the exception was thrown.
We need a way to monitor and be alerted of health issues (e.g. current utilization of memory, cpu, diskspace; open sessions; processing throughput…). Ops has a good handle on monitoring, but we need to be able to surface more health data and make it available outside of the private Ops monitoring systems. It’s the old “it takes a village to raise an app” thing being touted by the DevOps movement.
Everyone on the delivery team needs some access to a dashboard where they can see the usability, exceptions, health of the app and create and subscribe to alerts for various condition thresholds that interest them. This should be even shared with certain people outside of delivery just to keep things transparent.
This can all be started in preproduction and once we are comfortable with it pushed to production. The point of having it is that QA is a responsibility of the entire team. Having these types of insight into production is necessary to insure that our customers are getting the quality they signed up for. When the entire team can monitoring production it allows us to extend QA because we can be proactive and not just reactive to issues in production. Monitoring production gives us the ammo we need to take preemptive action to avert issues in production while giving us the data we need to improve the application.