Bisecting Our Code Quality Pipeline
I want to implement gated check-ins, but it will be some time before I can restructure our process and tooling to accomplish it. What I really want is to keep the source tree green and, when it is red, provide feedback to quickly get it green again. I want to run tests on every commit and give developers feedback on their failing commits before they pollute the source tree. Unfortunately, running the tests as we have them today would take too long to do on every commit. I came across a quick blog post by Ayende Rahien on Bisecting RavenDB, and his team had a solution where they used git bisect to find the culprit that failed a test. The post gave no information on how it actually worked, just a tease that they are doing it. I left a comment to see if they would share some of the secret sauce behind their solution, but until I get a response I wanted to ponder it for a moment.
To speed up testing and also allow test failure culprit identification with git bisect, we would need a custom test runner that can identify which tests to run and run them. We don’t run tests on every commit; we run tests nightly against all the commits that occurred that day. When a test fails, it can be difficult to identify the culprit(s) that failed it. This is where Ayende steps in with his team’s idea to use bisect to help identify the culprit. Bisect works by searching the commits between the one we mark as the last known good commit and the last commit that was included in the failing nightly test. Rather than walking every commit in order, it does a binary search: it checks out a commit roughly in the middle of the remaining range, pauses there, and lets you test it and mark it good or bad. In our case we could run a test against a single commit. If it passes, tell bisect it’s good and let it move to the next commit. If it fails, save the commit and failing test(s) as a culprit candidate, tell bisect it’s bad, and let it move to the next. Each answer halves the remaining range until bisect lands on the first bad commit. Repeated for each failing test, this will result in a list of culprit commits and their failing tests that we can use for reporting and bashing over the head of the culprit owners (just kidding…not).
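To make the search concrete, here is a minimal sketch of the idea in Python. It is not how Ayende's team does it (they haven't said), and it is not git's actual implementation. It assumes a simple linear history, listed oldest to newest, where the first commit is known good and the last is known bad. The names `find_first_bad` and `is_bad` are made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def find_first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, List[str]]:
    """Binary-search an oldest-to-newest commit list for the first bad commit.

    Assumptions (for this sketch only): commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad
    one is also bad. Returns the first bad commit and the list of
    commits that actually had to be tested.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")

    good, bad = 0, len(commits) - 1
    tested: List[str] = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(commits[mid])
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return commits[bad], tested
```

With eight commits where the fifth one after the known-good commit introduced the failure, this tests only three commits instead of six. Git can automate the same loop with `git bisect run <command>`, which treats an exit code of 0 as good and exit codes 1 through 127 (other than 125, which means skip) as bad.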
Custom Test Runner
The test runner has to be intelligent enough to run all of the tests that exercise the code included in a commit. The custom test runner has to look for testable code files in the commit change log, in our case .cs files. When it finds a code file, it will identify the class in the code file and find the test class that targets it. We are assuming one class per code file and one unit test class per code file class. If this convention isn’t enforced, then some tests may be missed or we have to do a more complex search. Once all of the test classes are found for the commit’s code files, we run the tests. If a test fails, we save the test name and maybe the failure results, exception, stack trace… so they can be associated with the culprit commit. Once all of the tests have run, if any of them failed, we mark the commit as a culprit. After the test and culprit identification is complete, we tell bisect to move to the next commit. As I said before, this will result in a list of culprits and failing test info that we can use in our feedback to the developers.
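Here is a rough Python sketch of that selection step, just to think it through. Everything in it is an assumption on my part: the `Tests` suffix naming convention (`Foo.cs` maps to `FooTests`), the `Culprit` record, and the `run_test_class` callback, which stands in for whatever actually invokes the real test framework. The changed file list could come from something like `git diff-tree --no-commit-id --name-only -r <sha>`, which prints paths with forward slashes.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, List, Optional

TEST_SUFFIX = "Tests"  # hypothetical convention: Foo.cs is tested by FooTests


@dataclass
class Culprit:
    commit: str
    failures: Dict[str, str]  # test class name -> failure details


def select_test_classes(changed_files: Iterable[str]) -> List[str]:
    """Map changed .cs files to test class names, one class per file."""
    names = set()
    for path in changed_files:
        p = PurePosixPath(path)
        if p.suffix.lower() != ".cs":
            continue  # not a testable code file
        if p.stem.endswith(TEST_SUFFIX):
            names.add(p.stem)  # the test class itself changed, so rerun it
        else:
            names.add(p.stem + TEST_SUFFIX)
    return sorted(names)


def evaluate_commit(
    commit: str,
    changed_files: Iterable[str],
    run_test_class: Callable[[str], Optional[str]],
) -> Optional[Culprit]:
    """Run the selected test classes; return a Culprit if any failed.

    run_test_class is a hypothetical callback that returns None on
    success or a failure description (message, stack trace, ...).
    """
    failures: Dict[str, str] = {}
    for test_class in select_test_classes(changed_files):
        error = run_test_class(test_class)
        if error is not None:
            failures[test_class] = error
    return Culprit(commit, failures) if failures else None
```

A commit that touches no .cs files selects no tests and is never marked as a culprit, which is exactly the kind of gap the one-class-per-file convention leaves open.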
Make It Faster
We could make this fancy and look for the specific methods that were changed in the commit’s code file classes. We would then only find tests that test the methods that were changed. This would make testing focused like a laser and even faster, and we could probably employ Roslyn to handle the code analysis to make finding tests easier. I suspect tools like ContinuousTests – MightyMoose do something like this, so it’s not that far-fetched an idea, but definitely a mountain of things to think about.
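As a thought experiment, the method-level narrowing might look something like the Python sketch below. It reads the changed line ranges from the hunk headers of a unified diff and checks them against a table of method line spans. That table is the part I am hand-waving: here it is a plain dictionary with made-up names, and in practice it would have to come from real code analysis such as Roslyn.

```python
import re
from typing import Dict, List, Tuple

# Matches unified diff hunk headers such as "@@ -10,2 +11,3 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


def changed_line_ranges(diff_text: str) -> List[Tuple[int, int]]:
    """Return (first, last) line ranges on the new side of each hunk."""
    ranges = []
    for match in HUNK_HEADER.finditer(diff_text):
        start = int(match.group(1))
        count = int(match.group(2)) if match.group(2) is not None else 1
        if count == 0:
            # Pure deletion: no new lines, so flag the neighbouring line.
            start, count = max(start, 1), 1
        ranges.append((start, start + count - 1))
    return ranges


def methods_touched(
    ranges: List[Tuple[int, int]],
    method_spans: Dict[str, Tuple[int, int]],
) -> List[str]:
    """Return the methods whose line span overlaps any changed range.

    method_spans is a hypothetical mapping of method name to its
    (first_line, last_line) in the new version of the file.
    """
    touched = []
    for name, (first, last) in method_spans.items():
        if any(start <= last and end >= first for start, end in ranges):
            touched.append(name)
    return sorted(touched)
```

Hunk ranges include context lines by default, so a diff produced with `git diff -U0` (zero lines of context) would keep the ranges tight to the lines that really changed.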
Well, this is just a thought, a thesis if you will, and if it works, it will open up all kinds of possibilities to improve our Code Quality Pipeline. Thanks Ayende, and please think about open sourcing that bisect.ps1 PowerShell script 🙂