I had a nagging issue where some tests were failing the build on our GoCD server, but the agent was not reporting the failure to the server. We use NAnt to run NUnit tests that in turn call Web Driver to exercise web pages. The failing tests correctly returned a non-zero exit code, which failed the build in NAnt, and the failure was captured in the log and saved to a text file. Yet the agent didn't report the build failure or send the artifacts to the server.
After a two-day search for answers and a deep dive into the bowels of GoCD, I discovered that a Web Driver process was being kept open after a test failed the build. Specifically, the process was IEDriverServer.exe. It was being orphaned by improper cleanup in the tests, which left Web Driver and the browsers open after the test failure.
When I ran the tests again, I watched for the failure, then manually killed Web Driver, and the agent magically reported to the server. I am still unsure why Web Driver would prevent the GoCD agent from reporting to the server. The agent is a Java process, while IEDriverServer.exe is a native executable, so a shared JVM problem seems unlikely. My best guess is that the orphaned process holds on to the build's output handles and the agent keeps waiting for them to close… not sure.
My workaround at the moment is to run a task killer on failure in the test script. Here is the relevant portion of the NAnt script that drives the tests:
<property name="nant.onfailure" value="test.taskkiller" />
<target name="test.taskkiller">
  <exec program="taskkiller.bat" failonerror="false" />
</target>
taskkiller.bat is just a simple batch file that kills Web Driver and any open browsers:
taskkill /IM IEDriverServer.exe /F
taskkill /IM iexplore.exe /F
Now this is just a band-aid. We will be updating our test framework to handle this. Additionally, killing all the processes like this isn’t good if we happen to be running tests in parallel on the agent, which may be a possibility in the future.
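As a rough idea of what the framework fix might look like, here is a minimal sketch. It assumes the tests use the Selenium .NET bindings with NUnit. The fixture name, test name, and URL are placeholders, not our real code. The point is that each fixture shuts down the driver it started, in a teardown that NUnit runs whether the test passes or fails.

```csharp
using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.IE;

[TestFixture]
public class HomePageTests // hypothetical fixture name
{
    private IWebDriver driver;

    [SetUp]
    public void StartBrowser()
    {
        // Starts IEDriverServer.exe and an Internet Explorer window for this test.
        // Assumes IEDriverServer.exe can be found on the PATH.
        driver = new InternetExplorerDriver();
    }

    [Test]
    public void HomePageLoads() // hypothetical test
    {
        driver.Navigate().GoToUrl("http://localhost/"); // placeholder URL
        Assert.That(driver.Title, Is.Not.Empty);
    }

    [TearDown]
    public void StopBrowser()
    {
        // NUnit calls this after each test once SetUp has succeeded,
        // including when the test itself fails.
        if (driver != null)
        {
            // Quit() closes the browser windows and ends the driver session,
            // so this fixture's IEDriverServer.exe is not left orphaned.
            driver.Quit();
            driver = null;
        }
    }
}
```

Because each fixture only quits the driver it created, this approach should not interfere with other tests running in parallel on the same agent, unlike killing every IEDriverServer.exe and iexplore.exe by image name.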