Category: Pipeline
Static Property Bug Caused by a Code Tool
Say you have the following private static fields at the top of a class:
private static readonly string taskTimeStampToken = string.Format("{0}Timestamp: ", taskToken);
private static readonly string taskToken = "[echo] @@";
What will this method in the same class print to the console when called?
public void Print()
{
Console.WriteLine(taskTimeStampToken);
}
For all you code geniuses who got this right away, I say: shut up with your smug look and condescending tone as you say of course it prints
Timestamp:
So, why doesn’t it print
[echo] @@Timestamp:
Because taskToken hasn’t been initialized yet, of course. Static field initializers run in declaration order, so the order of static fields matters in your code. Don’t forget it, especially if you use a tool that reorganizes your code.
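One defensive fix, assuming the token really is a fixed literal: make it a const. Constants are substituted at compile time and don’t participate in static field initialization order, so a reordering tool can’t break this version:
private const string taskToken = "[echo] @@";
// Safe in any declaration order now: the const value is baked in at compile time.
private static readonly string taskTimeStampToken = string.Format("{0}Timestamp: ", taskToken);
If the token can’t be a const, a static constructor that assigns both fields in an explicit order works too.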
I really did know this ;), but this fact still cost me about an hour of pain. I use a wonderful little tool called CodeMaid to help keep my code standardized. One of its functions is to reorganize my code into a consistent format (e.g. private members above public, constructor before everything…).
I had obviously never run CodeMaid on this particular file, because the unit tests had always passed. Then I made a code change in said file, CodeMaid did its thing, it flipped the order of the fields as shown above, and unit tests started failing. It took me at least an hour before I took a deep breath and noticed that the fields had flipped.
Lesson Learned
- If you do any type of refactoring, even with a tool, make sure you have unit tests covering the file you refactor.
- Use a diff tool to investigate mysteriously failing tests. It will give better visual clues on changes to investigate instead of relying on tired eyes.
- Configure your code formatting tool not to reorganize your static fields, or request the feature to configure it.
GoCD: Agent Running Web Driver Test Hangs After Test Failure [SOLVED]
Problem
I had a nagging issue where some tests were failing the build on our GoCD server, but the agent was not reporting the failure to the server. We use NAnt to run NUnit tests that in turn drive WebDriver to exercise web pages. Some test failures correctly returned a non-zero value that failed the build in NAnt, and the failure was captured in the log and saved to a text file. Yet the agent didn’t report the build failure or send the artifacts to the server.
Issue
After a two-day search for answers and a deep dive into the bowels of GoCD, I discovered that a WebDriver process was left running after the tests failed the build. Specifically, the process is IEDriverServer.exe. It was being orphaned by improper cleanup in the tests, which left the WebDriver server and browsers open after the test failure.
When I ran the tests again, I watched for the failure, manually killed the WebDriver process, and the agent magically reported to the server. I am still not sure why an orphaned WebDriver process would prevent the GoCD agent from reporting to the server. My best guess is that the agent waits for the build’s child processes to exit (or for their output streams to close) before it reports, and the orphaned IEDriverServer.exe kept that from ever happening… not sure.
Solution
My workaround at the moment is to run a task killer on failure in the test script. Here is the relevant portion of the NAnt script that drives the tests:
<property name="nant.onfailure" value="test.taskkiller" />
<target name="test.taskkiller">
    <exec program="taskkiller.bat" failonerror="false" />
</target>
The taskkiller.bat is just a simple batch file that kills the WebDriver process and any open browsers.
taskkill /IM IEDriverServer.exe /F
taskkill /IM iexplore.exe /F
Now, this is just a band-aid; we will be updating our test framework to handle this properly. Additionally, killing all the processes like this isn’t good if we happen to be running tests in parallel on the agent, which may be a possibility in the future.
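For the framework fix, the idea is to make sure the driver is always torn down, pass or fail. A minimal sketch of what that could look like with NUnit (the fixture and member names are made up for illustration):
using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.IE;

[TestFixture]
public class WebTestBase
{
    protected IWebDriver Driver;

    [SetUp]
    public void StartDriver()
    {
        Driver = new InternetExplorerDriver();
    }

    [TearDown]
    public void StopDriver()
    {
        // Runs even when the test fails: Quit() closes the browser and
        // shuts down IEDriverServer.exe so nothing is left orphaned.
        if (Driver != null)
        {
            Driver.Quit();
            Driver = null;
        }
    }
}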
Sev1 Incident
I read a book called The Phoenix Project, a surprisingly good book about a company establishing a DevOps culture. One of the terms in the book that I had no experience with was Sev1 incident. I have since heard it repeated and have come to find out that it is part of a common grading of incident severity. Well, about a year after I read the book, I decided to finally research it and put more thought into a formalized incident reporting, triage, mitigation, and postmortem workflow, which is similar to the thoughts I had on triaging failing automated tests.
Severity Levels
So, first to define the severity levels. Fortunately, David Lutz has a good breakdown on his blog – http://dlutzy.wordpress.com/2013/10/13/incident-severity-sev1-sev2-sev3-sev4-sev5/.
- Sev1: Complete outage
- Sev2: Major functionality broken and revenue affected
- Sev3: Minor problem, bug
- Sev4: Redundant component failure
- Sev5: False alarm or alert for something you can’t fix
Identifying the Levels
With that, I need to define how to identify the levels. IBM has a breakdown that simplifies it on their Java SDK site – http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/index.jsp?topic=%2Fcom.ibm.java.doc.diagnostics.142%2Fhtml%2Fbugseverity.html:
Sev 1
- In development: You cannot continue development.
- In service: Customers cannot use your product.
Sev 2
- In development: Major delays exist in your development.
- In service: Users cannot access a major function of your product.
Sev 3
- In development: Major delays exist in your development, but you have temporary workarounds, or can continue to work on other parts of your project.
- In service: Users cannot access minor functions of your product.
Sev 4
- In development: Minor delays and irritations exist, but good workarounds are available.
- In service: Minor functions are affected or unavailable, but good workarounds are available.
Severity Analysis
Now that we have more guidance on identifying the severity of an incident, how should it be reported? I believe that anyone can report an incident (a bug, something not working), but it is up to an analyst to determine the severity level of the report.
So, the first step is for the person who discovered the issue to open a ticket. Of course, if it is a customer and we don’t have a self-support system, they will probably report it to an employee in support or sales, and the employee will create the ticket for the customer. All tickets should be auto-routed to the analyst team, where each is assigned to an analyst to triage. The analyst assigns the severity level and routes the ticket to engineering support, where it is reviewed, discussed, and prioritized. The analyst in this instance can be a QA, a BA, or even a developer assigned to the task, but the point is to have a dedicated team/person responsible.
During the analysis, a timeline of the failure should be established. What led up to the failure, the changes, actions taken, and people involved should all be laid out in chronological order. Also, during triage, a description of how to recreate the failure should be written if possible. The goal is to collect as much information about the failure as possible in one place so that the team can review and help investigate. The Sev level should dictate how much detail is collected and how quickly feedback is given.
Conclusion
This is turning out to be a lot deeper than I care to dive into right now, but it gives me food for thought. My takeaways so far are to
- formalize severity levels
- define how to identify the levels
- assign someone to do the analysis and assign the levels
Using GitHub Behind a Proxy (Windows)
At work I am connected to the internet through a proxy. The proxy prevented me from connecting to repositories on GitHub because Git couldn’t authenticate with it properly. A co-worker recommended using the CNTLM proxy (http://cntlm.sourceforge.net/) to handle the authentication.
CNTLM works well, but he said he was having a problem with slow connections. He found that Git would try to connect multiple times, time out, and only then connect to the Git server through the proxy. He noticed that it was trying to connect to localhost via ::1, the IPv6 loopback address. Adding the proxy to .gitconfig (the global or system-wide config) made it connect quickly, without waiting through all the failed connection attempts:
[http]
proxy = http://127.0.0.1:3128
[https]
proxy = http://127.0.0.1:3128
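For reference, the same setting can be made from the command line instead of editing the file by hand (http.proxy is the key Git reads for both http and https remotes):
git config --global http.proxy http://127.0.0.1:3128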
Why does this work? I don’t have enough geek cred to know this yet, but it works, and I wanted to save it here for the time I have to set up a new computer and forget what I did.
GoCD: Versioning .NET Assemblies
I recently updated the versioning on my build server to help separate CI builds from builds that are publicly distributed. My versioning scheme for CI builds looks like 5.4.4-239CI37380. Following SemVer 2.0, this gives me Major.Minor.Patch-PreRelease, where my PreRelease is the Go counter + “CI” + the source revision number.
Unfortunately, assembly versions use a different scheme, Major.Minor.Build.Revision, and are only allowed to contain numbers, no dashes (AssemblyVersionAttribute). So I ended up keeping the CI version for file names, but mapped the Major.Minor.Patch onto the assembly’s Major.Minor.Build (you with me?). Then, to help identify different assemblies, I added the Go counter to the end as the Revision.
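A sketch of how this looks in AssemblyInfo.cs, using the example version from above (Go counter 239, revision 37380):
using System.Reflection;

// Numbers only: Major.Minor.Build.Revision, with the Go counter as the Revision.
[assembly: AssemblyVersion("5.4.4.239")]
[assembly: AssemblyFileVersion("5.4.4.239")]
// The informational version is a free-form string, so the full SemVer CI version fits here.
[assembly: AssemblyInformationalVersion("5.4.4-239CI37380")]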
The lesson is to only use numbers in your .NET assembly version numbers.
Deploying NuGet Packages Instead of Zips
I was on a project to improve an application deployment process that used zip files for packaging the applications. Zips are good: they let you package and compress files into a single file. But there is so much more to be had from a dedicated packaging solution. Maven, gem, wheel, npm, cpan, rpm, deb, NuGet, Chocolatey, yum… the list goes on, and with so many options that provide an improved package for deployment, it’s hard to justify using plain old zips.
Since this was a .NET project, I focused on NuGet. A NuGet package is itself a zip file, but a zip on steroids: zip provides the compression, and NuGet adds metadata and functionality on top.
- Standard package metadata and file layout.
- Versioning a la SemVer.org.
- Package manager to control install, upgrade, and uninstall.
- Dependency management.
- Having a package manager handle file deployment gives you a repeatable process, as opposed to a manual one where a single missed file can kill you. Also, deploying the same package multiple times leaves the system in the same state after each deployment; it’s idempotent.
Enough of the sales pitch. Actually, one problem I had with using NuGet alone was that there is no easy way to validate the package with a checksum. So, in addition to NuGet, using a dedicated artifact repository solution like Artifactory gives an added layer of comfort. A good paper on Artifactory, although a biased one, can be found here.
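As a stopgap, you can record a checksum next to the package at build time and verify it again at deploy time. A minimal sketch (the command-line handling is just for illustration):
using System;
using System.IO;
using System.Security.Cryptography;

class PackageChecksum
{
    // Prints the SHA-256 hash of a file, e.g. a .nupkg produced by the build.
    static void Main(string[] args)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(args[0]))
        {
            var hash = sha.ComputeHash(stream);
            Console.WriteLine(BitConverter.ToString(hash).Replace("-", ""));
        }
    }
}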
Happy Packaging!
Build Once, Deploy Everywhere
We were faced with an interesting problem. We want to build once and deploy that build to multiple environments for testing, staging, and ultimately consumption in production. In addition to building once, we also want to allow remote debugging, which means building in debug mode so PDBs are generated and other configuration allows debugging. Yet we don’t want to deploy a debug build to staging or production. What do we do?
The thought right now is to do a debug build and create two packages, one for debug and one for release. To do this we would have to strip out the PDBs and turn off the debug settings in the release package. So we would still have one build, but two flavors of packages.
I am not yet sure if this is viable, hence the reason I am blogging this out. First I need to fully understand the difference between a debug and release build in MSBuild. I have an idea, but I need to verify my assumptions.
Difference Between Debug and Release Build
What I have found is that the main differences between debug and release builds are:
- A debug build generates PDB files.
- A release build instructs the compiler to allow JIT optimizations.
PDB files are symbol databases that let a debugger map machine code back to source code (actually MSIL), so you can set breakpoints in source and the debugger can halt execution of the machine code. This is probably a terrible explanation, but you get the gist.
JIT optimizations are things the runtime does to speed up the execution of your code. It may reorganize loops to make them run faster, among other little magic tricks that happen under the covers that we usually never have to worry about.
Scott Hanselman has an interesting post on this: http://www.hanselman.com/blog/DebugVsReleaseTheBestOfBothWorlds.aspx. The post suggests that you could do a release build and then configure the runtime with an ini file that determines whether JIT optimizations are performed and whether tracking information is generated.
This MSDN page explains more about the ini: http://msdn.microsoft.com/en-us/library/9dd8z24x(v=vs.110).aspx.
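As I understand it from those two links, the ini sits next to the assembly and shares its name (MyApp.ini for MyApp.exe), and looks like this to disable optimizations and generate tracking info for a release build:
[.NET Framework Debugging Control]
GenerateTrackingInfo=1
AllowOptimize=0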
Now What?
After doing this research I learned a lot about building .NET code, but I also realized that I am taking this a little too far. My primary goal is that we build our application once and use that build in multiple environments to get it tested and deployed to production. When we need to do a remote debug, we are researching an issue, and there is no reason we couldn’t flip a switch on one particular build so that it builds in debug mode, deploy it to a test environment, debug it, and make fixes after finding the cause of the issue. Then we flip the switch back to release and build again, this time allowing the build to go all the way to production.
Issues
The problem here is that we need to make sure we do not allow debug builds to make it into production. My initial thought is to mark debug builds with a version tagged with DEBUG. Then I can have logic in the production deploy that checks for the DEBUG tag and fails the deploy if it is present. We can do the same for PDB files and web.config: check for the inclusion of PDBs (we shouldn’t have PDB files in production), and check for debug="true" and other configuration that we don’t want leaking into production.
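A rough sketch of what that production gate could look like (the DEBUG tag convention, arguments, and paths are my own assumptions):
using System;
using System.IO;
using System.Linq;

class ProductionGate
{
    // Fails (non-zero exit) if anything debug-flavored leaked into the package.
    static int Main(string[] args)
    {
        string packageDir = args[0]; // extracted package contents
        string version = args[1];    // e.g. "5.4.4-239DEBUG37380"

        if (version.Contains("DEBUG"))
        {
            Console.Error.WriteLine("Debug-tagged version cannot go to production.");
            return 1;
        }
        if (Directory.EnumerateFiles(packageDir, "*.pdb", SearchOption.AllDirectories).Any())
        {
            Console.Error.WriteLine("PDB files found in package.");
            return 1;
        }
        string webConfig = Path.Combine(packageDir, "web.config");
        if (File.Exists(webConfig) && File.ReadAllText(webConfig).Contains("debug=\"true\""))
        {
            Console.Error.WriteLine("web.config still has debug=\"true\".");
            return 1;
        }
        return 0;
    }
}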
We would have to alter our deployment pipeline to add a job that does these checks based on the environment being deployed. We would also have to look at putting debug builds in a different artifact repository to keep them segregated from release candidates. That would cause another change to the deployment pipeline, where we pull from the release-candidate or debug repository based on some setting.
Conclusion
This would be a lot of change to our pipeline, but I believe it is worth it in the long run. It also prevents manual processes from leaking into how we build and deploy the app.
GoCD: Automate Agent Install with PowerShell
I have been setting up build servers and exploring how to automate the process, so I have been scripting every step I take to stand the servers up. In this post I will share some of the commands I use to create GoCD agents. If you decide to go down this road, you should think about creating reusable scripts and parameterizing the things that change (I didn’t want to do all the work for you :). Also, it would make sense to use a configuration manager like DSC, Puppet, or Chef to actually run the scripts.
I am using PowerShell remotely on the build servers, which is indicated by [winbuildserver1]: PS> in the command prompt. Check out my previous post to learn more about configuring remote servers with PowerShell.
Copy Install Files
The first thing I do is copy the install files from the artifact repository to the server.
[winbuildserver1]: PS> Copy-Item -Path \\artifactserver\d$\repository\Go-Agent\go-agent-14.1.0-18882\go-agent-14.1.0-18882-setup.exe -Destination "D:\install-temp\" -Force
Install Agent
[winbuildserver1]: PS>([WMICLASS]"Win32_Process").Create("D:\install-temp\go-agent-14.1.0-18882-setup.exe /S /SERVERIP=<ip of go server> /GO_AGENT_JAVA_HOME=<path to JRE> /D=D:\Go Agents\Internal\1\")
Here we get a reference to the static WMI class “Win32_Process” and call its Create method, passing the command line to install an agent (http://www.thoughtworks.com/products/docs/go/current/help/installing_go_agent.html). The command line includes
- the path to the install file
- the /S switch for a silent install (no user prompts)
- the /SERVERIP switch for the IP of the Go Server (optional)
- the /GO_AGENT_JAVA_HOME switch for the path to the JRE (optional)
- the /D switch for the path to the location where you want to install the agent
Run Multiple Agents on Same Server
If I want to run multiple agents on the same server I do a little extra work to get the other agents installed.
[winbuildserver1]: PS> Copy-Item "D:\Go Agents\Internal\1\*" -Destination "D:\Go Agents\PCI\1"
[winbuildserver1]: PS> Remove-Item "D:\Go Agents\PCI\1\config\guid.txt"
[winbuildserver1]: PS> Remove-Item "D:\Go Agents\PCI\1\.agent-bootstrapper.running"
Here we are just copying an installed agent to a new location and removing a couple of files; the agent recreates them and registers itself with the server as a new agent.
Create Agent Service
Lastly, I create a service for the agent.
[winbuildserver1]: PS> New-Service -Name "Go Agent PCI 1" -Description "Go Agent PCI 1" -BinaryPathName "`"D:\Go Agents\PCI\1\cruisewrapper.exe`" -s `"D:\Go Agents\PCI\1\config\wrapper-agent.conf`""
Get more on using PowerShell to configure services in my previous post.
Conclusion
I use similar commands to install the server, plug-ins, and the other tools and services (e.g. Git, SVN, NuGet…) that I need on the build server. I have to admit this isn’t totally automated yet: I still have to manually set the service account credentials and manually accept a certificate to get SVN working with the agent, but this got me 90% of the way. I don’t have to worry about my silly mistakes because the scripts do most of the work for me.
GoCD: Environment Variables in Build Scripts
I wanted to use some of the GoCD environment variables in my build scripts; unfortunately, info on how to do that was hard to find (or my search skills were lacking).
Anyway, to use a Pipeline Parameter you would tokenize the parameter like so:
#{ParameterName}
To use a GoCD Environment Variable you would tokenize the variable like this:
%VariableName%
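For example, in a batch-style task on a Windows agent, you could echo one of the standard variables GoCD sets:
echo %GO_PIPELINE_LABEL%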
GoCD: 404 Error Fetching Artifact [SOLVED]
Problem
[go] Could not fetch artifact https://127.0.0.1:8154/go/remoting/files/pne.test.build/127/Build/1/Build/cruise-output/PreTest.PreTest.nant.log.xml?sha1=8899RvS5mElcpqSju5FdfoYPUQU%3D. Pausing 19 seconds to retry. Error was : Unsuccessful response '404' from the server
This error stumped me for a while because of the IP address and port: my Go Server is not located there. The Go Agent is not on the same machine as the Go Server, so it shouldn’t be using a loopback IP. The agent configuration correctly points to the Go Server’s IP and port. I assumed the 404 was caused by the incorrect IP and port, and I did a lot of research and digging trying to correct it.
Issue
I finally figured out that this error is simply stating that the file was not found.
Solution
I am still not sure why the wrong IP and port are reported, but once the file named in the error was actually published as an artifact on the server, the error went away.