Category: DevOps

Thoughts on DevOps

I am not a DevOps guru. I have been learning DevOps and Continuous Improvement for about six years now. I wanted to blog about some of what I have learned because I see companies doing it wrong, and to start internalizing the lessons I have learned and the grand thoughts I have had, just in case someone asks me about DevOps one day.

DevOps is a Religion

I’m not going to define DevOps because there is enough of that going on (https://en.wikipedia.org/wiki/DevOps). I will say that you can’t hire your way to DevOps because it isn’t a job title. You can’t have one team named DevOps and declare you are doing DevOps. Everyone on your application delivery teams has to convert to DevOps. When you only have one team enabling some DevOps practices through tools and infrastructure, you are only getting a piece of the DevOps pie. Until you have broken down the silos and increased communication, you haven’t realized DevOps.

Do not focus on implementing DevOps by creating another silo in a “DevOps” team. You can create an implementation team that focuses on DevOps processes, tools, and infrastructure, but if this will be a long-lived team, call it a Delivery Systems team or Delivery Acceleration team and make sure it is embedded in sprint teams, not off in some room guarded by a ticket system. As with some religions, you have to congregate. Your delivery team has to communicate with each other outside of tickets and email.

When you name the team DevOps, it pushes responsibility for DevOps to that team, but the byproduct of DevOps is the responsibility of the entire delivery team. This is the same problem with a QA team: your QA team is not responsible for quality, the entire delivery team is responsible for quality. When you have silos like these, it is hard to get a “One Delivery Team” mindset. Find ways to break down silos, and you won’t be one of those companies that missed the DevOps boat because it couldn’t get its new siloed DevOps team to deliver on the promises of DevOps.

Fast Feedback is a Main Byproduct

One of the main benefits of doing continuous anything (DevOps includes continuous improvement processes) is that you get fast feedback. The tighter and faster your feedback loops, the faster you can iterate. Take a small step, get feedback, adjust based on the feedback, and iterate. It’s not rocket science, it’s simplification. Work in smaller batches and talk about how to make the next batch better; watch your automation pipelines and KPIs and talk about how to make them better… TALK.

Collaboration is the Key that Unlocks the Good Stuff

Having the entire delivery team involved and talking is key. The Business, QA, Security, IT, Operations, Development… everyone must communicate to ensure the team delivers the value that end users are looking for. Give end users value, they give the business value, loop. A delivery team that huddles in its silos with minimal communication with other teams is a good way to short circuit this loop. DevOps is a way of breaking down the silos and improving collaboration. DevOps is not the best name to convey what it can deliver. Just remember that the DevOps way should extend beyond the development and operations teams.

Automation is the Glue that Binds Everything

Having an automated delivery pipeline from source check-in to production gives you a repeatable delivery process that is capable of automatically providing fast feedback. It gives the entire team a way to start, stop, and monitor the pipeline and adjust based on feedback from it. It also aids in collaboration by providing dashboards and communication mechanisms accessible to the entire delivery team.

If you have no automation, start with automating your build on each check-in. Then automate running unit tests, then deployment to a test environment, then automated functional tests, then deployment to the next environment. Don’t forget virtualization. Figure out how you can virtualize your environments and automate the provisioning of an environment to run your apps in. Start where you are and focus on adding the next piece until you can automatically build once and deploy and test all the way to production. Iterate your way to continuous delivery.
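To make that first step concrete, here is a minimal sketch of a per-check-in build script that a CI server could run. The solution path, test folder, and tool choices (MSBuild and Pester) are placeholder assumptions, not a prescription; swap in whatever your stack uses.

# build.ps1 - minimal per-check-in build step sketch (paths and tools are placeholders)
$ErrorActionPreference = 'Stop'

# 1. Compile the solution (assumes msbuild.exe is on the PATH).
& msbuild .\MyApp.sln /p:Configuration=Release /verbosity:minimal
if ($LASTEXITCODE -ne 0) { throw "Build failed" }

# 2. Run the unit tests and fail the build on any test failure (assumes Pester is installed).
$result = Invoke-Pester .\tests -PassThru
if ($result.FailedCount -gt 0) { throw "$($result.FailedCount) test(s) failed" }

Once something like this runs on every check-in, each later stage (deploy to test, functional tests, the next environment) is just another step bolted onto the same pipeline.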

Virtualization is Magic Pixie Dust

Many people I have asked think of DevOps as virtualization and automated server configuration and provisioning. Even though this isn’t everything in DevOps, it’s a big part of it. Being able to spin up a virtual environment to run a test removes environments as a hindrance to more testing. Being able to spin up a virtualized mock environment for a third party service that is not ready allows us to test in spite of the missing dependency. Virtualization in production allows us to hot swap the current environment with a new one when we are ready for the next release, or when production nodes are being hammered or are otherwise unruly. Codifying all of this virtualization allows us to treat our infrastructure just like we do product code. We can manage changes in a source control repository and automatically run the infrastructure code as part of our delivery process.
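As a hedged example of what “infrastructure as code” can look like with PowerShell DSC: the node name and Windows features below are placeholders, and the point is simply that a file like this can live in source control and run from the pipeline.

# A minimal DSC sketch: describe the desired state of a web node in code.
Configuration WebNode
{
    Node 'webserver01'   # placeholder node name
    {
        WindowsFeature IIS
        {
            Name   = 'Web-Server'
            Ensure = 'Present'
        }
        WindowsFeature AspNet45
        {
            Name   = 'Web-Asp-Net45'
            Ensure = 'Present'
        }
    }
}

# Compile the MOF and push the configuration to the node.
WebNode -OutputPath .\WebNode
Start-DscConfiguration -Path .\WebNode -Wait -Verbose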

Quality, Security and Health Come First

Before one line of code is written for a change, the team must analyze the desired change. I’m not saying a large planning document has to be produced. The team has to talk through the potential effect on quality, security, and health (QSH), and it makes sense to record these discussions somewhere to be used during the iteration. You can create a doc or record it in a ticket, but QSH must be discussed and addressed during the iteration.

QSH is not something that happens after development has declared code complete. It should happen in parallel with development. There should be automated unit, integration, and end-to-end checks. There should be automated static analysis and security checks. A load test and analysis of health monitors should measure how the application responds to changes. All of this should happen during development iterations or as close to development as possible.

On a side note, in Health I am lumping performance, scale, stress, and any type of test where a simulated load is run against the application. This could be spinning up a new virtualized environment, running automated tests, then turning off the database or a service to see what happens. Health testing attempts to introduce scenarios that give insight into how the application will respond to changes. It may take a lot to get to the level of Netflix and its Chaos Monkey in production, but having infrastructure and tests in preproduction to measure health will give you something instead of leaving you totally blind to health issues.
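A preproduction health probe does not have to be fancy. Here is a rough sketch that hits a health endpoint and flags slow responses; the URL, request count, and threshold are made-up placeholders, not a recommendation.

# Crude health probe: hit the endpoint repeatedly and warn on slow responses.
$url = "http://test-env.example.com/health"   # placeholder test environment URL
1..100 | ForEach-Object {
    $elapsed = Measure-Command { Invoke-WebRequest -Uri $url -UseBasicParsing | Out-Null }
    if ($elapsed.TotalMilliseconds -gt 500) {
        Write-Warning "Request $_ took $([int]$elapsed.TotalMilliseconds) ms"
    }
}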

Conclusion

I know there is no real meat here or guidance on how to do these things, but that’s what Google is for, or you can read Gene Kim’s The Phoenix Project. Anyway, I may be a little naive on a few points, but the gist is that DevOps is more than a job or team title, more than development and operations signing a peace treaty, and more than automated server configuration. Think of it as another step in improving your continuous improvement process, with a focus on cross-team collaboration where you break down the silos separating all of the teams that deliver your application.

Cross Domain PowerShell Remoting [Fail]

I tried to run our PowerShell environment configuration scripts today and got hit with a nasty error. I double-checked my credentials, so I know that wasn’t the issue. The scripts worked just a month ago, but we did have some stupid security software installed on our workstations that may be adjusting how remoting works. Let’s see if I can get around it before I open a ticket and start complaining.

Here is the error. It results from a simple call to New-PSSession. The other server is in another domain, but like I said, this has been working just fine.

New-PSSession : [agpjaxd1pciapp1] Connecting to remote server agpjaxd1pciapp1 failed with the following error message : WinRM cannot process the request. The following error with errorcode 0x80090311 occurred while using Kerberos authentication: There are currently no logon servers available to service the logon request.
Possible causes are:
  -The user name or password specified are invalid.
  -Kerberos is used when no authentication method and no user name are specified.
  -Kerberos accepts domain user names, but not local user names.
  -The Service Principal Name (SPN) for the remote computer name and port does not exist.
  -The client and remote computers are in different domains and there is no trust between the two domains.
After checking for the above issues, try the following:
  -Check the Event Viewer for events related to authentication.
  -Change the authentication method; add the destination computer to the WinRM TrustedHosts configuration setting or use HTTPS transport. Note that computers in the TrustedHosts list might not be authenticated.
  -For more information about WinRM configuration, run the following command: winrm help config. For more information, see the about_Remote_Troubleshooting Help topic.

After I read this, I just stared at it for about five minutes; deer in the headlights.

I found some hope on the PowerShell scripter’s friend, the “Hey Scripting Guy” blog – http://blogs.technet.com/b/heyscriptingguy/archive/2013/11/29/remoting-week-non-domain-remoting.aspx.

Anyway, the solution from Honorary Scripting Guy Richard Siddaway was to add the remote computer to the trusted host list. The trusted host list basically tells your computer, “Hey, you can trust this computer, go ahead and share my sensitive and private credentials with them.” So, be careful with this.

You can view the trusted host list with this PowerShell command.

Get-Item -Path WSMan:\localhost\Client\TrustedHosts

You can add a computer to the trusted list with this command.

Set-Item -Path WSMan:\localhost\Client\TrustedHosts -Value 'computerNameOfRemoteComputer'
[Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"): Y
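One note from my own poking around: Set-Item replaces whatever is already in TrustedHosts. If you already have entries, the WSMan provider supports a -Concatenate switch to append instead, and -Force skips the confirmation prompt. Treat this as a hedged suggestion and check it in your own environment.

# Append to the existing TrustedHosts list instead of overwriting it, without prompting.
Set-Item -Path WSMan:\localhost\Client\TrustedHosts -Value 'computerNameOfRemoteComputer' -Concatenate -Force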

Now, I run the configuration script and I am a deer in the headlights again.

New-PSSession : Opening the remote session failed with an unexpected state. State Broken.

Such a helpful error message. Stack Overflow to the rescue – http://stackoverflow.com/questions/30617304/exchange-remote-powershell-gets-sporadic-broken-state. It looks like it may be a timeout, and I’m feeling that because the script sat on “Creating Session” forever (why it takes so long is probably the next question). I updated my script to increase the timeout.

$so = New-PSSessionOption -IdleTimeout 600000
$Session = New-PSSession -ComputerName $node.ComputerName -Credential $credential -SessionOption $so;

A 10 minute timeout is good, right? So, I try again and the State is still Broken. It’s not mission critical at the moment, so I will investigate further later.

You can read more about possible solutions at the links above.

GoCD: Install Multiple Agents with Powershell, Take 2

I wrote about how to Automate Agent Install with PowerShell and thought I would provide the script I am using now, since I recently had to deploy some new agents. The script is below; it is pretty self-explanatory and generally follows my previous blog post and the Go.cd documentation.

Basically, we copy an existing agent to a new location, remove some files that are agent specific, and create a Windows service to run the agent. Until I feel the pain of having to do it, I set the service account/password and start the service manually. I also configure the agent on the server manually through the Go.cd UI. When I have to install more agents, I will probably automate that too.

# Source agent to clone, plus the name and location of the new agent.
$currentAgentPath = "D:\Go Agents\Internal\1";
$newAgentName = "Go Agent Internal 3";
$newAgentPath = "D:\Go Agents\Internal\3";

Write-Host "Copying Files"
Copy-Item "$currentAgentPath\" -Destination $newAgentPath -Recurse;

Write-Host "Deleting Agent Specific Files"
# The GUID identifies an agent to the server; the copy needs to generate its own.
$guidText = "$newAgentPath\config\guid.txt";

if (Test-Path $guidText)
{
    Remove-Item $guidText;
}

# This marker belongs to the running source agent; remove it if it came along in the copy.
if (Test-Path "$newAgentPath\.agent-bootstrapper.running")
{
    Remove-Item "$newAgentPath\.agent-bootstrapper.running";
}

Write-Host "Create Agent Service"
New-Service -Name $newAgentName -Description $newAgentName -BinaryPathName "`"$newAgentPath\cruisewrapper.exe`" -s `"$newAgentPath\config\wrapper-agent.conf`"";

#$credential = Get-Credential;
#Eventually, we will write a function to set the service account and password and start the service. It would also be nice to have a way to automatically configure the agent on the server.

I guess I decided to do the work for you 🙂
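If that last manual step ever does get painful, here is a hedged sketch of how it might be automated. $newAgentName comes from the script above, Get-Credential supplies the account, and the sc.exe syntax (note the required space after each '=') assumes you are comfortable passing the password on the command line.

# Hypothetical follow-up: set the service account and start the new agent service.
$credential = Get-Credential
$account = $credential.UserName
$password = $credential.GetNetworkCredential().Password

& sc.exe config "$newAgentName" obj= "$account" password= "$password"
Start-Service -Name $newAgentName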

Enjoy

PowerShell in Visual Studio, Finally, At Last… Almost

Even if you don’t like Microsoft or .Net, you have to admit that Visual Studio is a boss IDE. After being thrust into the world of scripting and PowerShell, I was disappointed to find the PowerShell support in Visual Studio lacking. Well, today I received a notice that Microsoft joined Adam Driscoll’s open source project, PowerShell Visual Studio Tools (PVST). They announced the release of a new version, and I am ready to give it another go.

Adam notes that Microsoft submitted a large pull request full of bug fixes and features. This project provides pretty nice PowerShell support inside my favorite IDE, including:

  • Edit, run and debug PowerShell scripts locally and remotely using the Visual Studio debugger
  • Create projects for PowerShell scripts and modules
  • Leverage Visual Studio’s locals, watch, call stack for your scripts and modules
  • Use the PowerShell interactive REPL window to execute PowerShell scripts and commands right from Visual Studio
  • Automated Testing support using Pester
    From https://visualstudiogallery.msdn.microsoft.com/c9eb3ba8-0c59-4944-9a62-6eee37294597

You can download it for free from the Visual Studio Gallery. A quick double-click install of the .vsix file you download and you’re ready.

My first test was to create a PowerShell project. In the Visual Studio New Project window there’s a new project template type, PowerShell. Inside of it are two templates: PowerShell Module Project and PowerShell Script Project.

Scripting and Debugging

I start with a script project and bang out a quick Hello World script to see debugging in action.

$myName = "Charles Bryant"
$myMessage = "How you doin?"

function HelloWorld($name, $message) {
  return "Hello World, my name is $name. $message"
}

HelloWorld $myName $myMessage

It feels very comfortable… like Visual Studio. I see IntelliSense, my theme works, and I can see highlighting. I can set breakpoints, step in/over, see locals, watches, call stack, console output… feeling good, because it’s doing what it said it can do, and scripting PowerShell now feels a little like coding C#.

REPL Window

What about the REPL window? After a little searching, I found it tucked away on the menu: View > Other Windows > PowerShell Interactive Window. You can also get to it with Ctrl + Shift + \. I threw some quick scripts at it… ✓, it works too.

Unit Testing

The last thing I have time for is unit testing. First, I install Pester in the solution. Luckily, there’s a NuGet package for that.

>Install-Package Pester

Then I create a simple test script file to test my Hello World script.

$here = Split-Path -Parent $MyInvocation.MyCommand.Path
$sut = (Split-Path -Leaf $MyInvocation.MyCommand.Path).Replace(".tests.", ".")
. "$here\$sut"

Describe "HelloWorld" {
    It "returns correct message" {
        HelloWorld "Charles Bryant" "How you doin?" | Should Be "Hello World, my name is Charles Bryant. How you doin?"
    }
}

Houston, there’s a problem. When I open the Test Explorer, I can see a bunch of tests that come with Pester, but I don’t see my little test. I try to reorganize the tests in the explorer and it freezes. I’m not sure if this is a problem with PVST, Pester, NuGet, Visual Studio, or user error… oh well. I can’t say it is a problem with PVST because I didn’t dig into what was wrong (I still have work to do for my day job).
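Until Test Explorer cooperates, the tests still run fine from the console or the interactive window. Assuming the test file is named HelloWorld.tests.ps1 (my assumption, name yours whatever you like), something like this works:

# Run the Pester tests directly, bypassing Test Explorer.
Invoke-Pester .\HelloWorld.tests.ps1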

Conclusion

OK, unit testing isn’t as intuitive as the other operations, hence the Almost in the title. It will feel complete when I get unit testing working for me, but nonetheless, I like this tool a lot so far. I will definitely be watching it, and if I see something up to my skills that I can contribute, I will pitch in, because this is something I can definitely use.

IIS 8 Configuration File

Note to self

The IIS 8 configuration file is located at %windir%\System32\inetsrv\config\applicationHost.config. It is just an XML file and the schema is well known. You can open it, edit it (if you are brave), and otherwise do configuration work with it. You can diff it from system to system to find inconsistencies, or save it in a source code repository to standardize on a base configuration across web server nodes, if your project needs that kind of thing. Lastly, you can manage it with PowerShell… you can manage it with PowerShell… you can manage it with PowerShell DSC!
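For a couple of hedged examples of what that looks like, the built-in WebAdministration module can read and write the same configuration; the directoryBrowse setting below is just an arbitrary illustration.

# Poke at IIS from PowerShell instead of the Server Manager UI.
Import-Module WebAdministration

# List the sites defined in applicationHost.config.
Get-Website

# Read a setting straight out of the configuration system.
Get-WebConfigurationProperty -Filter /system.webServer/directoryBrowse -Name enabled

# Flip the same setting (this writes to applicationHost.config, so know what you are changing).
Set-WebConfigurationProperty -Filter /system.webServer/directoryBrowse -Name enabled -Value $false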

The possibilities are endless so stop depending so much on the IIS Server Manager UI like you are in Dev preschool. You are a big boy now, remove the training wheels, but you might want to wear a helmet.

I don’t want to have this discussion again!

Blameless RCA

Let ye without failure cast the first stone.

I am involved in a workgroup at work that is exploring Root Cause Analysis in the hopes that we can come up with a way to help everyone improve their RCA process and procedures.

I believe it is important in our RCA recommendations to strive to build a culture around RCA. To borrow from a theme brought up by a workgroup member, culture building should be extended to retrospectives and all of our continuous improvement processes in general.

Just Culture

For RCA to be most effective, we should instill the idea of the “blameless postmortem” into how we envision RCA. The blameless postmortem is an awesome concept that defines a culture around failure, called a “Just Culture,” and it was introduced to me in a blog post by John Allspaw, Web Operations guru at Etsy. It’s a way to encourage team members to own their failures without fear, in the hope that a less hostile environment toward failure will encourage fast, detailed feedback during active issue resolution and postmortems. We want team members to volunteer to report an issue as soon as they see it or cause it.

Owning Failure

In terms of RCA, this boils down to instilling the idea that finding who is at fault, or which team missed this or that, is not important. The only thing that is important is how, when, and why an issue was leaked; “who” is not under investigation. Granted, who is at fault will most likely come out, and it should, but there should be no condemnation or negative side effect to owning a failure. We want “who” to come from the failure owners themselves, not from a lot of intricate detective work. We want the team to freely offer the actions that may have contributed to a failure, in the hope that we can compile a timeline of multiple narratives of the failure from various perspectives. When we can own failure without retribution, we are more apt to own up to a failure and share the details that led to it so that it can be corrected.

Remove Managerial Blockages on RCA

There are managers who want to know who to blame so that they can monitor who is causing issues. If there is a problem with someone continuously failing, it will be evident without having to expose personal failures formally in the RCA process or as part of team culture. Root cause is usually deeper than one person or team’s failure; there are usually multiple stories that contribute to a failure. There are also managers who use hindsight to amplify the negative effect of failure to try to shame someone into being better. Highlighting what should have been done is not helpful because it doesn’t lead to change. Oftentimes hindsight is disguised as a solution without ever understanding why the actions that caused the failure were taken, or even how the manager’s mismanagement may have contributed to the failure. I only add this because I have seen many RCAs or postmortems fail because of a manager trying to place blame and using their limited hindsight to declare the problem solved.

And More

There is a lot of good that comes from a Just Culture. Since I saw some things in the RCA practices at work that may lead to the blame game, I thought that the blameless postmortem should be explicitly built into our RCA process in the hope that it affects the culture. Just something to think about if you are going down this same road.

Orphaned Powershell PSDrive

I received this strange error while executing a script that creates a new PSDrive.

New-PSDrive : The local device name has a remembered connection to another network resource

I tried to use Remove-PSDrive, but

Remove-PSDrive : Cannot find drive. A drive with the name 'S:' does not exist.

I was able to fix this issue with the “net use” command.

First, I ran it to see if the drive was still mapped. I am still unsure how it persisted between PowerShell sessions; I must have missed something.

PS C:\> net use
New connections will be remembered.

Status       Local     Remote            Network
-------------------------------------------------------------------------------
Unavailable  S:        \\node1\d$        Microsoft Windows Network
Unavailable  I:        \\node2\it        Microsoft Windows Network
OK           P:        \\public          Microsoft Windows Network
The command completed successfully.

Then I ran “net use” with the delete parameter to remove the orphaned drive.

PS C:\> net use /delete S:
S: was deleted successfully.

I love it when a plan comes together.