Migrating To Devops<br />
In the current age of internet speed, you can't scale a company without having Dev and Ops close together. In older or more established organizations, the transition to a Devops mentality has many challenges. Here are my thoughts on Devops from the perspective of an established organization migrating to Devops.<br />
Dan Nemec<br />
<br />
<span style="font-size: large;">DevOps is Necessary but not Sufficient (my predictions for the future of DevOps)</span><br />
Posted 2014-10-27<br />
<div>
<br /></div>
<div>
I know I'm late posting this, but I did write it on the plane on October 24. (I didn't watch any of Day 1 of DevOpsDays Ghent, so I, hopefully, am repeating what others have said.) Here's a blog post brain dump after the awesome DevOps Enterprise Summit 2014, with thoughts leading into the DevOps 5 year anniversary in Ghent. After listening to so many Enterprise stories and having recently become "Enterprise" myself after my company was bought by IBM, some old thoughts are coming together.</div>
<div>
<br /></div>
<div>
Goldratt warned repeatedly about local optimization, and I see a progression of creating and "fixing" local optima. Agile came in to "fix" waterfall and created a local optimization in Dev that crushed Ops. DevOps came along to "fix" that problem by extending the scope of the fix to Product (the business) and Ops. In many companies this is sufficient; in the enterprise it is still a local optimization. My rationale goes: a company that revolves around a fairly cohesive product (even if it is complex to implement) is organizationally tighter in its alignment to that product (Etsy is a fairly cohesive brand, product and organization). I think there may be some tie-in to Conway's law here. Once you scale beyond a certain size, an enterprise behaves like many small, coupled companies. New problems arise in managing those couplings at that scale/complexity.</div>
<div>
<br /></div>
<div>
Just as Agile has the Scaled Agile Framework to handle scale, many in the DevOps community are working on the DevOps Scaled framework. It's been mentioned before that the future of DevOps won't be called DevOps, so I respectfully submit the term Lean-Agile as the umbrella name. The reason I choose Lean-Agile and not just Lean is that "Lean" has some connotations in the enterprise because of Lean Six Sigma. I know Agile is a subset/implementation of Lean (and DevOps too, for that matter), but Agile has a very strong brand and a growing, strong, positive connotation in much of the industry. So, if I were to offer some definitions or distinctions: Agile means "Iterate all the things", DevOps means "Continuous all the things" and Lean is the foundation of Experimentation, Improvement and Respect needed to make it all work.</div>
<div>
<br /></div>
<div>
DevOps can become an industry term that people use for products and teams, while "Lean-Agile" becomes the cultural and professional movement that can include sales, finance, marketing and all parts of the organization, not just those hands-on with the technology.</div>
<br />
What College Degree for DevOps? -- The Feedback<br />
Posted 2014-09-09<br />
<br />
A huge thank you to @lusis for retweeting my question and <a href="http://blog.geeksgonemad.com/2014/09/what-college-degree-for-devops.html" target="_blank">original post</a> bringing in a ton of great responses from all over the DevOps community. This post will be updated as feedback comes in and as I start to draw conclusions.<br />
<br />
A large part of this process is trying to figure out what is important and what isn't when choosing a major. Also, I'm purposely thinking in terms of how to get a job at DevOps-oriented companies versus classic organizations. I'm also trying to gather evidence that getting a job at a DevOps-oriented company really is different, and that certain traditional college routes may not be the best path.<br />
<br />
One common theme from the Twitter feedback from today is that there are many paths to a good job. I have felt and seen this over my nearly 20 years in the tech industry. When your decisions are challenged by family, it's good to have some friends on your side.<br />
<br />
@kevinbehr: "philosophy of science is my personal favorite right now. Followed by Cognitive Anthropology"<br />
@puppetmasterd: "math or hard science"<br />
@grubernaut: "HS education, willingness, and work ethic. imho doesn’t really matter as long as it is a BS. CS, CINS, CINT, etc…"<br />
@aphyr: "Judging by the engineers in our office, best calls are English, physics, archaeology, neuroscience, history, or philosophy. or math, psychology, biochem, CS, womens studies, etc. Any study, in school or out, that makes them write and think."<br />
@ceejbot: "math, linguistics, physics, traditional EECS. Have also worked with great history majors. Very best were dropouts."<br />
@sdboyer: "*write, think, and challenge"<br />
<br />
From colleagues over email and in conversation at the office:<br />
<br />
Get from college the things you can't get on your own: It's harder to just read a book and learn how to be a good programmer, but it's comparatively much easier to read a book and learn how to configure a system or a switch. Especially if you understand the foundational principles.<br />
<br />
Get from college the things that require some time and discipline to learn. You are paying money, so get the most from the people who can coach you and encourage you through learning some difficult things.<br />
<br />
What College Degree for DevOps?<br />
Posted 2014-09-08<br />
<br />
My son is a senior in high school and we are in the thick of college visits and trying to figure out what degree he might pursue. He has a strong aptitude for leadership and organizational dynamics. (He just finished a year as Senior Patrol Leader in Boy Scouts and loved the challenge of figuring out how the organization worked and leading people in such a way that they actually want to work instead of just goofing off all the time.) He is not interested in a degree with heavy math or science, but he is a good problem solver and likes to puzzle over problems and pull pieces together to make a solution. His verbal scores are better than his math scores. He took a Java class in school and enjoys programming but doesn't think he wants to be a career programmer. On his own he came to me and said he thinks he might like something like a career as a system administrator. As a dad who is a system administrator, that made my heart proud. That got me thinking about what would be the best degree to foster a DevOps mindset. The degree should create a person with a well-rounded technical background (some programming and some systems) as well as knowledge of the business and the context around the product. In the vein of the original liberal arts degree, it should teach the student how to think and problem solve, how to navigate the context of their problem space, and let the job fill in the details. The following is what I have learned so far. I am inviting feedback to challenge or validate the assumptions I am making below.
I would love to hear from people with experience with the degrees and from hiring managers if my thinking is correct.<br />
<br />
First assumption: I am searching primarily in the Southeast US and primarily Georgia because the HOPE scholarship is so good. Feedback from other parts of the country will be useful to the discussion but may not help me specifically.<br />
<br />
<span style="font-size: large;">The MIS Degree</span><br />
<span style="font-size: large;"></span><br />
I started off with the only thing I knew that combined business and technology: the MIS degree in the business school. I looked, and the degree seems to be very strong in Business with a few technology classes. It looks to me to be geared toward large-corporation back-office technologies. One school's highlight is getting SAP certification. Other schools tout their job placement in consulting companies. As I looked at these, the first sinking feeling came. They sure looked like their 100% job placement guaranteed you a job working on a large ERP project at some mega-corporation. My career has been at small-ish technology companies and on the dotcom side of the world, and I didn't like the feel of the job prospects for an MIS degree. I would like for my son to get a degree that would help him get a job at a company with a DevOps culture.<br />
<br />
First question to the community: How many companies with DevOps cultures are hiring college graduates with MIS degrees for Ops positions? Is that an appealing degree for you for a system administrator? Would you hire an MIS grad on the production side of the house (ops or dev/product)?<br />
<br />
<span style="font-size: large;">The IT Degree</span><br />
<br />
Next, I discovered the degree called Information Technology. This degree is a Bachelor of Science, typically in the engineering school. It appears to me to be an "applied" computer science degree. I think it may be the closest thing to a degree for system administration. Georgia Southern describes IT as: "The IT program at Georgia Southern University prepares students to hit
the ground running in specializations in information management,
networking and datacenter management as well as web and multimedia."<br />
<br />
Two schools (among many) in Georgia with this degree are: <a href="http://ceit.georgiasouthern.edu/it/" target="_blank">Georgia Southern</a> and <a href="http://www.spsu.edu/itdegrees/" target="_blank">Southern Polytechnic</a> (oddly not Georgia Tech or UGA) and there is even an ABET accreditation for that degree.<br />
<br />
This looks like a great degree, but it is very technical and has little focus on business and leadership. Also, if you are looking for a degree from a big-name university, this program is not offered at many top-tier universities.<br />
<br />
Question two: Would companies with a DevOps culture be drawn to a candidate with a BS in IT for an Ops position?<br />
<br />
<span style="font-size: large;">The CIS Degree</span><br />
<br />
Then I found the degree called "Computer Information Systems". The content of the program varies between schools and it does not have any accreditation. The two that caught my eye were at <a href="http://www.cse.sc.edu/cis" target="_blank">University of South Carolina</a> and <a href="http://www.clemson.edu/ces/computing/curricula/undergraduate/cis%202014.pdf" target="_blank">Clemson</a>. (The CIS programs in Georgia either resemble IT or are in the Business Department and are not a BS degree.) As I read the descriptions, it seems to be an IT degree with a minor in business. From SC: "The CIS major combines computing courses (software, databases, networks, and hardware) from the Computer Science and Engineering (CSE) department with a minor in Business Information Management". From Clemson: "The <b>B.S. in Computer Information Systems</b> is a combination of core computer science courses and courses selected from management, marketing, finance, economics, and accounting."<br />
<br />
This sounds like the ideal degree for my son. It is a BS which should appeal to tech companies, but gives him a rounded education so that he can understand "The Business" and quickly move into a leadership position. The downside is the variation of programs from school to school so that when you say "I have a degree in CIS" no one really knows what you know.<br />
<br />
Question three: Would companies with a DevOps culture be drawn to a candidate with a BS in CIS for an Ops position?<br />
<br />
<span style="font-size: large;">Conclusion</span><br />
<br />
Added bonus. After I did all of the above research I found <a href="http://www.geteducated.com/careers/521-computer-information-systems-vs-computer-science" target="_blank">this site which sums up the degrees fairly well</a>. <br />
<br />
What I am going to try first is to see if the Information Technology degree at Georgia Southern can have all the electives tailored for business. That will give an accredited IT degree with a strong business background.<br />
<br />
Feedback? Please.<br />
<br />
Update: I've started collecting <a href="http://blog.geeksgonemad.com/2014/09/what-college-degree-for-devops-feedback.html" target="_blank">feedback in this new post</a>.<br />
<br />
Logstash Metrics Filter and Graphite Output<br />
Posted 2013-12-17<br />
<br />
Not many people have published more advanced metrics filter configurations. After spending a day with the examples and source code I have a more advanced configuration to share. NOTE: I saw weird behavior with logstash 1.2.2 and I'm not sure if it was my in-progress configuration at the time, but after upgrading to 1.3.1 everything worked as expected.<br />
<br />
The problem: We are trying to get metrics on our API usage by user. We were already logging the operation and the user to disk and picking it up in Logstash. Now we want metrics on how frequently each user makes each call.<br />
<br />
After quite a bit of googling, I couldn't find an example where the metric name has more than one field name in it. The last message in <a href="https://groups.google.com/forum/#!msg/logstash-users/QY_OGU08awA/PNQ5JbOu2fAJ" target="_blank">this thread</a> had some pieces of the puzzle, as we needed to translate some special characters to be more graphite-friendly. But I got confused as to what was contained in the metric event. The metrics filter creates a new event, but until I threw the event to a file output and saw it, I didn't realize the new metric event has no knowledge of any field of the message that generated it. (Actually, the "meter" option can use any name from the original event, but no other option can. I tried to use add_field inside the metrics filter and it didn't work.) So instead of mutating the metric event, I have to mutate the original event before it reaches the metrics filter. Also, you can't gsub a newly added field in the same mutate block, so I had to break the gsub out into a second mutate.<br />
<br />
<br />
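<pre>if [type] == "apicalls" {
  # Copy the fields we need into new fields so the originals stay untouched.
  mutate {
    add_field => [ "modhost", "%{host}" ]
    add_field => [ "modorgname", "%{org_name}" ]
  }
  # gsub has to live in a second mutate; it cannot see fields added in the same block.
  mutate {
    gsub => [ "modhost", "\.", "_", "modorgname", "[\.\,\ ]", "_" ]
  }
  # One meter per pod, host, operation and organization.
  metrics {
    meter => [ "apioperations.%{pod}.%{modhost}.%{operation}.byOrg.%{modorgname}" ]
    add_tag => [ "apiopmetric" ]
  }
}</pre>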
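<br />
To make the effect of the two mutate blocks concrete, here is a minimal Python sketch of the same substitutions. It only illustrates the regexes and the resulting name; it is not how Logstash implements gsub, and the function name and sample values are made up for the example.<br />

```python
import re


def graphite_metric_name(pod, host, operation, org_name):
    """Build the meter name the way the filter above does (illustration only)."""
    # First gsub pair: dots in the host become underscores.
    modhost = re.sub(r"\.", "_", host)
    # Second gsub pair: dots, commas and spaces in the org name become underscores.
    modorgname = re.sub(r"[., ]", "_", org_name)
    return "apioperations.%s.%s.%s.byOrg.%s" % (pod, modhost, operation, modorgname)


if __name__ == "__main__":
    # Hypothetical sample values.
    print(graphite_metric_name("qa1", "host.domain", "Login", "my.org name"))
```

Without the substitutions, every dot in a host or organization name would become an extra level in the Graphite tree, which is why the cleanup has to happen before the meter name is built.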
<br />
Now when I send those metrics events to a file output I see my resulting event is<br />
<br />
<pre>{
  "@timestamp":"2013-12-17T19:08:42.968Z",
  "@version":"1",
  "message":"server.name",
  "apioperations.qa1.host_domain.Login.byOrg.my_org_name.count":11,
  "apioperations.qa1.host_domain.Login.byOrg.my_org_name.rate_1m":0.0,
  "apioperations.qa1.host_domain.Login.byOrg.my_org_name.rate_5m":0.0,
  "apioperations.qa1.host_domain.Login.byOrg.my_org_name.rate_15m":0.0,
  "apioperations.qa1.host_domain.GetModifiedRecipients.byOrg.my_org_name.count":2,
  "apioperations.qa1.host_domain.GetModifiedRecipients.byOrg.my_org_name.rate_1m":0.0,
  "apioperations.qa1.host_domain.GetModifiedRecipients.byOrg.my_org_name.rate_5m":0.0,
  "apioperations.qa1.host_domain.GetModifiedRecipients.byOrg.my_org_name.rate_15m":0.0,
  "tags":["apiopmetric"]
}
</pre>
<br />
Next is to figure out how to get those to graphite. Since there is an unknown number of operations and users, I had to go to the source code to really figure out how all the options to the graphite output work. It turns out you can't use the "metrics" option as that would need to enumerate each name. The magical "fields_are_metrics" option sends all the fields in the event to graphite. All you need to do is use "include_metrics" or "exclude_metrics" to get just what you want to graphite. Our graphite output looks like this (the file output was for debug purposes only and is turned off now that it works):<br />
<br />
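<pre>if "apiopmetric" in [tags] {
  graphite {
    host => [ "10.1.1.1" ]
    # Only send fields whose names match this pattern.
    include_metrics => [ "apioperations.*" ]
    # Treat every field in the event as a metric name/value pair.
    fields_are_metrics => true
  }
  # Debug only; remove once the metrics are flowing.
  file {
    path => "/var/log/logstash/apidebug.log"
  }
}
</pre>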
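<br />
As a rough mental model of what that output does with the event, here is a small Python sketch. It is my approximation, not the plugin's code: it keeps the fields whose names match an include pattern and formats them in Graphite's plaintext form, "name value timestamp". Skipping non-numeric fields is an assumption of the sketch, and the function name is hypothetical.<br />

```python
import re
import time


def graphite_lines(event, include_patterns, timestamp=None):
    """Format matching event fields as Graphite plaintext lines."""
    if timestamp is None:
        timestamp = int(time.time())
    lines = []
    for name, value in sorted(event.items()):
        # Assumption of this sketch: skip non-numeric fields such as "message" or "tags".
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # Keep only fields matching at least one include pattern.
        if not any(re.search(pattern, name) for pattern in include_patterns):
            continue
        lines.append("%s %s %d" % (name, value, timestamp))
    return lines


if __name__ == "__main__":
    event = {
        "message": "server.name",
        "apioperations.qa1.host_domain.Login.byOrg.my_org_name.count": 11,
        "tags": ["apiopmetric"],
    }
    for line in graphite_lines(event, [r"apioperations.*"]):
        print(line)
```

The point is that nothing has to enumerate the metric names up front; whatever operations and organizations show up in the event are sent as they appear.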
<br />
And Shazam! All your metrics start flowing into Graphite!<br />
<br />
Marching Off the Map<br />
Posted 2013-04-09<br />
<br />
The title is not a new one, but it is a great image of what I feel the Devops community is doing. The Map is the way businesses have been run for the last 100 years, which the IT industry adopted in the 80's and mostly kept even during the dotcom days. In the 80's and 90's Enterprise is what everyone wanted. In the 00's, as the large web operations started growing, "Enterprise" became a dirty word among the cool kids. Now, Enterprise does describe some very large companies, but many of the Enterprise ways are found in many smaller companies (generally older ones). Damon Edwards used the term "Classic Organization" and I think that is a much more inclusive and less emotionally charged term than Enterprise, so I will use that term to mean "Organizations operating with the culture and processes akin to Enterprise". Classic Organizations are the epitome of the "before" picture in the Devops transformation. Devops (building on Lean and Agile and others) is marching off the map of business models and, I think, incorporating much of the best of the past into new models to lead us into the future.<br />
<br />
Recently, I realized my personal life has been paralleling my professional life in many ways. I'm seeing the core principles of Devops echoed throughout my life. Many people are discovering that things happening in the tech industry will work in running a household or other community too. It may be behind a paywall, but the Wall Street Journal ran a story by Bill Gates in January where he describes what sounds a lot like Lean thinking as a solution to fixing global problems: <a href="http://online.wsj.com/article/SB10001424127887323539804578261780648285770.html">http://online.wsj.com/article/SB10001424127887323539804578261780648285770.html</a>. There were also some interesting replies that generally uphold Lean principles and illustrate the challenges of applying Lean in a "Classic" culture: <a href="http://online.wsj.com/article/SB10001424127887324156204578275993802414124.html">http://online.wsj.com/article/SB10001424127887324156204578275993802414124.html</a>. There are also many articles around the internet on running a household on Agile principles.<br />
<br />
Then I heard a few podcasts from Growing Leaders describing the need to look for new ways of communicating with and educating young people today: <a href="http://growingleaders.com/blog/podcast-7-an-interview-with-dan-pink/">http://growingleaders.com/blog/podcast-7-an-interview-with-dan-pink/</a>, <a href="http://growingleaders.com/blog/podcast-8-the-benefits-of-a-gap-year/">http://growingleaders.com/blog/podcast-8-the-benefits-of-a-gap-year/</a>. One theme they share is that in school you are measured on (roughly) 75% IQ and 25% EQ, but in the workforce the proportions are reversed. This tweet illustrates that shift.<br />
<blockquote class="twitter-tweet">
You know what’s harder to teach than programming? Empathy.<br />
— ashe dryden (@ashedryden) <a href="https://twitter.com/ashedryden/status/314824566788734976">March 21, 2013</a></blockquote>
<script async="" charset="utf-8" src="//platform.twitter.com/widgets.js"></script>
The conclusion is that school is not teaching people how to be productive workers. For Devops and Lean to work there needs to be more focus on EQ development in people. It is said that your IQ is relatively fixed from birth, but that EQ can be trained and developed. When you have your technical people thinking more with their "Right Brain" (Big Picture, Context, Synthesis) you should see the culture fall into place much more easily. The "Left Brain" logical, analytical stuff is so easy, probably too easy, that we use it as a crutch to avoid working on people, culture, empathy, systems thinking, and such. Just read some of the stories about the Etsy Hacker Grant program and its effects. "Right Brain" thinking can be developed and learned.<br />
<br />
This post is rambling a bit through multiple topics but my main point is that I feel Devops is on the right path because its driving principles are echoed throughout life and so many cultures. I put a lot of weight on "uncommon, common sense" where you re-discover eternal truths that are built into human nature (respect, empathy, purpose, quality) and build on top of those. The name "Devops" doesn't really matter and will pass away, but the principles behind it should always be the foundation of all we do. I'm marching off the map at work and marching my kids off the map in their education at home. It's a little scary, but exciting to be doing something new and discovering a vibrant community around you to let you know you are not alone.<br />
<br />
Upgrading Chef Server from 0.10.8 to 10.18.2<br />
Posted 2013-01-23<br />
<br />
Here is my story of upgrading Chef from 0.10.8 to 10.18.2 while moving to a new server and updated OS. Someone please comment and tell me where I may have been able to do better.<br />
<br />
So, we are running Chef Server 0.10.8 on CentOS 5.4 with Ruby 1.8.7, and I want to upgrade to the latest release of Chef and go to CentOS 6.3 and Ruby 1.9.3 at the same time. That meant I couldn't just do an in-place upgrade on the existing server; I needed to migrate my Chef server to a new system and upgrade everything.<br />
<br />
My first plan was to take the cautious route since I wasn't sure if Chef could be updated that many revs, so I tried to export all data as JSON, build a new Chef 10.18.2 server, then import all the JSON. It worked perfectly EXCEPT that none of the clients could authenticate to the server, even though I had imported their public keys. I could create a new client key in the Chef server and the node could authenticate, but it wouldn't with an imported key. I spent about a day on this to no resolution. Maybe someone else will have better results.<br />
<br />
Next I tried to just copy the couchDB database. Unfortunately I flubbed things up a few times and spun my wheels for a few days because things didn't work (mostly my fault). Finally I found this method that works:<br />
<br />
1) Compile Ruby 1.9.3 and rubygems 1.8.23<br />
2) Install Chef via chef-solo <a href="http://wiki.opscode.com/display/chef/Installing+Chef+Server+using+Chef+Solo">http://wiki.opscode.com/display/chef/Installing+Chef+Server+using+Chef+Solo</a><br />
3) Work around the CentOS 6.3 rabbitmq init bug documented at <a href="https://bugzilla.redhat.com/show_bug.cgi?id=878030" style="font-family: Calibri, sans-serif; font-size: 11pt;">https://bugzilla.redhat.com/show_bug.cgi?id=878030</a>. We decided to change the rabbitmq init script to work around the bug:<br />
<br />
<pre>CONTROLPROG=/usr/sbin/rabbitmqctl
CONTROL="sudo -u ${USER} ${CONTROLPROG}"</pre>
<br />
4) Add the chef queues by hand, because rabbitmq was broken when chef-solo tried to do it, and change the solr maxFieldLength to 100000 to work around the problem of indexing nodes with lots of attributes.<br />
<br />
<pre># Recreate the vhost, user and permissions that chef-solo could not set up.
/usr/sbin/rabbitmqctl add_vhost /chef
/usr/sbin/rabbitmqctl add_user chef testing
/usr/sbin/rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"
# Raise Solr's maxFieldLength from 10000 to 100000 using ex.
ex /var/lib/chef/solr/home/conf/solrconfig.xml
:%s/&lt;maxFieldLength&gt;10000/&lt;maxFieldLength&gt;100000/g
:wq
</pre>
<br />
5) Shut down couchdb and rename /var/lib/couchdb/chef.couch to chef.couch.bak<br />
6) Copy the couchdb database from the old server (can still be running)<br />
7) chown chef.couch to be "couchdb:couchdb"<br />
8) Start couchdb back up<br />
9) Start rabbitmq, chef-solr, chef-expander, chef-server, chef-server-webui (in that order)<br />
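<br />
Steps 5 through 9 can be written down as an ordered command list. The Python sketch below only builds and prints the commands; it does not run anything. The init script names (including chef-solr for the Solr indexer), the use of the service command, copying with scp as root, and the host name are all assumptions for illustration, so adjust them to your install.<br />

```python
def migration_commands(old_host):
    """Return steps 5-9 as a list of argv lists, in order, without executing anything."""
    db = "/var/lib/couchdb/chef.couch"
    # Assumed init script names, in the start order given in step 9.
    services = ["rabbitmq-server", "chef-solr", "chef-expander",
                "chef-server", "chef-server-webui"]
    commands = [
        ["service", "couchdb", "stop"],                 # step 5: shut down couchdb
        ["mv", db, db + ".bak"],                        # step 5: keep the fresh database as a backup
        ["scp", "root@%s:%s" % (old_host, db), db],     # step 6: copy the old database over
        ["chown", "couchdb:couchdb", db],               # step 7: fix ownership
        ["service", "couchdb", "start"],                # step 8: start couchdb again
    ]
    # Step 9: start the remaining services in order.
    commands += [["service", name, "start"] for name in services]
    return commands


if __name__ == "__main__":
    # Hypothetical host name of the old Chef server.
    for argv in migration_commands("old-chef.example.com"):
        print(" ".join(argv))
```

Printing the commands first makes it easy to review the order before typing or scripting them for real.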
<br />
Now I wish I could say it's working at this point, but all the cookbooks are broken. Maybe someone from Opscode can tell me where the cache of cookbook files lives so it can be copied. I tried to load all the cookbooks with a <span style="font-family: Courier New, Courier, monospace;">knife cookbook upload -a -d</span>, but that still didn't give me working cookbooks. In the UI, when you click a cookbook it says "end of file reached" and has no data. The name and version are there, but no contents. I had to <span style="font-family: Courier New, Courier, monospace;">knife cookbook delete -p</span> each cookbook, and when it was added back MOST of the cookbooks worked. Some still gave the "end of file" error and I had to purge them one-by-one and upload them one at a time.<br />
<br />
I hope this helps someone. Let me know if you want more detail on any step. I still haven't done extensive testing on the new server but the few clients I tested seem happy. I'm really looking forward to an omnibus install for Chef Server 11 and hope the migration is not painful.<br />
<br />
Talking Deming with my Dad<br />
Posted 2013-01-03<br />
<br />
Driving home from a hunting trip with my father, conversation turned towards work. I started trying to explain how we're trying to adapt "some old concepts from the manufacturing industry, now called Lean" to the IT industry, how it fits remarkably well, and how people are excited to find that when you boil down the patterns that make the best IT companies tick, you re-discover patterns that were spelled out in the manufacturing industry a half-century ago. As I'm feebly trying to put this in words my father stops me and says "Back at Best Foods I was in the Quality Control department and our driving principle was 'Quality is conformance to specification'. That came from a guy named .. umm.." And I pipe up "Deming?" And he lights up, "Yeah, Deming." He then goes on to explain how Best Foods (maker of Skippy Peanut Butter and Hellmann's Mayonnaise) was a great company to work for, and how the QC department had the ability to stop the line and was an integral part of the business. Unfortunately they closed the plant he worked in and he was unwilling to move out of state, so he went to Anderson Clayton Foods (now owned by Kraft) to work in QC there. At Anderson Clayton (ACF) they had the alternate definition where "Quality is fitness for use." He's not sure if the difference came from the definition of quality, or from the fact that the QC department at ACF reported to the Plant Manager and the Plant Manager's incentives were based on product shipped.
I quote: "We shipped some marginal product."<br />
<br />
Where am I going with this? I'm not exactly sure. It was great bonding with my father over talking about Deming and quality and how the things he dealt with are big concerns in the IT industry today. What struck me was the stark contrast between how alive he sounded talking about how great it was to work at Best Foods and how lifeless he sounded talking about Anderson Clayton.<br />
<br />
I guess my conclusion is that I am adding another data point supporting my belief that <a href="http://itrevolution.com/">itrevolution.com</a> is on the right path looking for patterns from Deming and Lean. When I asked my father if he thought, based on his experience, that Deming would be a good pattern for us to follow, he answered without hesitation "Yes". He summed it up by saying "So you're trying to translate Deming from widgets to digits."<br />
<br />
Maybe Treating Computers Like People Isn't Bad<br />
Posted 2012-11-13<br />
<br />
Here's an interesting take on the notion of treating computers like people. I think it has been well established that it is a bad thing to treat <a href="http://potus98.blogspot.com/2011/04/artisan-server-crafting.html" target="_blank">individual machines like individuals</a> (pets, works of art, etc...). But I just came to a realization that it may not be bad to treat systems of machines like communities of people. These thoughts started to gel after listening to the Devops Cafe podcasts with <a href="http://devopscafe.org/show/2012/5/10/devops-cafe-episode-27.html" target="_blank">Mark Burgess</a> and <a href="http://devopscafe.org/show/2012/4/23/devops-cafe-episode-26.html" target="_blank">Adam Jacob</a>, hearing their thoughts on orchestrating activities in the datacenter, while also reading the excellent "<a href="http://amzn.com/1439817561" target="_blank">Lean IT</a>". Now I'm sure others have thought this and probably written about it, but it's a new idea for me and I haven't heard it put this way.<br />
<br />
The basic idea is that the techniques used to manage teams of people may be analogous to the ways to effectively orchestrate systems. Adam is working hard on how to make the individual actors (machines) take a strong role in decision making. This is like the Lean philosophy of the "Gemba" where the people on the floor are most effective at solving problems, and the all-knowing manager is not as effective. The manager's job is to set The Standard and to keep tabs that The Standard is met, but ideally to dictate as little day-to-day action as possible.<br />
<br />
The good/bad thing in this is that each actor is a computer with its own set of outside influences. Just like people on a factory floor, they are subject to random events. But these actors are also code, where we can specify how they respond to certain stimuli. They aren't subject to moods like people, but they are limited by our ability to code their behaviors and responses. Here is where Mark Burgess's AI experience can make cfengine4 leapfrog the competition by making the best code-representation of an actor with the best decision-making capabilities.<br />
<br />
What are your thoughts?<br />
<br />
Lightning Blog - Networks, not Hierarchies<br />
Posted 2012-06-27<br />
<br />
A quick post to get a thought out there before it slips out of my brain.<br />
<br />
Just got out of a great talk at the O'Reilly Velocity conference on threats to the internet by Albert Wenger (@albertwenger). The thought that struck me was that the internet was created as a network on the organizational level as well as the infrastructure level. Networks have been a popular topic lately, for good reason. I think network thinking is the best solution to most, if not all, distributed problems, and in the internet world most things are distributed problems. This can apply down to your servers, and all the way up to defining the Devops community.<br />
<br />Some thoughts (some mine, some Albert's)<br /><ul>
<li>Hierarchies tend to scale linearly, Networks tend to scale exponentially </li>
<li>We need to focus our activities/thoughts/conversations on preserving the network nature of everything around us. </li>
<li>It's harder and scarier to think about networks </li>
<li>Devops is a network. We don't need to be threatened by groupings that form among members. </li>
</ul>
Now, the big thought: I recently listened to The Federalist Papers by Alexander Hamilton, James Madison, and John Jay (I love http://www.librivox.org). It is 85 essays arguing in support of the Constitution of the United States of America. After the Revolution was won and the 13 colonies became states independent from Britain, they had the Articles of Confederation defining a very loose coupling of states. The Founding Fathers argued that a Confederation could not survive and that a Federal structure (Republic of States) was a more sustainable and scalable model. In many ways both structures are networks. For Federal structures, the trick is to define just enough central power where cost-benefit makes sense (e.g. national defense and treaties with other nations), while maintaining as much autonomy as possible in the nodes (states) to allow innovation and growth. Will a pure confederacy devolve into chaos? It's interesting to hear a conversation from the 1780s arguing the benefits of a networked political structure.<br /><br />Since this is not a fully thought-out post, I don't know when a confederation or federal structure is best or if it differs by case, but a core philosophy of both is that power is distributed and that they behave as networks. I know a network is the right answer.<br /><br />What do you think?Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com0tag:blogger.com,1999:blog-8793031642965253367.post-64625971906949409502012-05-17T11:27:00.000-04:002012-05-18T11:12:27.690-04:00Automating Application ConfigurationA lot of what I will discuss below is useful for anyone beginning any configuration automation project. I will detail where I feel application configuration has specific challenges that differ from system configuration.<br />
<h2>
Before You Begin</h2>
The first and most important thing anyone must do before embarking on an automation project is to create the standards for the environment. You must have a consistent naming scheme, packaging scheme, and directory layout. There has to be a simple way to derive "any file of this type shall be named like this and stored in a place like this". Have you ever written up a presentation for your boss after an outage where one slide is titled "Everything is Different" and gone on to explain how much room for human error exists because you have no standards?<br />
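<br />
One way to check that a standard really is derivable is to write it down as a function. This is only a sketch in Python: the directory layout, the name rule, and the <code>app_config_path</code> helper are hypothetical examples, not our actual standard.<br />

```python
import re
from pathlib import PurePosixPath

# Hypothetical rule: names are lowercase letters and digits, starting with a letter.
NAME_RULE = re.compile(r"[a-z][a-z0-9]*")


def app_config_path(suite, component, kind):
    """Derive the single allowed location of a config file from what it is."""
    for part in (suite, component, kind):
        if not NAME_RULE.fullmatch(part):
            raise ValueError(f"name breaks the standard: {part!r}")
    # Hypothetical layout: /opt/<suite>/<component>/conf/<component>-<kind>.properties
    filename = f"{component}-{kind}.properties"
    return PurePosixPath("/opt") / suite / component / "conf" / filename


example_path = str(app_config_path("shop", "cart", "logging"))
```

If you cannot write a function like this for a type of file, you probably do not have a standard for it yet.<br />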
<br />
But consistent naming is not enough.<br />
<h2>
Model Your Environment</h2>
What caught us off guard when implementing Chef was the amount of effort it took to model our environment. At first we didn't even know we needed to model our environment. We had a great set of standards for naming conventions, but the thing Chef offered us was now a hierarchy of attributes. We could define something once and have it used repeatedly. That always sounds great on paper, but when you get down to implementing it, it opens up cans of worms you never knew you had. We had to invent an appropriate hierarchy of configuration data. Even more, it needed to exist in a way compatible with a tool we knew little about.<br />
<br />
I think modeling is more involved for application configuration automation than for system automation. (Please correct me in comments if you think otherwise). For system config automation you focus on the standard dev/qa/prod hierarchy of systems. For application config automation you turn that on its head and your "system" is now an instance of your application suite. The suite is the unit of configuration where you stamp out a fully running suite in dev, qa, or prod. Servers are secondary because the suite resides on a set of servers working as a unit and the suite needs to be aware of all the servers participating in that instance, but your primary focus is on configuring the application suite.<br />
<br />
We built a cookbook called "derivations" that has a bunch of recipes that derive other sets of attributes. Yes, we could have done all the searches on the fly, but our thinking right now is that we like seeing our configuration persisted as node attributes. So our derivations cookbook makes a bunch of node attributes that describe various collections of servers needed to configure the application suite. Here, each node has attributes describing what all the other nodes in the suite are called and some things about what they do. Examples: "All app tier servers", "all memcached servers", etc... Various recipes can use those node attributes to write configuration files. Know that all this is point-in-time and if you add a new server to the environment, you need to run chef-client on all other nodes to have them discover their new neighbor.<br />
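<br />
To make the idea of derived attributes concrete, here is a rough sketch in plain Python rather than Chef. The node records, field names, and the <code>derive_suite_attributes</code> function are made up for illustration; our real derivations are Chef recipes working on node attributes.<br />

```python
from collections import defaultdict


def derive_suite_attributes(nodes, pod):
    """Group the hostnames of one pod by tier, e.g. {'app': [...], 'memcached': [...]}."""
    by_tier = defaultdict(list)
    for node in nodes:
        if node["pod"] != pod:
            continue  # a different instance of the suite, not our neighbor
        for tier in node["tiers"]:
            by_tier[tier].append(node["name"])
    # Sort so the persisted result does not change from run to run.
    return {tier: sorted(names) for tier, names in sorted(by_tier.items())}


# Hypothetical inventory: three servers in pod1, one in pod2.
nodes = [
    {"name": "app01", "pod": "pod1", "tiers": ["app"]},
    {"name": "app02", "pod": "pod1", "tiers": ["app", "memcached"]},
    {"name": "db01", "pod": "pod1", "tiers": ["data"]},
    {"name": "app09", "pod": "pod2", "tiers": ["app"]},
]
derived = derive_suite_attributes(nodes, "pod1")
```

The result is a snapshot of the inventory at the time of the run, which is why every node has to re-run the derivation when a server is added.<br />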
<h2>
How much is Just Enough</h2>
First you come up with an elaborate hierarchy that's perfect until you realize that as soon as an actual person six months from now needs to add an attribute they are going to spend half a day debating where it belongs. Or someone needs to figure out where something is defined and has to look through dozens of files and spend hours tracing complex paths of inheritance. DBAs have wrestled with this issue: just google "normalize too much" for lively discussion. As Adam Jacob said (paraphrasing), "If you can get to a 98% solution you will find that you likely can change something external so that the 2% edge-cases go away."<br />
<br />
Add to this challenge that you barely know how this new tool works. How does the tool handle hierarchy and inheritance? How do I keep from going down a design path that isn't supported by the tool? At some point you have to work from both ends and check your thinking against the tool(s) you are going to use.<br />
<br />
We honestly spent several months coming up with a model of our environment with what we believed was "Just Enough" hierarchy that could meet our design goals of reducing the sources of configuration data to as few locations as possible, while keeping it "human" where we believe someone 1 year later can figure out where things are. I encourage you to spend a lot of time on this. Getting your hierarchy close to right the first time will pay back dividends later. If you skimp here, you may be spending a lot of time reworking code to a new convention. (More below on the "Third Time" rule)<br />
<br />
For our model we chose our nodes to have a run list that is all roles. Most roles have only attributes in them to give us our "just enough" hierarchy of configuration. Only one type of role has a run list with actual recipes in it. It looks like:<br />
<br />
<ul>
<li><b>datacenter </b>- a few attributes defining the datacenter - useful for searches like "Find everything in datacenter X". Exclusively attributes, no run list.</li>
<li><b>logicalsite </b>- the term we came up with sort-of equivalent to environment. We group our servers inside a private DNS Top Level Domain to differentiate environments (dev, qa, load test, staging, production, etc..) so our logicalsite name is the DNS TLD of the environment. A datacenter may have many logicalsites in it. Almost exclusively attributes and the place we get the most bonus of high-level attributes.</li>
<li><b>pod </b>- our name for one complete, running instance of our entire product suite. A logicalsite may have multiple pods running in it.
Exclusively attributes, no run list.</li>
<li><b>tier </b>- based on standard tiers like "app" and "data" but used to break apart the suite into deployed code. In our tier role we set the run list to satisfy all the dependencies to build a node of that tier. A node can have multiple tiers, and we specifically designed them so that one server can be all tiers or all tiers can be spread among individual servers (here is where your naming convention gets tested). A tier role is mostly run list and only a few attributes.</li>
<li><b>constants </b>- Not a role, but a Chef data bag. Some things you need to be universal constants across every possible environment. Here are things like IP Ports, mbean names and such. Now you know that every environment will have each service listening on the same port without variation.</li>
</ul>
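The layering above can be pictured as a series of dictionaries merged from the most general level to the most specific. The sketch below is plain Python with invented attribute names and values. It is not Chef's actual attribute precedence, which has more levels and rules than this.<br />

```python
def merge_attributes(*layers):
    """Deep-merge dicts left to right; later (more specific) layers win."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are dicts: merge them instead of replacing wholesale.
                merged[key] = merge_attributes(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical role attributes, ordered datacenter -> logicalsite -> pod -> tier.
datacenter = {"datacenter": "dc1", "jvm": {"heap_mb": 512, "gc": "cms"}}
logicalsite = {"logicalsite": "qa", "jvm": {"heap_mb": 1024}}
pod = {"pod": "pod1"}
tier = {"tier": "app"}

node_attributes = merge_attributes(datacenter, logicalsite, pod, tier)
```

Here the heap size is defined once at the datacenter level and overridden only for the one logicalsite that needs it, while the GC setting is inherited untouched.<br />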
<h2>
The Product Manifest - The Secret Sauce</h2>
<div>
All of the above configuration is great, and in some ways is not really specific to application or system configuration. We have one additional set of attributes that comes into the Chef run from the outside and ties everything together. When our product suite is built one artifact that comes with it is what we call the "product manifest" (props to our awesome CM team who built this). It is a JSON file that describes every piece of code that needs to run and lots of metadata about it. In one blow I know everything about every piece of code that needs to run (build stamp so we get the right version, tier to deploy it on the right servers, dependencies like java or tomcat version). Now I have the ability to say "Deploy manifest version X to this environment" and the right code goes on the right server types with the right configuration data. There is no "dev manifest" and "prod manifest". It is one manifest used for all environments with no variation. Your variation comes only in your Chef roles named above, and that variation is as little as possible (URL names, memory settings and such).</div>
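<div>
As an illustration only, a manifest along these lines could be read and filtered by tier as sketched below in Python. The JSON field names, component names, and version numbers are invented for the example and are not the real format our CM team produces.</div>

```python
import json

# Hypothetical manifest: one file, used unchanged in every environment.
MANIFEST_JSON = """
{
  "version": "4.2.0",
  "components": [
    {"name": "storefront", "build": "4.2.0-b117", "tier": "app",
     "requires": {"java": "1.6", "tomcat": "6.0"}},
    {"name": "reporting", "build": "4.2.0-b117", "tier": "data",
     "requires": {"java": "1.6"}}
  ]
}
"""


def components_for_node(manifest, node_tiers):
    """Return the manifest entries that belong on a node with the given tiers."""
    return [c for c in manifest["components"] if c["tier"] in node_tiers]


manifest = json.loads(MANIFEST_JSON)
app_components = [c["name"] for c in components_for_node(manifest, {"app"})]
all_in_one = [c["name"] for c in components_for_node(manifest, {"app", "data"})]
```

The same manifest serves a single all-in-one dev box and a spread-out production pod; only the tiers assigned to each node differ.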
<br />
<h2>
Third Time Rule</h2>
The first time you are so excited simply that it works.<br />
The second time it works again, but you may have some misgivings about imperfections.<br />
The third time the flaws in the design become clear. You realize "We should have called this something else" or "We should have grouped this way" or ... By the third time, you figure out what is really important, and it's often not what you thought the first time.<br />
<br />
Be patient and diligent. Refactor fast and often. Don't let bad code languish. Stamp out technical debt while it's fresh in your mind. Bad automation is REALLY BAD! The tool can implement something destructive really fast across all your servers. (See below about testing). You have a limited window to get buy-in and the more you have to stammer "ummm, well, it really should be ..." the harder it is.<br />
<h2>
What Next?</h2>
First, determine where you are on the spectrum of standardization. You may not have one naming standard, but instead have a dozen evolutionary standards set by multiple admins, datacenter moves and company acquisitions. This is the hardest case because you will be implementing one new standard along with new automation software. The new standard has implications on monitoring and daily administration and can be very invasive. The "Third Time" rule will likely bite you frequently as well, because you don't know what you don't know. You may have a good file, package and directory structure, but lack a coherent model of hierarchy (this is where we were). Or you may have great standards and were just waiting to plug them into a tool.<br />
<br />
Spend a lot of time working out your standards on paper. I would expect it to take over 3 months to come up with standards and a model. In this time you aren't writing a single line of code. Resist the urge to code. Learn the tool inasmuch as it will help you make standards, models, and hierarchy. You're asking questions like "What should I call this?" "How does this fit with feature X?"<br />
<h2>
Iterate</h2>
Yes, I just told you to spend a lot of time modeling your environment, but I assume some of that time is learning a new tool, or comparing multiple tools. Once you have your tool and your model I think the Agile philosophy is great for development. Start with a small problem and solve it. Every release should be production ready. When you have one thing done go to the next. Early on, "Production" is your test system, but before long you will have real value and want to start using your shiny new automation everywhere.<br />
<br />
As Ops people we wanted to plan for every possible future scenario. Here is another use of "Just Enough". We decided to iterate and make just enough tuneable to match our current world. If we need more, we will add it later. We didn't want to spend all our time coding for situations where we don't have an immediate need. Be diligent about writing "Just Enough" code. It's easy to fall into the trap of "but we might need this feature..."<br />
<br />
Try to minimize backward compatibility by fixing as many standards as possible first, but you will surely find yourself needing to support some one-off situation by having some compatibility logic in code. The good thing about automation is that you can turn it all into code. "If OSVERSION=X then install pkgX, else install pkgY". Make a plan to fix the one-off in your environment and remove the code as soon as possible.<br />
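<br />
One way to keep that kind of compatibility logic easy to find and easy to delete is to hold the exceptions in a single table. A small Python sketch, where the version number and the <code>package_for</code> helper are hypothetical and the package names are the placeholders from the rule above:<br />

```python
# Every one-off lives here, with a note saying when it can be removed.
PACKAGE_OVERRIDES = {
    "5": "pkgX",  # hypothetical: delete once the last OS 5 host is rebuilt
}
DEFAULT_PACKAGE = "pkgY"


def package_for(os_version):
    """Pick the package for an OS version string such as '5.8' or '6.2'."""
    major = os_version.split(".")[0]
    return PACKAGE_OVERRIDES.get(major, DEFAULT_PACKAGE)
```

When the table is empty, the function and the table can both go.<br />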
<h2>
Follow Your Development Lifecycle</h2>
Caveat: We are not a continuous deployment shop. If you are continuous deployment, your lifecycle will probably be very different.<br />
<br />
The other thing that caught us off guard was the strong need to follow our product's development lifecycle. It helped immensely that the group implementing Chef were Ops guys inside Dev and we sat really close to the software CM team. If you are in Ops, make friends with the team managing your source code. I can't say it strongly enough: If you are automating application configuration, the automation will follow to a high degree the lifecycle of the product. You need to know the release cycle and branching strategy.<br />
<br />
With Chef, our Chef Environments are primarily Product Release Versions, not dev/qa/prod. We branch many of our cookbooks with the product, and we release some new Chef features with product releases. With a focus on application configuration we have found we use Chef in ways not common in the community.<br />
<h2>
Automation needs its own QA</h2>
<div>
Again it helped that we were inside Dev and managing the dev and QA systems, so we had a clear sense of building and testing the automation before releasing to production. I know how fallible I am when it comes to making changes in production. I, personally, have been responsible for building tools for production only and never implementing them in Dev and QA. Picture in your head what it would be like if your lowliest dev server and your biggest production server were all configured by the same templates. Birds sing and unicorns dance. It is possible. Don't settle for less. You have one shot at this. (re-read above about the product manifest)</div>
<div>
<br /></div>
<div>
Practice, practice, practice. Test your automation on a bare system and on built, running systems.<br />
<br />
Your "development" Chef server does not manage your development systems, it is a Chef server that manages your sandbox environments where you can write new code and break stuff. Your product development systems are still "production" to someone, so you need to develop and test your Chef code in a way that won't break any environment that the business depends on.<br />
<h2>
The Devops Problem - Application vs. System</h2>
What if you want to keep system management and application management separate events? Out of the box you think about Chef in a one node, one run list way where system and application configuration converge in one run. There is a way to separate the events where you can let Chef be used by the systems team for system work, and the application team use it for application work and have as little overlap as possible. It turns out to be quite useful (not perfect, but quite workable) to maintain that separation where you can have the ability to implement a "system" change event or a "product" change event without having to coordinate between them.<br />
<br />
In Chef, one server can have multiple nodes defined on it. One node is used for "system" management and the other for "product" management. They have separate run lists so you can tell them exactly what needs to be done. Given this, the app support team can now update application configuration without impacting any system configuration. Some system events may need to have a "not_if" tied to the running application, because some system configuration can have negative impact on running code.<br />
<br />
Here you find ways to tie into your orchestration tool. We write out application config files to a directory named "predeploy" and the orchestration tools are responsible for copying the files to the running location at the right time. Chef makes the application config files but does not implement the changes into the applications.<br />
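<br />
The hand-off can be sketched as two separate steps. This is plain Python with hypothetical function and file names; in our setup the first step is done by Chef and the second by the orchestration tool.<br />

```python
import shutil
import tempfile
from pathlib import Path


def stage_config(predeploy_dir, name, content):
    """Configuration tool's job: render the file into predeploy only."""
    predeploy_dir = Path(predeploy_dir)
    predeploy_dir.mkdir(parents=True, exist_ok=True)
    staged = predeploy_dir / name
    staged.write_text(content)
    return staged


def promote_configs(predeploy_dir, live_dir):
    """Orchestration tool's job: copy staged files live at deploy time."""
    live_dir = Path(live_dir)
    live_dir.mkdir(parents=True, exist_ok=True)
    promoted = []
    for staged in sorted(Path(predeploy_dir).iterdir()):
        if staged.is_file():
            shutil.copy2(staged, live_dir / staged.name)
            promoted.append(staged.name)
    return promoted


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    predeploy = Path(root) / "predeploy"
    live = Path(root) / "conf"
    stage_config(predeploy, "app.properties", "port=8080\n")
    live_before = live.exists()  # staging alone changes nothing live
    promoted = promote_configs(predeploy, live)
    live_content = (live / "app.properties").read_text()
```

Because staging never touches the running location, the configuration run can happen at any time and the application only sees the change when the deployment says so.<br />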
<br />
There is a gray area here. In the perfect world your product run list will have all dependencies needed for the product (including system dependencies), but then you find yourself in the one run list model. We settled on a tight-loose coupling between system and product. When a server is bootstrapped the system node bootstrap goes first then the product node bootstrap, then there is a running system. After that they are loosely coupled and can iterate at their own pace. There may be some cases where one may depend on another (e.g. big datacenter-wide structure change), but by taking the 98% lesson, try to make those go away instead of coding for them or ensure they are so seldom they can be a one-off event.</div>
<h2>
Lessons Learned</h2>
<div>
It was a huge victory the day someone said "server X isn't working the way it should" and I immediately, without effort, threw away the notion that it was configured wrong, and started looking elsewhere. Think of the mental energy you save when you can eliminate one whole set of variables with a wave of a hand.</div>
<div>
<br /></div>
<div>
The Toss Test. When we needed to upgrade the OS, we re-kicked the boxes and bootstrapped from scratch. Throw the old box away and build it fresh.<br />
<br />
If you automate something you have to be 100%. If automation manages a file put a banner at the top saying "Managed by Chef. All manual changes will be overwritten!!!!"<br />
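<br />
A minimal sketch of what "100%" means for one managed file, in Python rather than Chef. The <code>converge_file</code> helper is hypothetical, and a real tool would also manage ownership and permissions.<br />

```python
import tempfile
from pathlib import Path

BANNER = "# Managed by Chef. All manual changes will be overwritten!!!!\n"


def converge_file(path, body):
    """Write banner + body only if the file differs; return True when changed."""
    path = Path(path)
    desired = BANNER + body
    if path.exists() and path.read_text() == desired:
        return False  # already converged, nothing to do
    path.write_text(desired)
    return True


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "app.properties"
    first = converge_file(target, "port=8080\n")   # file is created
    second = converge_file(target, "port=8080\n")  # no change needed
    target.write_text("port=9999\n")               # someone edits by hand
    third = converge_file(target, "port=8080\n")   # manual edit is overwritten
    final_text = target.read_text()
```

The banner is not decoration: the manual edit in the demo really is gone after the next run.<br />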
<br />
As scary as it sounds you do want chef-client to run often (you define often). The less change between runs the better. Once all change is converged, subsequent runs will do nothing and are safe.</div>Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com4tag:blogger.com,1999:blog-8793031642965253367.post-3898692761029851132011-08-30T09:12:00.001-04:002011-08-30T09:12:58.428-04:00Automation Getting Started Guide Chapter QI don't know if I'm actually writing an entire Getting Started Guide to automation tools, but here is a chapter in the book that is getting written all through the community.<br />
<br />
After you have installed your Configuration Management tool (e.g. Chef, Puppet, etc...), have done your proof of concept, think it's really cool and is going to be really useful, and have your first set of Kanban stickies on the board to write some code, read this.<br />
<br />
But before you read this, read the chapter called Infrastructure as Code in "Test-Driven Infrastructure with Chef" by Stephen Nelson-Smith, specifically the section titled "The Risks of Infrastructure as Code". It will justify you for being where you are, set the appropriate tone of sobriety for your automation endeavors, and get you off to the right start.<br />
<br />
One thing that hit me full force was the need to emphasize the <strong>CODE</strong> part of Infrastructure as Code. If you are a sysadmin like me, strong coding practice is not in your DNA. Reach out to your CM Team or whoever manages your source code repository and get some lessons in code management from them. Then reach out to a developer or two and get some lessons in (very) basic design patterns and a few "What Not To Do" tips.<br />
<br />
Now, when you write some automation code (cookbook, manifest, etc...) that is destined for production, after you get the first draft written so that it compiles and runs without error, your first job of refactoring is to answer the following questions. Repeat this exercise for every block of code you write (generically described below as a "function").<br />
<ol>
<li>What will happen if this function runs on a brand-new system? (What are the prerequisites?)</li>
<li>What will happen if this function runs on an existing system but has never been run before? (different prerequisites?)</li>
<li>What will happen if this function's behavior is changed from the last run?</li>
<li>What will happen if this function is re-run on the same system with no other changes? (idempotence)</li>
<li>What will happen if the prior run failed? How will the function recover from a failed or partial run?</li>
</ol>
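As a rough illustration of a function written with the five questions in mind, here is a Python sketch. The <code>ensure_setting</code> helper and the file layout are hypothetical. A real cookbook or manifest would lean on the tool's built-in resources instead of hand-rolled file handling.<br />

```python
import os
import tempfile
from pathlib import Path


def ensure_setting(path, key, value):
    """Ensure 'key=value' is in a properties-style file; return True if changed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # Q1: brand-new system
    lines = path.read_text().splitlines() if path.exists() else []  # Q2: unmanaged file
    wanted = f"{key}={value}"
    new_lines, found = [], False
    for line in lines:
        if line.startswith(f"{key}="):
            if not found:  # Q3: a changed value replaces the old one in place
                new_lines.append(wanted)
                found = True
        else:
            new_lines.append(line)
    if not found:
        new_lines.append(wanted)
    if new_lines == lines:  # Q4: re-run with no changes does nothing
        return False
    # Q5: write a temp file and rename it, so a failed run never leaves a half-written file.
    # Note: the new file gets the temp file's restrictive permissions.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent))
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(new_lines) + "\n")
    os.replace(tmp, path)
    return True


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    conf = Path(root) / "etc" / "app.properties"
    results = [
        ensure_setting(conf, "port", "8080"),  # first run on a new system
        ensure_setting(conf, "port", "8080"),  # identical re-run
        ensure_setting(conf, "port", "9090"),  # behavior changed since last run
    ]
    final = conf.read_text()
```

Returning whether anything changed makes the idempotence question easy to check: the second call in the demo reports no change.<br />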
We found from experience that until all of the above questions have been considered your code is not ready for production, because you are at risk of unexpected behavior. The key to automation is predictable behavior. You will be amazed at how many ways automation can be unpredictable because you coded it poorly.<br />
<br />
Not every question needs to be answered every time, and they are most important when new code is written.<br />
<br />
There is some great discussion on this topic in the <a href="https://groups.google.com/d/topic/devops-toolchain/ZboQ4gwWMSY/discussion">devops-toolchain Google Group</a>, notably advice on using as many features of your tool to help make the above questions non-issues and to avoid code repetition. Also some interesting discussion on Greenfield systems (question 1) and Brownfield systems (question 2).<br />
<br />
What are some of your getting started lessons learned? Care to add a chapter to the guide?<br />
<br />
Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com1tag:blogger.com,1999:blog-8793031642965253367.post-36009110672879046242011-07-19T09:28:00.000-04:002011-07-19T09:28:42.871-04:00Done Means DeployedJohn Willis of <a href="http://www.dtosolutions.com/">DTO Solutions</a> was shaking out their "<a href="http://www.dtosolutions.com/devops-workshops/">Devops Workshop</a>" here in Atlanta. Trying to cover 2 days of material in 1 day was a herculean effort. I took a bunch of notes, but one sentence resonated strongly with me.<br />
<br />
<span style="color: orange; font-size: large;">Done Means Deployed.</span><br />
<br />
If you read Agile books and blogs, or probably if you work in a software shop you have read or heard discussions about "Done-Done". We have them fairly regularly. For us, Done-Done means QA accepted all available tests. I don't think regression is even a requirement. Done-Done means the code is <strong>ready</strong> to be deployed to production (not all code gets regression tests).<br />
<br />
When I heard John say "Done means Deployed" it all clicked. If your Development Department believes "Done-Done" is "QA Accepted" then your developers have the classic "Throw it over the wall" mentality discussed by the <a href="http://dev2ops.org/blog/2010/2/22/what-is-devops.html">dev2ops blog.</a><br />
<br />
If your developers adopt a "Done means Deployed (to production)" mentality, then they are invested in the code all the way until it is in front of the customer. They can't disengage from it until it passes the ultimate QA (the end-user). We haven't achieved this here, but I would bet that anyone who has achieved it has seen a dramatic increase in software quality as a result.<br />
<br />
In order to do this you need, at a minimum, frequent deployments, and ideally approaching continuous deployment. If your deployment cycles are on the order of months, it is impossible because too much happens between check-in and deployment. You can't wait 3 months to call something done, and you can't expect a software developer to stay fresh on that much code.<br />
<br />
We're working on speeding up our deployment cycle. I'm keeping this idea in front of us as a stretch goal.Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com3tag:blogger.com,1999:blog-8793031642965253367.post-58954961918224058732011-06-01T09:28:00.195-04:002011-06-14T09:32:51.689-04:00What makes a good Generalist?I am glad that Devops is bringing generalists out of the closet and showing how valuable they are and how much companies need them. Also, I would bet money that the majority of successful "specialists" have a fairly broad set of knowledge outside of their specific job function. So, your best generalist is a competent specialist, and your best specialist is a competent generalist.<br />
<br />
So, "What makes a good generalist?"<br />
<br />
Since we can only learn one thing at a time, I think most generalists started out as specialists (Solaris System Administrator, got CCNA certified, passed the MCSE test, Oracle certified, etc...). But, over time they gained knowledge outside of their special area in order to solve a problem. Over time that accumulation of knowledge became patterns of understanding they can use to synthesize information and make decisions. Finally they gain wisdom to have insight into things and project into the future.<br />
<br />
I think one of the greatest contributions Devops makes is in shining a strong light on the fact that there is only one problem: The Business Problem. There is not a system problem and a software problem and a network problem and a security problem, there is only the business problem. The more everyone in the business <strong>knows</strong> about the rest of the business, the more they can <strong>understand</strong> how the parts relate, and ultimately make <strong>wise</strong> decisions to help the business succeed.<br />
<br />
You can develop knowledge, understanding, and wisdom from others. One way to foster generalist growth is to rotate new employees through various departments when they are hired in order to give them a full picture of their peers. It's not quite cross-training because you don't need them to be competent in the job. It's more like cross-exposure so they can feel some of the pain and absorb some of the knowledge, understanding, and wisdom of the other departments. Every department has wisdom worth knowing. The level 1 CS person knows a lot about the software after the 10th customer calls up about a feature that doesn't work like it should. Or, the Ops person gains knowledge as they sift through all the "normal" errors in the log file to find the one, little "important" error buried in the noise. And the developer knows how to work with a team sharing a software repository; and knows what is the right version of the application and how a lifecycle is important for any code.<br />
<br />
Our company rotates new developers into a support role which is a good start. I think it would be better if every Dev or Ops employee spent a week in some rotation like: Customer Support -> QA -> Security -> Development -> Operations to see the full context of their work and how it affects all departments and how all departments affect what they do.<br />
<br />
When we have a broad knowledge of our business and feel some of the pain of our peers, I think we will be more successful at whatever specific role we have.Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com0tag:blogger.com,1999:blog-8793031642965253367.post-27809499232317969162011-05-23T09:14:00.162-04:002011-06-14T09:32:51.676-04:00Two Approaches to DevopsI know I'm a little late in posting my follow up from the April Devops Meetup in Atlanta, but it was a great morning and I wanted to share some of the things discussed. Not everything in this post was explicitly discussed in the meetup, but some are thoughts I had related to the topics.<br />
<br />
<span style="font-size: large;"><strong>Artisan Server Crafting</strong></span><br />
First, I put out a challenge/request for John Christian to record his routine on "Artisan Server Crafting". He talks about how the traditional system administrator "crafts" each server, gives them personal names, and treats them like family. "Oh Look! Gandalf has been up for 365 days, let's throw a party!" The cloud makes the family so big that you can't give each of your children the attention they "deserve". I guess we just have to treat our machines like, well, machines. Automation, not art.<br />
<br />
<span style="font-size: large;"><strong>Devops, bottom-up</strong></span><br />
Next, John talked about how Devops got started inside his company. They started with one person from Dev and one from Ops working together on some automation to improve their lives. This is a common vector for Devops in companies. Start small as a skunkworks project, produce some results that show business value, then get management buy-in to continue the work and hopefully dedicate more time to the project, show more business value from your incremental success, then the business is hooked and you can't go back. I'll ask John to write a guest post to go into more detail.<br />
<br />
The challenge of the bottom up process is that it is hard to get past the 1.0. You start out with a few energetic people that get things going, but scaling up can only happen with management's support. How do you cope with the original team moving on to something else? Are the new people going to be able to sustain progress on energy and enthusiasm alone? To move beyond 1.0 you have to show business value and show how the goals of Devops are aligned with the goals of the company. Also, you need to maintain the balance of Dev and Ops. A dominant personality can sway projects one way or another and alienate one side of the team. <br />
<br />
Bottoms-up does work and has the potential to create a great deal of cohesion between Dev and Ops. Just be aware that at some point someone from the team is going to have to sell the story to management and get the business bought in. Devops is not complete unless the alignment goes all the way to the top of the management chain.<br />
<br />
<span style="font-size: large;"><strong>The Devops Team</strong></span><br />
<br />
A second way of introducing Devops occurred at my company. We unintentionally, through a series of reorgs, created a "Devops Department" without really knowing it. We created an Ops team inside Dev to take care of the non-production (Dev, QA, Load Test, CM, etc..) systems. Since this team reported up to the Dev executive and was chartered to take care of Dev's needs, there was a natural alignment of goals. This team and the Configuration Management Team got together and started automating deployments. After about a year building up Control Tier to deploy the code and succeeding in automating all deployments from Dev through Production the focus went to configuration. We have a suite of Java apps that are "overconfigured". Our current project is automating configurations of applications.<br />
<br />
Automation and "Devops" got started with a standalone team inside Dev, but through a reorg that team merged with Production Operations. This was actually the best thing that could possibly happen because the biggest risk of a dedicated Devops Team in an organization with separate executives for Dev and Ops is that the team must naturally report up one silo and not have an affinity to the other silo. Also there is a more subtle factor that comes to play. The Devops team is not a part of any one Dev team, and not a part of any one Ops team, so Dev and Ops both think the team is an outsider. The whole point of Devops is lost. Dev doesn't have any ownership and Ops doesn't have any ownership. The team spends a lot of time trying to sell to the bottom and to the top.<br />
<br />
Now, we were fortunate that the original Devops Team was populated with some of the senior people in the company that had deep relationships inside Dev and Ops so the selling wasn't too hard, but if you are considering a Devops team, the team will have to have strong support from both Dev and Ops executives with the ability to roam freely within both organizations. (This assumes Dev and Ops have different executives.) <br />
<br />
So our Devops 1.0 was a standalone team inside of Dev, and after the reorg its members are in Ops. We have the benefit that the first project was to automate deployments, which helps both teams, and the second was to automate configuration, which simplifies life in Ops and gives more ownership to Dev (that's just how things have been over here). Our third phase now branches out fully, with Ops embracing automation for system configuration and Dev thinking about the code in terms of its operational impact and how it can be run and maintained more easily with automation.<br />
<br />
But, you argue, Devops is about people first, then process, then tools. I think if we analyze the stories from companies we will find that the tools are the gateway drug that brings in Devops. You can stop at tools and just have some automation, or you can show the business value your tools bring and start a revolution that will ultimately encompass people and process. We have our tools now; we are in the long stage of the battle to align the people and process.<br />
<br />
I'll conclude with my wholehearted agreement that Devops will be most successful if Dev and Ops report to the same executive below the CEO. If you have two silos and they don't share goals, Devops will remain a bottom-up battle.Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com1tag:blogger.com,1999:blog-8793031642965253367.post-53884518541382737042011-01-27T10:00:00.000-05:002011-06-14T09:32:51.681-04:00Chef Is My DocumentationWe have an ongoing project to automate the management of our custom software's configuration files. The configuration data has a hierarchy and some groupings, so we wanted to define configuration at the highest level possible and have it inherited at lower levels and within groups. We looked at all the "configuration management" software in the open source world and decided that Chef had the right flexibility for our needs. It wasn't a perfect match, but it was the closest thing available.<br />
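<br />
To make the inheritance idea concrete, here is a minimal Python sketch of layered configuration: values defined at the highest level are inherited unless a group or an application overrides them. This is an illustration only, not Chef's own merge logic, and every name in it (<code>merge_layers</code>, the layer dictionaries, the keys and values) is hypothetical.<br />
<br />
```python
from copy import deepcopy


def _merge_into(target, source):
    """Recursively copy source into target; nested dicts are merged, not replaced."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge_into(target[key], value)
        else:
            # Copy so later merges never mutate the original layer.
            target[key] = deepcopy(value)


def merge_layers(*layers):
    """Merge config layers ordered from most general to most specific."""
    merged = {}
    for layer in layers:
        _merge_into(merged, layer)
    return merged


# Hypothetical layers: company-wide defaults, a group of apps, one app.
global_cfg = {"db": {"host": "db.example.com", "pool": 10}, "log_level": "INFO"}
group_cfg = {"db": {"pool": 25}}
app_cfg = {"log_level": "DEBUG"}

resolved = merge_layers(global_cfg, group_cfg, app_cfg)
print(resolved)
```

The point of the design is that each value lives in exactly one place, at the highest level where it is true, so a change is made once and flows down to everything that inherits it.<br />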
<br />
When we started the project to automate configuration I was in dev, but I have since moved back into ops. Since one of Chef's main purposes is "system configuration management", the work we have done for configuration files is directly applicable to production operations' system management. So we're training and "selling" the tool to the Ops and Infrastructure teams as more than an application configuration tool. As I was putting the finishing touches on a <a href="https://docs.google.com/present/view?id=dgmvg9bb_114fwrvwpcp">presentation</a> giving an overview of the effort and the rationale behind a new configuration management system, I came across a post by Jez Humble on <a href="http://www.agileweboperations.com/what-devops-means-for-enterprises">Agile Web Operations</a> where he said:<br />
<br />
"Effective configuration management – including automation of the build, deploy, test and release process, provisioning and maintenance of infrastructure, and data management – make the whole delivery process completely transparent. As any good auditor will verify, there is no better documentation than a fully automated system that is responsible for making all changes to your systems, and a version control repository that contains everything required to take you from bare metal to a working system."<br />
<br />
That quote summed up the presentation I wrote on why we needed to automate our configuration file management. I didn't have as cogent a thought when I wrote my presentation, but I am thankful to Jez for framing the problem so well. The words "there is no better documentation" jumped out at me, and I used them to shift my thinking and reframe the rationale for automating configurations.<br />
<br />
<strong><span style="color: #9fc5e8;">Chef is our documentation</span></strong><br />
<br />
Everyone makes an attempt at documenting their configuration. Between wiki pages and emails you can probably piece together 80% of your documentation. The problem comes when you make changes: you have to keep your documentation up to date, and in the heat of battle documentation almost always gets left behind. Then you are tasked with building another system and you spend weeks of trial and error, copy and paste, and search and replace to build it. Your documentation never seems to be complete or up to date enough.<br />
<br />
I've lived there in all my ops jobs and we're now fixing that problem. Our documentation is <span style="color: #9fc5e8;"><strong>runnable</strong> </span>configuration. The emphasis is on the runnable. If your documentation is a copy of your configuration (a wiki) you will never be able to keep it up to date. If your documentation is runnable, it will always be up to date. Now if that "good auditor" comes by and asks for documentation of how our systems and apps are configured we have complete, accurate documentation at all times. We don't have to scramble at the last minute to update a bunch of wiki pages. As soon as a new release of software goes into production the act of configuring the software for the deployment is also the act of updating the documentation.<br />
<br />
The way we are doing it, the Chef database is the presentation of the documentation but not the source of the documentation. All configurations are saved in version-controlled JSON files (either roles with attributes or data bags), so all configuration is versioned, and even if the Chef database gets destroyed we can re-create it from the source JSON. The files are named, scoped, versioned, and updated in a way that requires the fewest places to make changes while maintaining adequate clarity and re-use.<br />
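<br />
As a rough illustration of "the JSON files are the source, the database is just a view", the Python sketch below rebuilds an in-memory store from a checkout of JSON files. It does not talk to Chef at all; the directory layout, file names, and the <code>rebuild_store</code> helper are assumptions made for the example, and the temporary directory stands in for a version-control checkout.<br />
<br />
```python
import json
import tempfile
from pathlib import Path


def rebuild_store(source_dir):
    """Rebuild a name -> document store from every JSON file under source_dir."""
    store = {}
    for path in sorted(Path(source_dir).rglob("*.json")):
        # Key each document by its path relative to the checkout, minus ".json".
        key = path.relative_to(source_dir).with_suffix("").as_posix()
        with path.open(encoding="utf-8") as handle:
            store[key] = json.load(handle)
    return store


# Hypothetical checkout: one role with attributes and one data bag item.
with tempfile.TemporaryDirectory() as checkout:
    root = Path(checkout)
    (root / "roles").mkdir()
    (root / "data_bags").mkdir()
    (root / "roles" / "billing_app.json").write_text(
        json.dumps({"name": "billing_app", "default_attributes": {"port": 8080}}),
        encoding="utf-8",
    )
    (root / "data_bags" / "db_settings.json").write_text(
        json.dumps({"id": "db_settings", "host": "db.example.com"}),
        encoding="utf-8",
    )
    store = rebuild_store(root)

print(sorted(store))
```

Because the store is derived entirely from the files, losing it costs nothing more than the time it takes to run the rebuild again.<br />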
<br />
I'll follow up with a post on some of the technical details of what we are doing and the decisions we made along the way. We just discovered a few behaviors of Chef that somewhat complicate this plan, but nothing severe enough to be a showstopper.<br />
<br />
Let me know what you are doing or what you think about the plan.Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com0tag:blogger.com,1999:blog-8793031642965253367.post-56652140915810150262010-08-04T14:33:00.000-04:002011-06-14T09:32:51.684-04:00First Atlanta Devops Meetup Thursday 7PMRegistration and details at http://www.eventbrite.com/event/764112481<br />
<div class="panel_head2"><h3>Event Details</h3></div><div class="panel_body"> <span class="description"> <!-- class description is SEO, not css --> <span class="vevent"><span class="description"> <span style="font-size: small;">We'll be talking DevOps, Agile Infrastructure, Agile Operations, or whatever you want to call it (Damon Edwards, from DTO Solutions, <a href="http://dev2ops.org/blog/2010/2/22/what-is-devops.html" target="_blank">explains it best</a>). </span><br />
<span style="font-size: small;">Come hang out and talk with folks who deal with the same issues of Systems Administration, Development, Deployment, and Operations that you do. </span><br />
<span style="font-size: small;">Talk with your peers who are actively working on building tools, providing services and architecting frameworks to assist and build a DevOps community, and with companies like Maxmedia, Turner, and t_sys, who run their organizations using Agile and DevOps concepts.</span><br />
<span style="font-size: small;">This is our 1st Meetup and it'll be casual - no presentations, no speakers, no slideshow, just food, beer and a bunch of people who want to talk about the challenges of making the world a better place for IT.</span><br />
</span></span></span><div class="panel_head2"><h3>Where</h3></div><div class="panel_body location vcard" id="panel_address"> <!-- location and vcard are SEO, not css --> <h2> <span style="font-size: small;"><b><span class="fn org">Taco Mac Perimeter</span></b><br />
<span class="adr"> <span class="street-address">1211 Ashford Crossing <br />
</span> <span class="locality">Atlanta</span>, <span class="region">GA</span> 30338</span></span> </h2></div><span class="description"><span class="vevent"><span class="description"><br />
</span></span><br />
</span></div>Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com0tag:blogger.com,1999:blog-8793031642965253367.post-82637918524091393902010-06-29T11:25:00.000-04:002011-06-14T09:33:06.986-04:00Timeboxed Project Management with a Task-Oriented Team I wrote this almost 2 years ago when our company was implementing agile for the development team. I thought about how to apply agile to an Ops team. This is what I came up with.<br />
<br />
https://docs.google.com/Doc?docid=0AYQ27_I8CiooZGdtdmc5YmJfMzZjc3FyMjdmMg&hl=en<br />
<br />
I haven't edited the document since I wrote it. We have used this more or less since then and it is really helpful for keeping focus and priority for a highly distracted team. I really should review the document and update it with what I've learned since then. If the date of the document is later than November 2008, then I've updated it.Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com1tag:blogger.com,1999:blog-8793031642965253367.post-48923983984696360332010-06-29T08:13:00.000-04:002010-06-29T08:44:21.507-04:00The Devops Elevator PitchThere is a growing list of posts by people trying to describe what Devops is. The definitive one is, of course, from Patrick Debois, who actually coined the term Devops. You can find his article at his <a href="http://www.jedi.be/blog/2010/02/12/what-is-this-devops-thing-anyway/">JEDI blog</a>. The other article I really like is from the <a href="http://dev2ops.org/blog/2010/2/22/what-is-devops.html">Dev2Ops blog</a>. I was debating whether I should write another article on what Devops is, and I think that since the community is so new and so many people are coming to Devops from varied experiences, more brushstrokes will help paint a better picture. The reason I say this is that in the three days I was at Velocity and Devops Days I found myself several times asking "What is Devops?" I heard so many great speakers talking about so many topics that occasionally I would get muddled and have a hard time describing to myself what Devops is. After I got back to work and was faced with co-workers who looked at my Devops Days T-shirt and asked "What is that all about?", I had to come up with a quick elevator pitch to grab their attention and see if they get it and are interested in learning more.<br /><br />So, when I'm in the break room and someone asks me "What is Devops?" 
I say something like:<br /><br />Devops is a name that serves as a rallying point for a community of people, mostly in the software industry (primarily SaaS), who see the need for Development and Operations to work closely together to produce better software and meet the growth objectives of their companies. Yes, you are probably telling yourself that you've been "Devops" for a decade. The practice is nothing new; what is new is a concerted effort by a global community to provide support for everyone in the community.<br /><br />You ask "Why is this special?" or "Why now?" What I learned from the hundreds of operations and development folks at Velocity and Devops Days is that the internet has passed a maturity milestone, and many of the practices from the 80's and 90's just don't cut it in the stratospheric-growth, super-fast-release-cycle, web-centric internet. I was blown away by the number of people who weren't in the top 0.01% of the internet (Google, Yahoo, Facebook, Twitter) who are coming out saying "We need a new way of writing and operating software. Things are different now and we need some different skills and tools in our belt. The way we've been doing business just isn't cutting it any more."<br /><br />So, Devops is a community where people can share their ideas and help the internet keep growing, and most of all help all of us who make the internet tick have fun and do even more of the cool stuff that brought us into the tech world in the first place.Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com22tag:blogger.com,1999:blog-8793031642965253367.post-33969643092702942242010-06-28T08:17:00.000-04:002011-06-14T09:33:06.964-04:00Yet Another Devops BloggerGreetings Devops community (or the curious rubberneckers who just googled Devops). I've just come back from the <a href="http://en.oreilly.com/velocity2010">O'Reilly Velocity</a> conference and <a href="http://www.devopsdays.org/">Devops Days</a> and am pumped up. 
It was awesome meeting so many people interested in making their organizations more efficient and effective by fostering tighter integration between the Development and Operations teams. I have a notebook full of ideas from all the great presenters and all the conversations from last week: from Sysadmin 2.0 to Developer 2.0, from experiences at my company to stories from other companies.<br /><br />So, welcome to my blog. I hope you enjoy it and come back. The next post will have some real content, I promise.<br /><br />While you are waiting for me to write my next post you can check out the videos of Velocity at their YouTube page http://www.youtube.com/view_play_list?p=D1D3B0B233F2AD66Dan Nemechttp://www.blogger.com/profile/01779123453765625136noreply@blogger.com0