Wednesday, January 23, 2013

Upgrading Chef Server from 0.10.8 to 10.18.2

Here is my story of upgrading Chef from 0.10.8 to 10.18.2 while moving to a new server and an updated OS. Please comment and tell me where I could have done better.

We were running Chef Server 0.10.8 on CentOS 5.4 with Ruby 1.8.7, and I wanted to move to the latest release of Chef, CentOS 6.3, and Ruby 1.9.3 at the same time. That ruled out an in-place upgrade on the existing server: I needed to migrate the Chef server to a new system and upgrade everything along the way.

My first plan was the cautious route, since I wasn't sure Chef could be upgraded across that many revisions: export all the data as JSON, build a new Chef 10.18.2 server, then import the JSON. It worked perfectly EXCEPT that none of the clients could authenticate to the server, even though I had imported their public keys. I could create a new client key on the Chef server and the node would authenticate with it, but not with an imported key. I spent about a day on this without a resolution. Maybe someone else will have better results.

Next I tried simply copying the CouchDB database. I botched things a few times and spun my wheels for a few days (mostly my fault), but I finally found a method that works:

1) Compile Ruby 1.9.3 and RubyGems 1.8.23
2) Install Chef Server via chef-solo: http://wiki.opscode.com/display/chef/Installing+Chef+Server+using+Chef+Solo
3) Work around the CentOS 6.3 rabbitmq init bug documented at https://bugzilla.redhat.com/show_bug.cgi?id=878030. We decided to work around it by changing these lines in the rabbitmq init script:

CONTROLPROG=/usr/sbin/rabbitmqctl
CONTROL="sudo -u ${USER} ${CONTROLPROG}"

4) Add the Chef vhost and user to rabbitmq by hand, because rabbitmq was broken when chef-solo tried to do it. Also change the Solr maxFieldLength to 100000 to work around the problem of indexing nodes with lots of attributes.

/usr/sbin/rabbitmqctl add_vhost /chef
/usr/sbin/rabbitmqctl add_user chef testing
/usr/sbin/rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"

ex /var/lib/chef/solr/home/conf/solrconfig.xml
:%s/<maxFieldLength>10000/<maxFieldLength>100000/g
:wq
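
For a scripted build, the same substitution can be done without opening an editor. This is only a sketch: it assumes sed with in-place editing (GNU sed, as shipped on CentOS) and that the setting sits on one line as <maxFieldLength>10000</maxFieldLength>. The function name bump_max_field_length and the .bak backup suffix are my own choices, not anything Chef provides.

```shell
#!/bin/sh
# bump_max_field_length is a hypothetical helper name, not part of Chef or Solr.
# It raises maxFieldLength from 10000 to 100000 in the given file and
# leaves a copy of the original next to it with a .bak suffix.
bump_max_field_length() {
    conf="$1"
    # Match the whole element (including the closing tag) so that a value
    # already set to 100000 is not changed again on a second run.
    sed -i.bak 's|<maxFieldLength>10000</maxFieldLength>|<maxFieldLength>100000</maxFieldLength>|g' "$conf"
}

# Usage, with the path from step 4:
# bump_max_field_length /var/lib/chef/solr/home/conf/solrconfig.xml
```

Matching the closing tag is the one deliberate difference from the ex command above: running it twice does not turn 100000 into 1000000.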

5) Shut down couchdb and rename /var/lib/couchdb/chef.couch to chef.couch.bak
6) Copy chef.couch from the old server (which can still be running) into /var/lib/couchdb
7) chown the copied chef.couch to "couchdb:couchdb"
8) Start couchdb back up
9) Start rabbitmq, chef-solr, chef-expander, chef-server, chef-server-webui (in that order)
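
Steps 5 through 8 can be sketched as a small script. Treat it as a rough outline rather than what I actually ran: the old server's hostname, copying over scp as root, the database living at the same path on the old server, and the "service couchdb" init script name are all assumptions. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of steps 5-8. Everything marked "assumption" is a placeholder.
OLD_HOST="${OLD_HOST:-old-chef.example.com}"   # assumption: old server's hostname
COUCH_DIR="${COUCH_DIR:-/var/lib/couchdb}"     # path from step 5
RUN="${RUN-echo}"   # prints commands by default; export RUN="" to really run them

migrate_couchdb() {
    # Step 5: stop couchdb and keep the freshly installed database as a backup
    $RUN service couchdb stop                  # assumption: init script is named couchdb
    $RUN mv "$COUCH_DIR/chef.couch" "$COUCH_DIR/chef.couch.bak"
    # Step 6: copy the database from the old server (assumption: scp as root, same path)
    $RUN scp "root@$OLD_HOST:$COUCH_DIR/chef.couch" "$COUCH_DIR/chef.couch"
    # Step 7: hand the file to the couchdb user
    $RUN chown couchdb:couchdb "$COUCH_DIR/chef.couch"
    # Step 8: start couchdb again
    $RUN service couchdb start
}

# Usage: review the printed commands first, then rerun with RUN="" as root.
# migrate_couchdb
```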

I wish I could say it was working at this point, but all the cookbooks were broken. Maybe someone from Opscode can tell me where the cache of cookbook files lives so it can be copied too. I tried reloading everything with knife cookbook upload -a -d, but that still didn't give me working cookbooks. In the UI, clicking a cookbook showed "end of file reached" and no data: the name and version were there, but no contents. I had to knife cookbook delete -p each cookbook; once they were uploaded again, MOST of them worked. Some still gave the "end of file" error, and I had to purge and upload those one at a time.
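
The purge-and-reupload cycle is easy to loop over. Here is a minimal sketch, assuming knife is already configured against the new server and the cookbooks are in your cookbook path. The function names are made up for illustration, and I added -a (all versions) and -y (skip the confirmation prompt) to the delete so it can run unattended. It only prints the knife commands unless KNIFE is set to knife.

```shell
#!/bin/sh
# KNIFE defaults to "echo knife" so nothing is deleted by accident;
# set KNIFE=knife to run the commands for real.
KNIFE="${KNIFE:-echo knife}"

# reupload_cookbook and reupload_all are hypothetical helper names.
reupload_cookbook() {
    name="$1"
    # -p purges the stored files, -a covers all versions, -y answers the prompts.
    # stdin is redirected so knife cannot swallow the list being read by the loop.
    $KNIFE cookbook delete "$name" -a -p -y < /dev/null
    $KNIFE cookbook upload "$name" < /dev/null
}

# Reads cookbook names from stdin, one per line, skipping blank lines.
reupload_all() {
    while read -r name; do
        if [ -n "$name" ]; then
            reupload_cookbook "$name"
        fi
    done
}

# Usage: printf 'apache2\nntp\n' | reupload_all
```

Doing one cookbook per delete/upload pair mirrors what finally worked for the stubborn ones, at the cost of being slower than a single upload -a.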

I hope this helps someone. Let me know if you want more detail on any step. I still haven't done extensive testing on the new server, but the few clients I tested seem happy. I'm really looking forward to an omnibus install for Chef Server 11 and hope that migration is not painful.
