Ellipsis
What happened to Cloud Particle? Did you present at DevOpsDays? Is it open source? I'm really interested, as I'm on the path of using CloudFormation, but looking for something better.

Thanks for asking. The crux of the issue is that I’ve moved on from that employer and no longer have any say in when CloudParticle is open-sourced. I’ve pinged some of my former team members, but one is at Burning Man and the other may have mostly dropped off Twitter.

I do hope they open-source CloudParticle soon. It’s definitely superior to CloudFormation, and the syntax is much cleaner than Terraform’s. The only feature really lacking is elastic IPs, because every node resource is realized as an auto-scale group, and EIPs don’t map well to that (and you want even single nodes to be 1-node ASGs so they get re-instantiated upon failure).

Most likely you can’t wait, so Terraform is definitely worth looking at. Good luck, Peter

Clearing the Counter Pt II: knife cleanup tweaks #chef

Part Two: knife cleanup versions

A colleague at $WORK discovered a plugin by Marius Ducea that would remove unused cookbook versions from a Chef server, with the option of dumping all cookbooks locally before deleting them. The problem we saw was that ‘unused’ meant ‘not explicitly pinned with an equality constraint’; the plugin disregarded ~>, >=, and the like, so we could not use it without trashing many of our cookbooks still in use.
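To see the difference, here’s a minimal illustration using Chef’s own constraint class (Chef::VersionConstraint ships with the chef gem; the version numbers are made up):

require 'chef/version_constraint'

pinned      = Chef::VersionConstraint.new('= 1.1.0')  # the only kind of pin the plugin honored
pessimistic = Chef::VersionConstraint.new('~> 1.1')   # what many of our cookbooks actually used

pinned.include?('1.1.2')      # => false
pessimistic.include?('1.1.2') # => true: still in use, yet 'unused' by the plugin's logic

So any version held only by a pessimistic or >= constraint looked deletable.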

I extended the plugin to take a --runlist parameter, so it would query the Chef server for the cookbook versions needed to satisfy that runlist, and do so for each of the environments present on the server. At the time we were not using Chef’s built-in roles, as some of us had misread or misunderstood works such as The Berkshelf Way or A Year with Chef. Instead we had a single roles cookbook, workroles, which in turn had recipes such as workroles::mongo and workroles::api. To clean up cookbooks, I’d run:

knife cleanup versions --runlist 'workroles::default'

and it would list all the cookbooks eligible for deletion. Then I’d run it again with the -D option to actually do the deletion. A few score cookbooks would go poof, and our depsolver issues would be gone.
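Under the hood, the --runlist support boils down to asking the server-side depsolver what each environment would actually serve for that runlist. A rough sketch (not the plugin’s literal code; rest is the API client every knife plugin gets):

used_versions = Hash.new { |h, k| h[k] = [] }
rest.get('environments').each_key do |env|
  # POST /environments/ENV/cookbook_versions runs the server's depsolver and
  # returns the exact cookbook versions a client with this runlist would get.
  solved = rest.post("environments/#{env}/cookbook_versions",
                     'run_list' => ['workroles::default'])
  solved.each { |name, cb| used_versions[name] << cb['version'] }
end
# Any version on the server that never appears in used_versions is fair game.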

I have a PR open with Marius but I’ve not heard back since Christmas.

Meanwhile, you can use it from my GitHub runlist_feature branch.

What about roles?

Turns out that one master roles cookbook is an anti-pattern, but that’s a story for another time. Chef roles are great if they’re kept lightweight; they’ve been unfairly maligned because of the abuses heaped upon them.

My branch of knife-cleanup doesn’t handle roles well, since it expects a runlist of recipes, so I’ll need to address that.

Unless cookbook cleanup has been built into Chef via some other avenue, I’ll fork the current knife-cleanup plugin into a knife-scrub plugin that builds up a runlist from all roles and keeps the versions in use in any environment. Or, if Marius has time, we can work on this together.
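The role-expansion piece should be small. Hypothetically, something like this, using the server’s GET /roles and GET /roles/NAME endpoints:

# Hypothetical sketch for knife-scrub: fold every role's runlist into one,
# then reuse the per-environment depsolver query from above.
combined_runlist = rest.get('roles').keys.flat_map do |role_name|
  rest.get("roles/#{role_name}")['run_list']
end.uniq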

I have a short Part Three forthcoming with some other musings around depsolver and cookbook cleanup. Stay tuned.


Clearing the Counter: cookbook clutter and knife cleanup

Part One

In a Chef development group with a high rate of cookbook churn, you may eventually find that your Chef server is timing out as the dependency solver (depsolver) works out the correct cookbook version to send to clients based on environment constraints and cookbook constraints. This gets ugly pretty quickly, since the Chef workers tied up doing depsolving aren’t available for servicing other clients. At $WORK, we’d see lots of failed chef client runs, and usually at the most inconvenient of times.

How did things go wrong? Well, we’d have a number of internal cookbooks whose metadata.rb files included constraints such as:

depends 'java', '< 1.14.0'
depends 'apt', '>= 1.8.2'
depends 'yum', '>= 3.0'
depends 'python'
depends 'runit', '>= 1.5.0'
depends 'bar', '~> 1.1'
depends 'baz', '~> 2.0'

And when bar has versions 1.1.0, 1.1.1, and 1.1.2, baz has all its versions, and the upstream cookbooks have all their version iterations, each with their own constraints, the depsolver problem space grows exponentially. Eventually, the Chef server will kill the long-running depsolver requests, where long-running means about five seconds.
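The arithmetic is brutal: with N constrained cookbooks averaging V candidate versions each, the worst case is on the order of V**N combinations. Toy numbers (made up, but in the neighborhood of our setup) make the point:

versions_per_cookbook = 6
cookbooks_in_play     = 7
versions_per_cookbook ** cookbooks_in_play  # => 279936 candidate assignments to test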

Buying yourself time with a longer depsolver timeout

In a pinch, you can throw more resources at the problem, increasing the timeouts and the threads available to the chef server. This is only a short-term stop-gap. As you update more cookbooks, you’ll soon be back to your earlier pain.

Good luck finding where to change those settings, as the omnibus-chef-server project (https://github.com/opscode/omnibus-chef-server) has an attribute defined for default['chef_server']['erchef']['depsolver_timeout'], but that attribute isn’t used anywhere *.

What you need to do is edit /var/opt/chef-server/erchef/etc/app.config to change the depsolver_timeout (under the chef_objects key) and the max_count under the pooler key, as shown in this diff where the timeout goes to 10,000 ms and the worker count is bumped to 12:

--- app.config  2014-07-23 20:57:06.714838003 +0000
+++ app.config~ 2014-07-23 21:24:09.674838002 +0000
@@ -114,7 +114,8 @@
                   {s3_external_url, host_header},
                   {s3_url_ttl, 900},
                   {s3_parallel_ops_timeout, 5000},
-                  {s3_parallel_ops_fanout, 20}
+                  {s3_parallel_ops_fanout, 20},
+                  {depsolver_timeout, 10000}
                  ]},
   {stats_hero, [
                {udp_socket_pool_size, 1 },
@@ -148,7 +149,7 @@
                      {init_count, 20},
                      {start_mfa, {sqerl_client, start_link, []}} ],
                     [{name, chef_depsolver},
-                     {max_count, 5},
+                     {max_count, 12},
                      {init_count, 5},
                      {start_mfa, {chef_depsolver_worker, start_link, []}}]
                  ]},

Then restart the chef-server (on an omnibus install, chef-server-ctl restart erchef will bounce just the affected service).

In the next part, I’ll cover how to fix this with a safe cookbook clean-up.

* I submitted a fix to the missing attribute problem with this pull request: https://github.com/opscode/omnibus-chef-server/pull/79

#DNDwKids: Back to HeroLab after DndInsider #dnd

Last spring and fall I ran a Pathfinder game with 4th, 5th, and 6th graders at my sons’ school. Each term I had to turn away kids ’cause participation was capped at six players, and even that was too many for the attention spans of some of the kids.

Now that I’ve gotten two kids (6th grade and 7th grade) interested in DM’ing, I’m going to have them run two games while I provide support, feedback, and guidance.

But we’re switching to DnD 4e since Jesse and Harry each have stacks of 4e books and know the rules much better than for Pathfinder.

At our first session we just worked on converting our existing characters, and rolling up new ones for those who had just joined. I knew there were a lot of errors, but figured I’d resolve them all by moving from paper to …, well, from paper to DnD Insider Character Builder (CB) is what I thought, but, jeez, what a beast that is.

I’d used HeroLab last fall and thought it pretty good, but got the sense that the WOTC online tools would be better suited for 4e. So I thought I’d be ready to roll once I signed up for 3 months. But the tools require Silverlight, burn through my CPU, and are really, really laggy, turning a data-entry task into what feels like 15s of waiting between fields. Creating just one character took about an hour.

Also, it’s clear we’ll be using 4e for at least a couple of years, and there are rumors that 4e support will be coming to an end.

So, I’m going to give 4e a whirl on HeroLab. The starting process is a bit arcane. First, since I already had a fully licensed HeroLab with Pathfinder, I didn’t seem able to use the 4e support in demo mode (I may be mistaken on that) unless I purchased the 4e license. Since LoneWolf offers a money-back option within 60 days, I figured I’d lay down my $20 and hope for the best.

The 4e package for HeroLab has no content, since LoneWolf doesn’t have a license from WOTC; instead it has a downloader tool. Since I have a DnDInsider subscription, it uses those credentials to d/l the data files.

I’m hoping once this completes I’ll have a faster, more usable character management system. Let’s see…

Chef Shell Attribute example

I find the dearth of chef-shell examples on the web really frustrating, so I’m starting my own.

Getting started

root@logstash-i-ab57c6d1:~# export PATH=/opt/chef/embedded/bin:$PATH
root@logstash-i-ab57c6d1:~# chef-shell -z

Querying node attributes

Get the logstash server outputs:

chef > attributes_mode
chef:attributes > node['logstash']
…
chef:attributes > node[:logstash][:server][:outputs].length
=> 5
chef:attributes > node[:logstash][:server][:outputs][0]
=> {"elasticsearch_http"=>{"host"=>"logstash-elasticsearch.infra.example.in"}}
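Another call worth knowing (Chef 11 and later) is node.debug_value, which reports what an attribute is set to at each precedence level (default, normal, override, and so on). Handy when an override isn’t taking effect:

chef > node.debug_value(:logstash, :server, :outputs)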

Pretty excited about jumbo dice for new season of #rpgkids. Giving them each a set last fall was a mess #fb


Fixing #sensuapp OpenSSL peer cert validation issues

Today I used Chef to configure a test sensu-server, but my Hipchat notifications were failing with this snippet in the logs:

/opt/sensu/embedded/lib/ruby/2.0.0/net/http.rb:917:in `connect': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)\n"

I soon determined that the httparty gem was at 0.11.0 on the prod sensu servers and at 0.12.0 on the new one, and further that httparty had (wisely) been changed to verify peer certs. No problem, but where to put the CA (Certificate Authority) bundle?

Tracking this down took more of the afternoon than ideal, but eventually I found that the default SSL cert path can be determined with:

# irb
irb(main):001:0> require 'openssl'
=> true
irb(main):002:0> File.dirname OpenSSL::Config::DEFAULT_CONFIG_FILE
=> "/opt/sensu/embedded/ssl"

To get the CA certs into the embedded Ruby, we can update the default sensu install with a bit of Chefery:

cookbook_file '/opt/sensu/embedded/ssl/cert.pem' do
  source 'cert.pem'
  mode '0644'  # readable is all a CA bundle needs; no reason for the execute bit
end

Where cert.pem’s contents are pulled from http://curl.haxx.se/ca/cacert.pem so we have a complete list of acceptable Certificate Authorities.
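To confirm the embedded Ruby is happy afterward, a quick smoke test (illustrative host; run with /opt/sensu/embedded/bin/ruby):

# Raises OpenSSL::SSL::SSLError if the CA bundle still can't validate the peer.
require 'net/https'

http = Net::HTTP.new('api.hipchat.com', 443)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.start { puts 'peer certificate verified OK' }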

Ideally, I would submit a PR to https://github.com/sensu/sensu-build/pulls, but for now I’ll have to content myself with an issue report.


Update: https://github.com/sensu/sensu-build/pull/79 is a PR against the sensu omnibus build to fix this.

JMX - collectd - graphite

I finally started sending some key JMX stats into Graphite via our collectd setup. A few notes, since I’ll probably forget all about it until I next need to configure this.

JMX

JMX listens on a random port by default, so I ended up adding all of the following to JAVA_OPTS to pin it down:

-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.port=19876

What objects and attributes are available to monitor? Enable jmxproxy in Tomcat with the following in /etc/tomcat7/tomcat-users.xml:

<tomcat-users>
  <user username="tomcat" password="tomcat" roles="tomcat,manager-gui,manager-jmx"/>
</tomcat-users>

and then peruse http://localhost:8080/manager/jmxproxy/
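jmxproxy also takes a qry parameter, which makes scripted spot checks easy. For instance, pulling the Memory MBean (assumes the tomcat/tomcat credentials above; sketch only):

require 'net/http'

uri = URI('http://localhost:8080/manager/jmxproxy/?qry=java.lang:type=Memory')
req = Net::HTTP::Get.new(uri)
req.basic_auth('tomcat', 'tomcat')
res = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }
puts res.body  # dumps the MBean, including HeapMemoryUsage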

collectd

Configured the plugin with Miah’s chef-collectd cookbook. See my recipe and the template at:

https://gist.github.com/pburkholder/8341458

The main changes to the plugin configuration are a change to the prefix for the thread_pools and to the ‘Type’ for class loading.

carbon-writer

We use the carbon-writer plugin from Gregory Szorc. The plugin didn’t sanitize out double-quotes, which pretty much horked the Graphite browser. This pull request fixes that.

debugging collectd

The Ubuntu build doesn’t include debugging, so turning up the log level to ‘debug’ does nothing, and the ‘info’ level gives you almost nothing. The most useful steps for tracking down my issue (which came down to the aforementioned double-quotes) were (a) running in the foreground:

/usr/sbin/collectd -f -C /etc/collectd/collectd.conf

and (b) enabling the CSV plugin to see what was getting written before it went to carbon/graphite:

LoadPlugin csv
<Plugin csv>
  DataDir "/var/lib/collectd/csv"
  StoreRates false
</Plugin>

It’s been real, Tumblr

So I tried Tumblr as a blogging platform. Since Twitter has worked out so well, I thought Tumblr might have some appeal that wouldn’t be apparent until I dove in and tried it.

But I have a hard time taking myself seriously here, so I’m moving to Jekyll (at GitHub, but I can take it anywhere). The preview is at http://blog.pburkholder.com. I need to get a Disqus account set up and clean up the old posts. I hope it doesn’t take long, as I have some real content (sensu + chef, Puppet/Chef lessons) that deserves my real attention.

Update

Er, back on Tumblr again. Why? Well, as cool as Jekyll is, I can’t quite justify the time to get it ‘just so’ when I can come here and just write.

Meanwhile, I can use http://import.jekyllrb.com/docs/tumblr/ to export/back up my content here, just in case.
