Managing data.fao.org with Chef
The data.fao.org project is ambitious. It is challenging both for the software developers who implement it and for the operations staff who support it. Highly available, highly distributed systems with multiple entry points offer many advantages, but they come at the cost of greater complexity and higher maintenance requirements. Just as the distributed architecture of data.fao.org forces its developers to change their mindset, it also forces the operations team to work in a different way. We can no longer treat infrastructure as a set of static artifacts; we must literally treat our “infrastructure as code”, as best described by Stephen Nelson-Smith.
Past Approaches
In the past, the operations team at FAO managed servers with a collection of Bash and Perl scripts. In my time here at FAO, I have written a provisioning script that totals over 900 lines of Bash. This, along with other scripts, allowed us to set up new servers fairly quickly. However, these scripts were brittle, inflexible, and, critically, not idempotent. For this reason they could only be used for initial setup and could not be used to keep our servers in a known state.
This meant that our servers were subject to 'configuration drift': through small, individual, and undocumented changes, our machines mutated into unique artifacts, or snowflakes. These snowflakes consumed more and more time because they continued to mutate. Aside from the issue of configuration drift, automating application configuration with shell scripts is a lonely process. There is no ecosystem of shell scripts for automating system administration. This is because Bash lacks the abstraction capabilities needed to build a framework or reusable modular components. Perl, in my humble experience, is only slightly better in this regard. A Bash script is instant technical debt.
It was painfully obvious to me that I needed to seriously upgrade our tooling. The new tools needed to have the following characteristics:
- Maintain servers in a known state. No snowflakes!
- Be part of a larger community that shares code and best practices in order to minimize technical debt.
- Be easy to get started with so that new staff members could quickly come on board.
- Be powerful enough to meet our configuration needs.
Based on these requirements, I looked at three different configuration management tools: Puppet, Chef, and Fabric.
Puppet and Chef are true configuration management systems, while Fabric is really a tool for remote job execution. I had used Fabric in the past to make small settings changes to hundreds of servers at the same time. For example, I once used it to update the NTP configuration on all of our servers and then restart the NTP daemon. I quickly found that Fabric was not sufficient for my needs. While it is great for making ad hoc changes, it was designed for, and is primarily used for, deploying Django applications, and its community is focused accordingly.
I then spent several months working with Puppet. I found it very easy to get started with the Puppet DSL and its easy-to-use primitives for working with system resources. The following is an example of a Puppet 'manifest' for configuring SSH:
These resources are idempotent, meaning that they achieve the same effect no matter how many times they are applied.
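To make the idea of idempotence concrete, here is a minimal sketch in plain Ruby. It is not Puppet or Chef code; the helper name ensure_line and the sshd_config line are assumptions chosen for illustration. The helper only changes the file when the desired line is missing, so a second run is a no-op.

```ruby
require "tmpdir"

# Hypothetical helper: make sure `line` appears in the file at `path`.
# Returns true if the file had to be changed, false if it was already correct.
def ensure_line(path, line)
  current = File.exist?(path) ? File.read(path).lines.map(&:chomp) : []
  return false if current.include?(line)

  File.open(path, "a") { |f| f.puts(line) }
  true
end

Dir.mktmpdir do |dir|
  config = File.join(dir, "sshd_config")
  first  = ensure_line(config, "PermitRootLogin no")
  second = ensure_line(config, "PermitRootLogin no")
  puts "first run changed the file:  #{first}"
  puts "second run changed the file: #{second}"
end
```

A naive script that simply appended the line on every run would leave duplicates behind, which is one reason such scripts are only safe for initial setup.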
Another big bonus of using Puppet is that there is a large and supportive community around the project. Help was always prompt and available on IRC and the Puppet mailing list.
While it was easy for me to get started with Puppet, I soon ran into pain points. Puppet has its own configuration language, the Puppet DSL. To extend this DSL beyond the core resource types you have to write your own Ruby modules. This meant that to do anything moderately complicated I had to learn two distinct languages. Jumping between the two was a serious impedance mismatch and I frequently found myself confused. To make matters worse, the Puppet DSL does not have an interactive Read-Eval-Print Loop (REPL) like those provided by Bash, Perl, and Python.
Chef has an internal DSL that provides the best of both worlds. The following Chef code is in a Domain Specific Language (DSL) and it is also valid Ruby code:
The simple fact that the Chef DSL is pure Ruby code provides innumerable benefits, mainly because the DSL can reuse all of Ruby's tooling. My favorite example of this is Chef's interactive shell, the unfortunately named Shef (Chef’s Shell). I have used it to debug countless issues. Here is an example session:
The ability to use pure Ruby in Chef 'recipes' is one of several features that Chef provides and Puppet does not. It must be noted that Chef has a substantially steeper learning curve than Puppet, but in my opinion it is not so steep as to outweigh its benefits. Additionally, you should keep in mind that Chef is a much younger project than Puppet and has a substantially smaller user base. Despite these concerns, I chose Chef for managing our system configurations because I believe that it is the more productive platform.
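For readers unfamiliar with the term, the following toy sketch shows what an internal DSL looks like in plain Ruby. It is not Chef's implementation: the ToyRecipe class and its package and service methods are hypothetical stand-ins. The point is that each DSL "keyword" is an ordinary method call, so loops, variables, and every other Ruby feature work inside the block.

```ruby
# Toy model of an internal DSL. This is NOT how Chef is implemented;
# ToyRecipe and its methods are invented for illustration only.
class ToyRecipe
  attr_reader :resources

  def initialize
    @resources = []
  end

  # Each DSL "keyword" is just an ordinary Ruby method that records a resource.
  def package(name)
    @resources << { type: :package, name: name }
  end

  def service(name, action:)
    @resources << { type: :service, name: name, action: action }
  end

  # Run the block with a fresh recipe as `self`, so bare calls such as
  # `package "foo"` resolve to the methods above.
  def self.evaluate(&block)
    recipe = new
    recipe.instance_eval(&block)
    recipe
  end
end

recipe = ToyRecipe.evaluate do
  # Plain Ruby iteration mixed freely with DSL calls.
  %w[openssh-server openssh-clients].each { |pkg| package pkg }
  service "sshd", action: :start
end

recipe.resources.each { |resource| puts resource.inspect }
```

Because the block is real Ruby, it can be loaded into any Ruby tool, which is the property that makes an interactive shell possible.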
Data-Driven Infrastructure
There are many excellent introductory articles on Chef, so I will only highlight some of its more novel features here, particularly the ability to let data drive the configuration of your infrastructure.
Chef has two components, data bags and search, that make Chef recipes highly dynamic and remove the need to hardcode information such as IP addresses, host names, database connection values, JVM heap sizes, and so on into recipes. Data bags can be thought of as global variables for your infrastructure.
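As a rough illustration of the idea, the sketch below models a data bag item as a JSON document keyed by environment and looks up the settings for one environment. It uses only Ruby's standard library, not the Chef API; the item contents, host names, and the settings_for helper are all assumptions made up for this example.

```ruby
require "json"

# Hypothetical data bag item. Every key and value here is invented.
DATA_BAG_ITEM = <<~JSON
  {
    "id": "example_app",
    "production":  { "db_host": "db.prod.example.com", "jvm_heap": "2048m" },
    "development": { "db_host": "db.dev.example.com",  "jvm_heap": "512m" }
  }
JSON

# Return the settings for one environment, failing loudly if none exist
# so that a typo never silently produces an empty configuration.
def settings_for(item_json, environment)
  item = JSON.parse(item_json)
  item.fetch(environment) { raise ArgumentError, "no settings for #{environment}" }
end

puts settings_for(DATA_BAG_ITEM, "production")["jvm_heap"]
puts settings_for(DATA_BAG_ITEM, "development")["db_host"]
```

The recipe logic stays the same for every environment; only the data changes.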
Let’s start by specifying which server is a ‘production’ server and which one is a development server:
roles/production.rb
roles/development.rb
I didn’t cover roles earlier but the following is a good description that comes from the Chef wiki:
“A role provides a means of grouping similar features of similar nodes, providing a mechanism for easily composing sets of functionality. At web scale, you almost never have just one of something, so you use roles to express the parts of the configuration that are shared by a group of Nodes.”
Once created, these roles must be applied to the 'nodes', or individual servers. I will not cover this step in this article. The following data bag specifies the values according to the application environment:
data_bags/applications/enterprise_service_bus.json
jboss/recipes/enterprise_service_bus.rb
The search feature is exciting because it allows components to find each other dynamically. Essentially, search allows a node to query the configurations of other nodes. The Infochimps have taken this to the extreme with their silverware cookbook, which allows components to essentially wire themselves together. They use Chef to provision and configure Hadoop clusters. The following small example shows how search can be used to configure a master PostgreSQL server to forward its write-ahead log to slave servers:
First configure the roles…
roles/postgresql_master.rb
roles/postgresql_slave.rb
postgresql/recipes/master.rb
Using this technique, the PostgreSQL master will begin replicating its data to any slave servers added since the last Chef run.
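The mechanism can be modelled in a few lines of plain Ruby. In this sketch an in-memory array stands in for the node index that Chef search would query; the Node struct, the node names, the IP addresses, and both helper functions are hypothetical. It shows why no recipe change is needed when a slave is added: the list of targets is recomputed from data on every run.

```ruby
# In-memory stand-in for the node index. Real Chef search queries the
# Chef server; this struct and inventory are invented for illustration.
Node = Struct.new(:name, :ip, :roles)

NODES = [
  Node.new("pg1",  "10.0.0.10", ["postgresql_master"]),
  Node.new("pg2",  "10.0.0.11", ["postgresql_slave"]),
  Node.new("pg3",  "10.0.0.12", ["postgresql_slave"]),
  Node.new("web1", "10.0.0.20", ["webserver"])
].freeze

# Find every node that carries the given role.
def search_by_role(nodes, role)
  nodes.select { |node| node.roles.include?(role) }
end

# The master derives its replication targets from the search result
# instead of from a hardcoded list of addresses.
def replication_targets(nodes)
  search_by_role(nodes, "postgresql_slave").map(&:ip).sort
end

puts replication_targets(NODES).inspect

# Simulate a new slave registering before the next run.
later = NODES + [Node.new("pg4", "10.0.0.13", ["postgresql_slave"])]
puts replication_targets(later).inspect
```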
Java's Special Challenges
Managing Java applications presents special challenges. Most popular applications can be installed from system packages like .rpm or .deb. For whatever reason, the RPM and Debian packages available for Java-related software are long out of date. For example, the main provider of Java-related RPMs is jpackage.org, which has no packages available for Tomcat 7. The latest JBoss package was uploaded in May of 2009. There is a pronounced disconnect between the Java developer community and the Linux distribution packages.
To make matters worse, no version of the Oracle Hotspot JDK is available as a system package. Oracle forces you to download it directly from their website using a browser. You cannot automate this process without violating Oracle’s legal terms. Many Linux systems administrators create their own RPM and Debian packages in private repositories using tools like fpm.
I have not taken this route as I do not want to maintain my own private repository. Also, any Chef recipe that relies on a private repository is hard for other people to reuse. Installing a Java application is actually far less complicated than installing an application that must be compiled. Instead, I created the "ark" resource, which downloads a compressed file from a given URL, unpacks the file to a given location, and optionally updates the system PATH variable. The ark resource is explained in great detail on the developsanywhere blog.
The following quick example shows how I use ark to install the Java development kit (JDK):
When you run this resource, it downloads a tarball, unpacks it to the /usr/local/jdk directory and creates the following symbolic link: /usr/local/bin/java -> /usr/local/jdk/bin/java
Take note that I specified a dummy URL, http://www.example.com, for the Oracle JDK. I actually use a private webserver to serve the Java tarball that I downloaded earlier using a desktop machine.
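The unpack-and-link steps can be sketched with Ruby's standard library alone. This is not the ark resource's code: the helper names are hypothetical, the download step is left out, and a tiny tarball is built in memory so the example runs without network access. It unpacks into a temporary directory instead of /usr/local.

```ruby
require "fileutils"
require "rubygems/package"
require "stringio"
require "tmpdir"
require "zlib"

# Build a tiny .tar.gz in memory as a stand-in for a downloaded JDK tarball.
def sample_tarball
  tar = StringIO.new(String.new) # binary, mutable buffer
  Gem::Package::TarWriter.new(tar) do |writer|
    script = "#!/bin/sh\necho fake java\n"
    writer.add_file_simple("bin/java", 0o755, script.bytesize) { |f| f.write(script) }
  end
  Zlib.gzip(tar.string)
end

# Unpack gzipped tar bytes into `dest`, then symlink one binary into `bin_dir`.
# Safe to run repeatedly: files are overwritten and the link is replaced.
# A production version should also reject entries that escape `dest`.
def unpack_and_link(tarball_bytes, dest, bin_dir, binary)
  FileUtils.mkdir_p([dest, bin_dir])
  Gem::Package::TarReader.new(StringIO.new(Zlib.gunzip(tarball_bytes))) do |reader|
    reader.each do |entry|
      next unless entry.file?

      target = File.join(dest, entry.full_name)
      FileUtils.mkdir_p(File.dirname(target))
      File.binwrite(target, entry.read)
      File.chmod(entry.header.mode, target)
    end
  end
  link = File.join(bin_dir, File.basename(binary))
  FileUtils.ln_sf(File.join(dest, binary), link)
  link
end

Dir.mktmpdir do |root|
  link = unpack_and_link(sample_tarball, File.join(root, "jdk"), File.join(root, "bin"), "bin/java")
  puts "#{link} -> #{File.readlink(link)}"
end
```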
The next issue I encountered was how to download dependencies such as JDBC connectors, EJBs, and other artifacts. At the FOSDEM 2012 conference in Brussels I was lucky enough to meet Carlos Sanchez, whose puppet-maven module solves this issue elegantly by sourcing Java artifacts from public or private Maven repositories.
I have ported most of his puppet-maven module to Chef (https://github.com/bryanwb/chef-maven), building on top of the existing maven cookbook. Here is an example of how I use it to source the JDBC connector for PostgreSQL:
Now I will walk you through the configuration of a basic JBoss application. This is a simplified example. You can see the complete recipe on GitHub:
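Sourcing an artifact from a Maven repository works because repositories follow a predictable directory layout. The sketch below builds a download URL from an artifact's coordinates using that standard layout. It is not code from puppet-maven or chef-maven; the function name and the example coordinates are hypothetical, and the default base URL is that of Maven Central.

```ruby
# Standard Maven 2 repository layout:
#   <base>/<groupId with dots as slashes>/<artifactId>/<version>/<artifactId>-<version>.<packaging>
# The function name and default arguments are assumptions for this sketch.
def maven_artifact_url(group_id:, artifact_id:, version:, packaging: "jar",
                       repository: "https://repo1.maven.org/maven2")
  group_path = group_id.tr(".", "/")
  file_name  = "#{artifact_id}-#{version}.#{packaging}"
  [repository.chomp("/"), group_path, artifact_id, version, file_name].join("/")
end

# Hypothetical coordinates, used only to show the shape of the result.
puts maven_artifact_url(group_id: "org.example.db",
                        artifact_id: "example-jdbc",
                        version: "1.2.3")
```

Pointing the repository argument at a private server is all it takes to source in-house artifacts the same way.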
https://github.com/bryanwb/chef-jboss/blob/master/recipes/standalone_jdbc.rb
The default values for all JBoss applications are in the jboss/attributes/default.rb file. However, I override them with the values that are specific to the ESB application. I put those values in the roles/esb.rb file:
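Conceptually, layering role values over cookbook defaults behaves like a recursive hash merge. The following plain-Ruby sketch is a simplification: Chef's real attribute precedence has more levels, and the attribute names and values here are invented for illustration.

```ruby
# Recursively merge `overrides` into `defaults`; on conflict the override wins.
# Simplified model only: Chef's actual precedence rules have more levels.
def deep_merge(defaults, overrides)
  defaults.merge(overrides) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Hypothetical cookbook defaults and role overrides.
cookbook_defaults = {
  "jboss" => {
    "home" => "/usr/local/jboss",
    "jvm"  => { "heap" => "512m", "permgen" => "128m" }
  }
}
role_overrides = { "jboss" => { "jvm" => { "heap" => "2048m" } } }

effective = deep_merge(cookbook_defaults, role_overrides)
puts effective["jboss"]["jvm"]["heap"]    # overridden by the role
puts effective["jboss"]["jvm"]["permgen"] # inherited from the defaults
```

The role only needs to state what differs, which keeps each role file short.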
Next, I create a data bag that holds the values specific to each application environment:
data_bags/applications/esb.json
Here is a simplified subset of the actual recipe code:
cookbooks/jboss/recipes/standalone_jdbc.rb
This may seem like a massive amount of custom code but you should consider how little technical debt it contains. This JBoss recipe is built on established patterns and tooling from the Chef community. Anyone with Chef experience can come in and understand these recipes in a very short period. The same cannot be said of homegrown Bash and Perl scripts.
Future Plans
I hope this article has given you a sense of how we use Chef to support the data.fao.org project. It is a basic overview and does not cover the full extent of how we intend to use Chef. In the future, I will implement high-availability and load-balancing configurations. Furthermore, I plan to test our cookbooks using Cucumber and tools like minitest, simple_cuke, or cuken.
Author: Bryan Berry