Using direnv to help with local development

One of the tools I find most useful for local development is direnv. It’s one of a number of similar utilities that hook into your shell and alter the environment when you enter or leave a directory.

To give an example, suppose you have a folder ~/example containing the file ~/example/.envrc:

export EXAMPLE_NAME="Danie"

Then when you cd into ~/example in your shell session, direnv will export the EXAMPLE_NAME variable:

$ cd ~/example
direnv: loading ~/example/.envrc
direnv: export +EXAMPLE_NAME
$ echo Hello $EXAMPLE_NAME
Hello Danie

Just as importantly, when you navigate away, the variable will be unset:

$ cd ~
direnv: unloading
$ echo Hello $EXAMPLE_NAME
Hello

Simple use cases

More command line tools than you might think make use of environment variables, as of course does your shell. There are some obvious and less obvious uses for direnv. Here are a few that I use:

QUICKLY LAUNCH THE CORRECT DEVELOPMENT DATABASE

Setting the Postgres environment variables PGDATABASE, PGUSER, and PGPASSWORD¹ to match my development database credentials means that when in a project folder, psql will open the correct database.

SECURELY USE AWS SECRETS (via 1Password)

Unlike a development database password, I’m more careful with AWS credentials. While AWS_PROFILE, AWS_REGION, and AWS_ACCESS_KEY_ID are safe to share, I prefer to keep AWS_SECRET_ACCESS_KEY values out of my .envrc files. But that doesn’t mean I can’t use them. I store these secrets in 1Password and use the CLI to retrieve them. So my .envrc looks like this:

export AWS_REGION=eu-west-2
export AWS_ACCESS_KEY_ID=AKABCDEFGHIJKLMNO
export AWS_SECRET_ACCESS_KEY="$(op read op://dev/aws/SECRET_ACCESS_KEY)"

The environment variable is still set to the secret, but the secret itself is stored more securely than in an unencrypted .envrc.

ADD ARBITRARY COMMANDS TO THE PATH

In one of the clojure projects I work on, the command I use to start the REPL is:

clojure -A:dev:test:backend:backend-test:system-test:repl

To save having to type this (or, more accurately, search my shell history for it), I put it in a very short script, saved (and .gitignored) in .local/bin/repl. Then in my .envrc I’ve put:

PATH_add ./.local/bin

When I’m in the project folder, typing repl launches the REPL.

Footnotes

  1. If I’d set one. YOLO.

Infinite sequences in ruby

One feature of harmonia is tasks that recur on a schedule, e.g. every Thursday, or on the 30th day of each month. For these tasks we need to know not just when they’ll next occur, but also things like the next 4 occurrences, or all occurrences this month.

To do this we’ve used a technique more common in clojure: using an infinite sequence.

Defining simple infinite sequences

Ruby 1.9 and above let us define infinite sequences using the Enumerator class. A simple example is the sequence of integers:

integers = Enumerator.new do |yielder|
  n = 0
  loop do
    yielder.yield n
    n = n + 1
  end
end

>> integers.take(5)
=> [0, 1, 2, 3, 4]
>> integers.take(10)
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Here’s how this works. The block passed to Enumerator.new defines our sequence. It takes a yielder argument with a special method #yield, used to return elements in the sequence. Whenever #yield is called, execution of the block stops. Execution only restarts if more elements are needed, making the sequence lazy. The Enumerator class handles this stopping and starting execution — we only need to worry about how to generate each element.
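
You can watch this stopping and starting by pulling elements one at a time with #next (using the integers enumerator defined above):

>> integers.next
=> 0
>> integers.next
=> 1
>> integers.next
=> 2

Each call to #next resumes the block just long enough to reach the next yield.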

Most of the code above is concerned with looping and yielding values, not generating them. We can factor this out, giving us a method that makes it trivial to define new sequences:

def sequence(&generator)
  Enumerator.new do |yielder|
    n = 0
    loop do
      yielder.yield generator.call(n)
      n = n + 1
    end
  end
end

>> integers = sequence {|n| n}
>> integers.take(10)
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>> squares = sequence {|n| n * n}
>> squares.take(10)
=> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

When using infinite sequences, laziness is extremely important, as it’s impossible to generate all members of an infinite list in anything less than infinite time. A sequence that prints a message whenever a new value is calculated shows this laziness in action:

integers = sequence {|n| puts "Calculating result #{n}"; n}

>> integers.take(3)
Calculating result 0
Calculating result 1
Calculating result 2
=> [0, 1, 2]

Now that we can define sequences, how can we use them? We’ve already seen that we can #take any number of elements from our sequence. We can also use #take_while to take elements until a condition is met, such as finding all square numbers under 250:

>> squares.take_while {|n| n < 250}
=> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]

In fact all Enumerable methods are available, but we have to take care in using them. As our sequences are infinite, any method that iterates over all members has the potential to take an infinite amount of time. For example calling #any? will either return true (if a matching element exists) or never return.
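
For example, with the squares sequence defined above, the first call below returns almost immediately, but the second would run forever, as no square is ever negative:

>> squares.any? {|n| n > 100}
=> true
>> squares.any? {|n| n < 0}
# never returns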

Another big drawback is that when we call Enumerable methods, laziness isn’t preserved. Suppose we want the first 5 odd square numbers. We might try the following:

>> squares.select {|n| n % 2 == 1}.take(5)

Unfortunately this will never return. Even though we only want a finite set of results, the call to #select operates on the full infinite sequence before it returns. #take(5) is never called. The same problem exists with #map, #drop, #reject and more.

Preserving laziness across derived sequences

Without laziness preservation, our sequences seem of limited use. In ruby 2.0 we can use #lazy to preserve laziness through Enumerable methods, but in 1.9 this isn’t available to us. Thankfully we can get around this by writing lazy versions of Enumerable methods ourselves. Let’s take the previous example, finding the first 5 odd square numbers. We hit a roadblock because #select never returned. If instead of using #select we use a new Enumerator to do our selecting, we can work around this:

odd_squares = Enumerator.new do |yielder|
  squares.each do |square|
    yielder.yield square if (square % 2 == 1)
  end
end

>> odd_squares.take(5)
=> [1, 9, 25, 49, 81]
>> odd_squares.take(10)
=> [1, 9, 25, 49, 81, 121, 169, 225, 289, 361]

Our new Enumerator iterates lazily through our original sequence, yielding only odd values. We’ve chained two enumerators together to preserve laziness.

This is all a bit cumbersome as it is, but we can turn it into a #select method on a new LazyEnumerator class:

class LazyEnumerator < Enumerator
  def select(&block)
    self.class.new do |yielder|
      each do |value|
        yielder.yield value if block.call(value)
      end
    end
  end
end

def lazy_sequence(&generator)
  LazyEnumerator.new do |yielder|
    n = 0
    loop do
      yielder.yield generator.call(n)
      n = n + 1
    end
  end
end

>> lazy_squares = lazy_sequence {|n| n * n}
>> lazy_squares.select {|n| n % 2 == 1}.take(5)
=> [1, 9, 25, 49, 81]

#reject and #map can be chained in a similar way to #select:

class LazyEnumerator < Enumerator
  def reject(&block)
    self.class.new do |yielder|
      each do |value|
        yielder.yield value unless block.call(value)
      end
    end
  end

  def map(&block)
    self.class.new do |yielder|
      each do |value|
        yielder.yield block.call(value)
      end
    end
  end
end
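
These chain together just as you’d hope, with each step staying lazy. For example, using the lazy_sequence helper from above:

>> lazy_integers = lazy_sequence {|n| n}
>> lazy_integers.map {|n| n * 3}.reject {|n| n % 2 == 0}.take(3)
=> [3, 9, 15]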

#drop and #drop_while are slightly more complicated, but follow a similar pattern. The main difference is that they need to keep track of how much to drop:

class LazyEnumerator < Enumerator
  def drop(n)
    self.class.new do |yielder|
      dropped_enough = false
      dropped = 0
      each do |element|
        dropped_enough ||= dropped >= n
        yielder.yield element if dropped_enough
        dropped = dropped + 1
      end
    end
  end

  def drop_while(&block)
    self.class.new do |yielder|
      match_found = false
      each do |element|
        match_found ||= !block.call(element)
        yielder.yield element if match_found
      end
    end
  end
end

Together, these methods give us a LazyEnumerator that can be chained in a great number of ways, giving our sequences a lot of power. #take and #drop let us select which members of a sequence we’re interested in, while #select, #reject and #map allow us to build new sequences from existing ones:

>> integers = lazy_sequence {|n| n}
>> squares = integers.map {|n| n * n }
>> odd_squares = squares.select {|n| n % 2 == 1}
>> odd_squares.drop(10).take(10)
=> [441, 529, 625, 729, 841, 961, 1089, 1225, 1369, 1521]

Using only a few simple methods, we’ve been able to answer a complicated (though contrived) question: what are the second ten odd square numbers? This particular answer may not be interesting, but the technique of defining and deriving infinite sequences is much more general and useful. This is only a small sample of what you can do with them.

Deploying a rails app from scratch using recap

If you follow our company blog you’ll know that we’re working on Harmonia, our virtual office manager. I thought I’d explain how we use recap to deploy harmonia, to show how easy and fast recap makes application deployment.

Harmonia is a fairly standard rails application. As well as a web front-end, it has two other processes. A queue worker is used to send outgoing emails, whilst the core of the application is the ticker: a process which ‘ticks’ every minute, assigning tasks to team members. We use foreman to declare these processes in the following Procfile:

web: bundle exec unicorn -p $PORT -c unicorn.conf.rb
ticker: bundle exec rails runner script/ticker.rb
worker: bundle exec rake environment resque:work QUEUE=assignments VVERBOSE=1

All of these processes touch application code, so whenever we deploy a new version of the app (which we do frequently) they need to be restarted. Our app also has a database with associated migrations, uses environment variables like DATABASE_URL for configuration, and has a number of gem dependencies managed by bundler.

This is all handled by recap.

Getting started - adding recap to the project

Using recap with a rails project is simple. First add gem 'recap' to the Gemfile and run bundle install. Next run bundle exec recap setup, which will generate a Capfile, guessing values for the git repository and app name. You should check these values and change the server to point to your app server. As an example, the complete Capfile for harmonia is shown below:

require 'recap/recipes/rails'

set :application, 'harmonia'
set :repository, 'git@github.com:freerange/harmonia.git'

server 'bison.harmonia.io', :app

Applications deployed with recap need their own user, owning all files and processes. Assuming we can ssh into our server and are listed as a sudoer, we can create this user automatically by running cap bootstrap. This will also add our own ssh user to the application group, allowing it to deploy the application.

Next we can set any environment variables we need for configuration. These are loaded in the application user’s .profile, so are available to all processes started by recap. In harmonia we set our smtp credentials, the server port, some api keys and more, using commands like cap env:set PORT=7000 and cap env:set SMTP_PASSWORD=secret.

The app is now almost ready to deploy. We can prepare it for deployment with cap deploy:setup, which clones the code repository, installs our gem bundle, sets up the database and precompiles our assets.

Finally, running cap deploy will start the app for the first time, launching each process defined in the Procfile with the environment variables we previously set.

Really fast deployments

While recap makes it very easy to get apps up and running the first time, it comes into its own with subsequent deployments. At Go Free Range we like to deploy apps we’re working on very frequently. One thing that helps ensure we do this is making each deployment as fast as it can be.

Using git as recap does is already a very quick way to get code changes onto servers, but recap takes things a step further. By testing to see which files have changed it knows which tasks can be skipped. For example, database migrations won’t be run if db/schema.rb has not changed; the gem bundle won’t be re-installed unless Gemfile.lock has been updated, and foreman process scripts won’t be exported if the Procfile is unchanged. In fact, if these files don’t exist, these tasks will never run at all.

The future

Using recap with Harmonia has made our deployment process very fast and simple. When the main harmonia server became over-burdened and we decided to commission a new machine dedicated to harmonia, recap made that process quick and painless. As well as harmonia, recap is also used to deploy the Go Free Range website, this blog, and a number of other small sites and projects, where it has served us well. For larger projects there are some missing features (such as more control over which processes run where), but I plan to add these in the next release. For all other sites, recap has proven itself a lightweight and capable alternative to the standard Capistrano deployment recipes.

Past, Present and Future

For me and the rest of Go Free Range, the big event this week was the public launch of Inside Government, the government-focussed part of GOV.UK. Inside Government aims to replace the majority of government department and agency websites with a single place for all news, policies, publications and consultations. On Wednesday, the first two department websites (for the Department for Transport and the Department for Communities and Local Government) were moved across.

We’ve been working on this project for 14 months, from the initial commit, through the beta launch right up until today. I’ve enjoyed it, but it has been extremely hard work. And although there are hundreds of things I’d still like to change, there are hundreds more I’m proud of.

Harmonia

As well as Inside Government, we’ve also been slowly inviting people to Harmonia, our ‘virtual office manager’. In case you haven’t read the Go Free Range blog, Harmonia is a tool we use to assign all the tasks needed to keep our company running. Rather than use a schedule, each piece of work is assigned in a way designed to be both random and fair.

One variation or another of Harmonia has been helping us for well over a year, but now we’ve built a web app for everyone to use. It’s still a bit rough around the edges, but we’d love it if people signed up and gave us feedback. We’re building this because we want to learn how to build better products, for the future of Go Free Range.

Tip: Bundler with --binstubs

In a previous post, I wrote about how I’d aliased commands such as rake, cap and rspec to run either with or without bundle exec, based on the presence of a Gemfile. I gave up on that a while ago. Instead, I’ve started installing all my bundles like this:

bundle install --path .bundle/gems --binstubs .bundle/bin

I often use features like bundle open <gem> to debug and edit failing gems, so I like to keep each application’s gems isolated. The --path .bundle/gems option installs them within an application’s .bundle directory. As well as isolating my gems, it has the added benefit that I can blow away the gemset with rm -rf .bundle.

The --binstubs .bundle/bin option installs bundle-aware scripts for each command provided by a bundled gem. For example, a bundle including rake will generate a .bundle/bin/rake script. By adding ./.bundle/bin to the front of my environment PATH, the bundled version of rake will run when I’m in the application folder. I never have to type bundle exec!

Obviously typing that long bundle install command each time is tedious, so I’ve aliased it to bi:

alias bi='bundle install --path .bundle/gems --binstubs .bundle/bin'

I’ve been using these options for a few months, and so far I’m very happy with them.

Working inside government

Since September, I and the other Go Free Range guys have been working for the government. We’ve been helping the able and growing Government Digital Service to build whitehall, our code name for the Inside Government section of gov.uk. Yesterday we removed the password and opened our doors to the public.

For those who don’t know, gov.uk is the UK government’s attempt to not only consolidate government websites to a single domain, but also build these sites in the right way. That means public, open source, continual delivery and more.

While the first part of gov.uk focussed on helping citizens, Inside Government is about the business of government. What policies the government has, how it’s implementing them, not to mention the news and speeches, publications, consultations and other things the government does each day.

It’s only the first release of many, a small glimpse of what will one day be. How it ends up will be shaped by feedback from the people who use it.

A small toy to explore geohashes

For an app I’ve been building, I’ve been looking into geohashes. For those who don’t know, the geohash format is a simple way to encode latitude and longitude into a single string. As an example, Nelson’s Column in London (51.507794, -0.127952) has the geohash gcpvj0dyds.

Geohashes have a couple of interesting features. First, as you remove characters, you lose precision. gcpvj0dyds fairly accurately points to Nelson’s Column; gcpvj0d represents the South-West of Trafalgar Square and some of the Mall; and gcpvj covers most of Central London, as well as Islington and King’s Cross. A geohash doesn’t really represent a point, but rather a bounding area within which a point may lie. The longer the geohash, the smaller that bounding area.

The other interesting property geohashes have is that nearby locations usually (but not always) share similar prefixes. So much of North London is in gcpv, while much of South London is in gcpu. However, due to the Prime Meridian passing through Greenwich, South East London has the geohash u10h - wildly different from the other two.
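
The encoding behind all this is surprisingly simple: repeatedly halve the longitude and latitude ranges, recording a 1 or 0 depending on which half the point falls in, and pack the resulting bits into base 32 characters. Here’s a rough ruby sketch of the standard algorithm (a toy illustration assuming ruby 1.9 or later, not the code behind the app):

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode_geohash(lat, lon, length = 10)
  lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
  bits = []
  lon_turn = true # bits alternate between longitude and latitude, longitude first
  while bits.length < length * 5
    range = lon_turn ? lon_range : lat_range
    value = lon_turn ? lon : lat
    mid = (range[0] + range[1]) / 2
    if value >= mid
      bits << 1
      range[0] = mid # the point is in the upper half
    else
      bits << 0
      range[1] = mid # the point is in the lower half
    end
    lon_turn = !lon_turn
  end
  # Pack each group of 5 bits into a single base 32 character
  bits.each_slice(5).map {|slice| BASE32[slice.join.to_i(2)] }.join
end

>> encode_geohash(51.507794, -0.127952)
=> "gcpvj0dyds"

Truncating a geohash just throws away the least significant bits, which is why shorter geohashes give larger bounding areas.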

This probably sounds a bit complicated. I was having trouble getting my head around the concept, so to try and get to grips with geohashes I’ve written a toy app that draws them on a map. To try it out, go to http://geohash.gofreerange.com, click the map, zoom and play. If you find it useful, let me know.

Update: This code is now available on github.

Tip: Automatic bundle exec for rake and other gems

It’s irritating to run gem commands like rake, cap, rspec and others, only to find they need to be executed via bundle exec. My solution is a simple zsh function, combined with aliases for commonly used commands.

Here’s the function (which I’ve named be):

be () {
  if [[ -a Gemfile ]]; then
    bundle exec $*
  else
    command $*
  fi
}

It’s very simple. If there’s a Gemfile in the pwd, it runs commands through bundle exec. Otherwise it just runs them.

I’ve combined this with some aliases for much less pain and frustration:

alias rake='be rake'
alias cap='be cap'
alias rspec='be rspec'

Presenting the #blue api

In building #blue (sign up now!), one of the problems we faced was how to build json data in response to requests to our API. The typical rails solution would be to override #as_json in a model class, then write a controller like this:

class ContactsController < ApiController
  respond_to :json

  def show
    respond_with Contact.find(params[:id])
  end
end

I always prefer to keep my controllers as skinny as possible, so this looks like a great solution. The respond_with call takes care of converting the model to json and responding with the right Content-Type, all in a single call. However it has a number of problems and disadvantages.

The biggest issue for our API is that rather than expose the id of each model, we’ve tried to encourage the use of the uri instead. So the json returned for a single contact (for example) looks like this:

{
  "contact": {
    "uri": "https://api.example.com/contacts/ccpwjc",
    "name": "George",
    "email": "george@handmade.org",
    "msisdn": "447897897899",
    "phone_number": "07897897899",
    "messages": "https://api.example.com/contacts/ccpwjc/messages"
  }
}

It doesn’t just have a uri for the actual contact, but also for the messages belonging to that contact (and yes, I regret not calling that attribute messages_uri). Models can’t generate uris, and shouldn’t really be aware of them, so overriding #as_json doesn’t work. In any case, the json structure is really presentation logic, not business logic. It doesn’t belong in the model.

Presenting a single model

The solution we’ve used is to build a presenter for each model, solely responsible for building the json. Here’s an example for a contact:

class ContactPresenter
  include Rails.application.routes.url_helpers

  attr_accessor :controller, :subject
  delegate :params, :url_options, :to => :controller
  delegate :errors, :to => :subject

  def initialize(controller, subject)
    @controller = controller
    @subject = subject
  end

  def as_json(options = {})
    {:contact => {
      :uri => uri,
      :email => subject.email,
      :name => subject.name,
      :msisdn => subject.msisdn,
      :phone_number => subject.phone_number,
      :messages => api_contact_messages_url(:contact_id => subject.id)
    }}
  end

  def uri
    api_contact_url(:id => subject.id)
  end
end

It’s now simple to rewrite our controller to use the new presenter:

class ContactsController < ApiController
  respond_to :json

  def show
    respond_with ContactPresenter.new(self, Contact.find(params[:id]))
  end
end

Presenting pages of models

The presenter above works well for a single model, but many of our API calls return a page of results. The /contacts endpoint, for example, returns all the contacts belonging to a user (of which there may be hundreds). Luckily it’s simple to adapt this pattern to present pages like this. First, we change our original #as_json method slightly:

def as_json(options = {})
  if options[:partial]
    {
      :uri => uri,
      :email => subject.email,
      :name => subject.name,
      :msisdn => subject.msisdn,
      :phone_number => subject.phone_number,
      :messages => api_contact_messages_url(:contact_id => subject.id)
    }
  else
    {:contact => as_json(:partial => true)}
  end
end

This change allows us to call as_json with the option :partial. With the option, a bare hash of data is returned. Without it, the same hash is returned, wrapped in another hash.
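
Sketching the output for the contact shown earlier (where presenter is a ContactPresenter instance):

>> presenter.as_json(:partial => true)
=> {:uri => "https://api.example.com/contacts/ccpwjc", :name => "George", ...}
>> presenter.as_json
=> {:contact => {:uri => "https://api.example.com/contacts/ccpwjc", :name => "George", ...}}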

Next, add a page presenter:

class ContactPagePresenter
  include Rails.application.routes.url_helpers

  attr_accessor :controller, :subject
  delegate :params, :url_options, :to => :controller
  delegate :errors, :to => :subject

  def initialize(controller, subject)
    @controller = controller
    @subject = subject
  end

  def as_json(options = {})
    contacts = subject.map {|o| ContactPresenter.new(controller, o).as_json(:partial => true) }

    {:contacts => contacts}.tap do |result|
      if subject.previous_page
        result[:previous_page_uri] = contacts_url(subject.previous_page)
      end

      if subject.next_page
        result[:next_page_uri] = contacts_url(subject.next_page)
      end
    end
  end
end

Finally, we can add an index action using this presenter:

class ContactsController < ApiController
  respond_to :json

  def index
    respond_with ContactPagePresenter.new(self, Contact.paginate(
      :page => params[:page],
      :per_page => 50)
    )
  end
end

Refactoring common logic

The code above is a very much simplified version of what we do in #blue. We have many controllers, and several different models, so in our actual code we’ve abstracted out as much common logic as possible. In reality, our contacts controller looks more like this:

class ContactsController < ApiController
  before_filter :find_contact, :only => [:show, :update]

  def show
    present @contact
  end

  def index
    present_page_of current_account.contacts
  end

  def create
    @contact = current_account.contacts.build(attributes)
    @contact.save
    present @contact
  end

  def update
    @contact.update_attributes(attributes)
    present @contact
  end

  private

  def find_contact
    @contact = current_account.contacts.where(:_id => params[:id]).first
    head :status => :not_found unless @contact
  end
end

I think the code looks pretty clean. The clever stuff happens in the #present and #present_page_of methods, defined in the superclass:

class ApiController < ActionController::Base
  protected

  def present(instance, options = {})
    presenter = presenter_class.new(self, instance)
    options[:location] ||= presenter.uri if request.post? && instance.errors.empty?
    respond_with presenter, options
  end

  def present_page_of(collection, options = {})
    presenter = page_presenter_class.new(self, page_of(collection))
    respond_with presenter, options
  end

  def page_of(collection)
    collection.paginate(:page => params[:page], :per_page => 50)
  end

  def presenter_class
    (self.class.name.gsub("Controller", "").singularize + "Presenter").constantize
  end

  def page_presenter_class
    (self.class.name.gsub("Controller", "").singularize + "PagePresenter").constantize
  end
end

The #present and #present_page_of methods handle determining the correct presenter to use, as well as paginating the collection where required. They still use rails’ built-in #respond_with method, which helps provide the correct response headers for each request. As the ContactPresenter delegates #errors to its subject, if there are validation errors, #respond_with correctly returns a 422.
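
The naming convention used to resolve presenters is easy to check in a rails console:

>> "ContactsController".gsub("Controller", "").singularize + "Presenter"
=> "ContactPresenter"
>> "ContactsController".gsub("Controller", "").singularize + "PagePresenter"
=> "ContactPagePresenter"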

One further motivation for this pattern (other than moving presentation logic out of the model) is that should we want to release a new version of our API, we’ll be able to get a lot of the way there simply by swapping which presenter is used. We started using this code about 8 months ago, and I’m still pretty happy with it. I hope you find something useful in it too.

Any comments or suggestions, please get in touch with me on twitter.

#blue opens for business

If you’re an O2 customer based in the UK, you might be interested that #blue, a project we’ve built, has recently (re)opened for business. It’s a soft launch, but feel free to tell your friends.

#blue gets copies of every SMS message you send or receive, and makes them available to you on the web. You can search your messages, read whole conversations, and even reply, all from the comfort of your browser.

Screenshot of the main hashblue messages page

As well as a nice-looking web UI, there’s also an API that allows other apps to read and manipulate your messages and contacts (once you give them permission). Think automatic posting to twitter, or showing new messages as alerts on your desktop. The possibilities are endless.

Screenshot of the hashblue API explorer

Oh, and it’s all free. All you need is an O2 phone.

If you think it sounds interesting, request a beta invitation. You should get one very quickly. Once on board, please send me any suggestions and feedback. It’s going to be interesting to see where we can take this project.

Mongo instrumentation released as a gem

Enough people seemed to comment and like the mongo instrumentation code I wrote about yesterday that I’ve packaged it up and released it as a gem.

The mongo-rails-instrumentation gem is available on rubygems, and the code is up on github.

Adding it to a project is simple: just put the following in your Gemfile, run bundle install and restart your app.

    gem 'mongo-rails-instrumentation', '~>0.1'

Please add any suggestions, improvements and comments to the code in github. I hope people find it useful.

Experimental Mongo instrumentation (for Rails 3)

Update: Changed to instrument methods on the Mongo::Connection

One of our latest rails projects uses Mongo as a backend. We’re just starting to get some traffic, and as we do, we’re monitoring the logs for slow requests. When using ActiveRecord, rails splits out the recorded request time like so:

    Completed 200 OK in 6ms (Views: 370.5ms | ActiveRecord: 2.3ms)

We wanted to do the same for our Mongo accesses, just to give a rough idea as to what our requests were doing. Luckily Rails 3 makes this relatively straightforward, providing hooks to instrument methods, subscribe to log messages and add information to the request log. Here, then, is my first stab (mainly harvested from ActiveRecord):

module Mongo
  module Instrumentation
    def self.instrument(clazz, *methods)
      clazz.module_eval do
        methods.each do |m|
          class_eval %{def #{m}_with_instrumentation(*args, &block)
            ActiveSupport::Notifications.instrumenter.instrument "mongo.mongo", :name => "#{m}" do
              #{m}_without_instrumentation(*args, &block)
            end
          end
          }

          alias_method_chain m, :instrumentation
        end
      end
    end

    class Railtie < Rails::Railtie
      initializer "mongo.instrumentation" do |app|
        Mongo::Instrumentation.instrument Mongo::Connection, :send_message, :send_message_with_safe_check, :receive_message

        ActiveSupport.on_load(:action_controller) do
          include Mongo::Instrumentation::ControllerRuntime
        end

        Mongo::Instrumentation::LogSubscriber.attach_to :mongo
      end
    end

    module ControllerRuntime
      extend ActiveSupport::Concern

      protected

      attr_internal :mongo_runtime

      def cleanup_view_runtime
        mongo_rt_before_render = Mongo::Instrumentation::LogSubscriber.reset_runtime
        runtime = super
        mongo_rt_after_render = Mongo::Instrumentation::LogSubscriber.reset_runtime
        self.mongo_runtime = mongo_rt_before_render + mongo_rt_after_render
        runtime - mongo_rt_after_render
      end

      def append_info_to_payload(payload)
        super
        payload[:mongo_runtime] = mongo_runtime
      end

      module ClassMethods
        def log_process_action(payload)
          messages, mongo_runtime = super, payload[:mongo_runtime]
          messages << ("Mongo: %.1fms" % mongo_runtime.to_f) if mongo_runtime
          messages
        end
      end
    end

    class LogSubscriber < ActiveSupport::LogSubscriber
      def self.runtime=(value)
        Thread.current["mongo_mongo_runtime"] = value
      end

      def self.runtime
        Thread.current["mongo_mongo_runtime"] ||= 0
      end

      def self.reset_runtime
        rt, self.runtime = runtime, 0
        rt
      end

      def mongo(event)
        self.class.runtime += event.duration
      end
    end
  end
end

It looks complicated, but it’s actually pretty simple. Data access methods in Mongo::Connection are hijacked and surrounded by an ActiveSupport::Notifications.instrumenter.instrument block. This triggers events which are listened to by the LogSubscriber, summing the total time spent in Mongo. The ControllerRuntime then collects this count to be displayed, and resets the sum to zero ready for the next request. The output looks like this:

    Completed 200 OK in 838ms (Views: 370.5ms | ActiveRecord: 2.3ms | Mongo: 441.5ms)
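
If the notifications API is new to you, here’s a minimal standalone example of the instrument/subscribe pair the code above relies on (independent of mongo):

require 'active_support/notifications'

# Listen for "mongo.mongo" events, reporting each duration
ActiveSupport::Notifications.subscribe "mongo.mongo" do |name, start, finish, id, payload|
  puts "#{payload[:name]} took %.1fms" % ((finish - start) * 1000)
end

# Trigger an event; the subscriber fires once the block completes
ActiveSupport::Notifications.instrument "mongo.mongo", :name => "find" do
  sleep 0.05
end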

It’s just a first stab, so any comments and improvements are more than welcome. It’s here on gist so please fork away.

A home for my Active Record column reader

@kraykray was nice enough to let me know that Rails 3.0.3 had broken my column-reader code.

I’ve always seen it as a toy rather than a serious project, but getting a bug report made me re-evaluate. If people are using it, even a tiny piece of code like this deserves a proper home.

So here it is, as both a github repository and a gem. Enjoy!

An updated rails template for gem bundler

Update 8th February 2011:

Bundler has changed a lot since I wrote these instructions. Use them at your own risk!

A few months ago I wrote a rails template for gem bundler. Since then, bundler has changed a lot, and my template no longer works. Here then is an updated version, based on this gist from Andre Arko. Using it, you should be able to get a rails 2.3.5 project working with bundler in less than 5 minutes.

The first step is to install the latest bundler. At the time of writing, this was 0.9.9.

gem install bundler

Now you should be able to run the template, either on a new project, or on an existing rails 2.3.5 project.

rails -m http://github.com/tomafro/dotfiles/raw/master/resources/rails/bundler.rb <project>

On a fresh project, that should be all you need to do. On an existing project that used an older version of bundler, you’ll need to remove the old hooks in config/preinitializer.rb and config/environment.rb, and delete the gems folder.

Explaining the template, step by step

The first step creates the project Gemfile, with rails available in all environments, and ruby-debug included in development. If the project has other gems, they should be added here, rather than using rails’ own config.gem mechanism.

file 'Gemfile', %{
source 'http://rubygems.org'

gem 'rails', '#{Rails::VERSION::STRING}'

group :development do
  gem 'ruby-debug'
end
}.strip

The next step is to get bundler to load correctly. This is done in two stages. First, in config/preinitializer.rb bundler needs to be set up. This adds all the bundled gems to the ruby load path, but doesn’t initialise them.

append_file '/config/preinitializer.rb', %{
begin
  # Require the preresolved locked set of gems.
  require File.expand_path('../../.bundle/environment', __FILE__)
rescue LoadError
  # Fallback on doing the resolve at runtime.
  require "rubygems"
  require "bundler"
  if Bundler::VERSION <= "0.9.5"
    raise RuntimeError, "Bundler incompatible.\n" +
      "Your bundler version is incompatible with Rails 2.3 and an unlocked bundle.\n" +
      "Run `gem install bundler` to upgrade or `bundle lock` to lock."
  else
    Bundler.setup
  end
end
}.strip

Second, the rails boot process is modified to start the bundler environment. This ‘requires’ all gems in the bundle, letting them run initialisation code.

gsub_file 'config/boot.rb', "Rails.boot!", %{

class Rails::Boot
 def run
   load_initializer
   extend_environment
   Rails::Initializer.run(:set_load_path)
 end

 def extend_environment
   Rails::Initializer.class_eval do
     old_load = instance_method(:load_environment)
     define_method(:load_environment) do
       Bundler.require :default, Rails.env
       old_load.bind(self).call
     end
   end
 end
end

Rails.boot!
}

All that’s left now is a little cleaning up. The .bundle folder should never be checked into the code repository as it holds machine-local configuration, so it’s added to .gitignore. Finally, bundle install is run to fetch the bundled gems.

append_file '/.gitignore', %{
/.bundle
}

run 'bundle install'

And that’s it. I hope you find it useful.

Rails 3 direct column reader

Whilst trying to get my head around arel and its relationship to ActiveRecord in rails 3, I’ve updated the simple ColumnReader class I introduced last year. It lets you read the (correctly cast) column values for an ActiveRecord class, without the overhead of instantiating each object.

Here’s the updated code:

module ColumnReader
  def column_reader(column_name, options = {})
    name = options.delete(:as) || column_name.to_s.pluralize
    column = columns_hash[column_name.to_s]

    self.module_eval %{
      def self.#{name}
        query = scoped.arel.project(arel_table[:#{column_name}])
        connection.select_all(query.to_sql).collect do |value|
          v = value.values.first
          #{column.type_cast_code('v')}
        end
      end
    }
  end

  ActiveRecord::Base.extend(self)
end

The code isn’t that different, though using scoped over construct_finder_sql feels a lot nicer. If you’ve got suggestions for improvement, gist away.

Usage is similar to before, only using the new rails 3 syntax:

class Animal < ActiveRecord::Base
  column_reader 'id'
  column_reader 'name'

  named_scope :dangerous, :conditions => {:carnivorous => true}
end

Animal.names
#=> ['Lion', 'Tiger', 'Zebra', 'Gazelle']

Animal.limit(1).names
#=> ['Lion'] (Normal finder options supported)

Animal.dangerous.names
#=> ['Lion', 'Tiger'] (Scoping respected)

Animal.ids
#=> [1, 2, 3] (Values cast correctly)

I’m still not entirely convinced of the value of this helper, so if you find a good use for it, tweet me. Enjoy!

How to easily use Rails 3 now

Update 10th February 2010:

The instructions below were useful earlier in the development cycle. Now the beta gem has been released, the process is much easier:

gem uninstall bundler
gem install tzinfo builder memcache-client rack rack-test rack-mount
gem install erubis mail text-format thor bundler i18n
gem install rails --pre

Now that rails 3 is getting closer to release, I wanted to start playing around with it. I’ve seen a few articles on getting it up and running, but they all seemed a little bit complicated. To use rails 2.3.5 before its release, I just built the gems myself and installed them. It turns out you can easily do the same with rails 3.

First, install rails main dependencies:

gem install rake rack bundler
gem install arel --version 0.2.pre

Next, get the latest rails code from github, and install the rails gems. There may be a few errors you can safely ignore:

git clone git://github.com/rails/rails.git
cd rails
rake install

And bang, you can start your first rails 3 project:

rails ~/apps/playground/rails3

Your existing projects shouldn’t be affected as rails is installed as a prerelease gem, but to be safe I’d recommend a tool like rvm to switch to a clean set of gems.

Tip: Relative paths with File.expand_path

You probably know about the __FILE__ magic constant. It holds the filename of the currently executing ruby source file, relative to the execution directory. So with the following saved as /code/examples/path_example.rb:

puts __FILE__

Running this file from the /code folder will output examples/path_example.rb

This is often used to load files on paths relative to the current file. The way I’ve used it before is like this:

config_path = File.expand_path(File.join(File.dirname(__FILE__), "config.yml"))

This works, but it’s a bit clunky.

What I didn’t realise until reading the rails source code the other day is that File.expand_path can take a second argument: a starting directory. Also, this argument doesn’t actually have to be a path to a directory; it also accepts a path to a file. With this knowledge we can shorten the above to the following:

config_path = File.expand_path("../config.yml", __FILE__)

Much simpler.
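
To see why this works, note that expand_path just resolves the path string, so the '..' swallows the filename. With __FILE__ as /code/examples/path_example.rb:

File.expand_path("../config.yml", "/code/examples/path_example.rb")
#=> "/code/examples/config.yml"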

Taking screenshots of web pages with macruby

Whilst playing around with the very exciting macruby last weekend, I thought I’d try building a web page screenshot grabber, based on Ben Curtis’ code. The code was very easy to translate from rubycocoa, looks cleaner and seems to work really well:

framework 'Cocoa'
framework 'WebKit'

class Snapper
  attr_accessor :options, :view, :window

  def initialize(options = {})
    @options = options
    initialize_view
  end

  def save(url, file)
    view.setFrameLoadDelegate(self)
    # Tell the webView what URL to load.
    frame = view.mainFrame
    req = NSURLRequest.requestWithURL(NSURL.URLWithString(url))
    frame.loadRequest req

    while view.isLoading  && !timed_out?
      NSRunLoop.currentRunLoop.runUntilDate NSDate.date
    end

    if @failedLoading
      puts "Failed to load page at: #{url}"
    else
      docView = view.mainFrame.frameView.documentView
      docView.window.setContentSize(docView.bounds.size)
      docView.setFrame(view.bounds)

      docView.setNeedsDisplay(true)
      docView.displayIfNeeded
      docView.lockFocus

      bitmap = NSBitmapImageRep.alloc.initWithFocusedViewRect(docView.bounds)
      docView.unlockFocus

      # Write the bitmap to a file as a PNG
      representation = bitmap.representationUsingType(NSPNGFileType, properties:nil)
      representation.writeToFile(file, atomically:true)
      bitmap.release
    end
  end

  private

  def webView(view, didFailLoadWithError:error, forFrame:frame)
    @failedLoading = true
  end

  def webView(view, didFailProvisionalLoadWithError:error, forFrame:frame)
    @failedLoading = true
  end

  def initialize_view
    NSApplication.sharedApplication

    self.view = WebView.alloc.initWithFrame([0, 0, 1024, 600])
    self.window = NSWindow.alloc.initWithContentRect([0, 0, 1024, 600],
      styleMask:NSBorderlessWindowMask, backing:NSBackingStoreBuffered, defer:false)

    window.setContentView(view)
    # Use the screen stylesheet, rather than the print one.
    view.setMediaStyle('screen')
    # Set the user agent to Safari, to ensure we get back the exactly the same content as
    # if we browsed directly to the page
    view.setCustomUserAgent 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) ' +
        'AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10'
    # Make sure we don't save any of the prefs that we change.
    view.preferences.setAutosaves(false)
    # Set some useful options.
    view.preferences.setShouldPrintBackgrounds(true)
    view.preferences.setJavaScriptCanOpenWindowsAutomatically(false)
    view.preferences.setAllowsAnimatedImages(false)
    # Make sure we don't get a scroll bar.
    view.mainFrame.frameView.setAllowsScrolling(false)
  end

  def timed_out?
    @start ||= Time.now
    (Time.now.to_i - @start.to_i) > (options[:timeout] || 30)
  end
end

To use:

macruby -r snapper.rb -e "Snapper.new.save('http://tomafro.net', 'tomafro.net.png')"

The next step is to add some command line options, then try compilation and deployment with macrubyc and macruby_deploy.

Tip: Zoom keyboard shortcut for OS X

In the Terminal run:

defaults write NSGlobalDomain NSUserKeyEquivalents '{"Zoom" = "@^Z"; "Zoom Window" = "@^Z"; }'

Quit and relaunch your applications, and ⌃⌘Z should zoom and unzoom.

Stolen from macosxhints.com, posted here for my own benefit.

Building rails gems from the 2-3-stable branch

For the latest application I’ve been working on, I wanted to use Michael Koziarski’s rails_xss plugin, to turn default escaping on in my erb templates. I’m also using wycats’ gem bundler to manage gems and their dependencies, including rails.

This posed a problem: rails_xss requires changes made on the rails 2-3-stable branch, but not yet released in a gem (though they will be included in 2.3.5).

The solution was obvious: build my own gems, and get bundler to use them. Luckily, rails makes this an easy process.

First, clone rails from github, and change to the 2-3-stable branch:

git clone git://github.com/rails/rails.git
cd rails
git checkout -b 2-3-stable origin/2-3-stable

Next, we need to build the gems. Rails currently doesn’t seem to have a Rake task to build all its constituent projects (though I’m planning a patch to include one), so you have to build each one in turn:

cd actionmailer
rake gem PKG_BUILD=1
cd ../actionpack
rake gem PKG_BUILD=1
cd ../activerecord
rake gem PKG_BUILD=1
cd ../activeresource
rake gem PKG_BUILD=1
cd ../activesupport
rake gem PKG_BUILD=1
cd ../railties
rake gem PKG_BUILD=1
cd ..

The key is the PKG_BUILD variable. It appends a suffix to the rails version, so rather than building 2.3.4 (the version I checked out), it will build 2.3.4.1. If I decided to update my gems, I’d use PKG_BUILD=2, then 3 and so on.

Finally, once all these gems are built, it’s simply a case of finding them and using them. For gem bundler, this means placing them in the cache and updating the Gemfile to look for rails ‘2.3.4.1’. The gems are all built in pkg folders in their respective subprojects, so to copy them all somewhere else you can run:

cp **/pkg/*.gem <project-folder>/gems/cache
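
With the gems copied into the cache, the Gemfile just needs to reference the new version (assuming PKG_BUILD=1 as above):

gem 'rails', '2.3.4.1'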

A rails template for gem bundler

Update 28th February 2010:

Bundler has changed a lot since I first wrote this template, so I’ve written a new version. Please use the updated version rather than the one below.

Following Nick Quaranto’s article ‘Gem Bundler is the Future’, I was inspired to try out bundler on my latest rails project. Previously, I’ve found rails’ own gem management a massive headache. In contrast, using bundler has been a pleasure.

Getting it set up how I wanted took a fair bit of experimentation, so to make things easier both for me and the wider community, I’ve made a rails template to do the hard work.

Give it a try by running the following. You should be up and running in a couple of minutes:

rails -m http://github.com/tomafro/dotfiles/raw/master/resources/rails/bundler.rb <project>

That will give you a bundled project, ready for you to add your own gems. Here’s what each step of the template actually does:

Gem bundler is itself a gem. It can’t be used to manage itself, so to ensure that all environments use the same version, the first step is to install it right into the project:

inside 'gems/bundler' do
  run 'git init'
  run 'git pull --depth 1 git://github.com/wycats/bundler.git'
  run 'rm -rf .git .gitignore'
end

Just having bundler installed is no good without any way to run it; a script is needed. Once this is installed the local bundler can be run with script/bundle <options>:

file 'script/bundle', %{
#!/usr/bin/env ruby
path = File.expand_path(File.join(File.dirname(__FILE__), "..", "gems/bundler/lib"))
$LOAD_PATH.unshift path
require 'rubygems'
require 'rubygems/command'
require 'bundler'
require 'bundler/commands/bundle_command'
Gem::Commands::BundleCommand.new.invoke(*ARGV)
}.strip

run 'chmod +x script/bundle'

Bundler uses Gemfiles to declare which gems are required in each environment. This simple Gemfile includes rails in all environments, and ruby-debug in all environments other than production:

file 'Gemfile', %{
clear_sources
source 'http://gemcutter.org'

disable_system_gems

bundle_path 'gems'

gem 'rails', '#{Rails::VERSION::STRING}'
gem 'ruby-debug', :except => 'production'
}.strip

Most of the files bundler will place in the gem path can be regenerated; they shouldn’t be added to the project repository. The only things that should be added are the .gem files themselves, and the local copy of bundler. All the rest should be ignored:

append_file '.gitignore', %{
gems/*
!gems/cache
!gems/bundler}

The bundle script needs to be run for the first time:

run 'script/bundle'

Finally rails needs to be modified to ensure the bundler environment is loaded. This is done in two parts. First, a preinitializer is added to load the bundler’s environment file before anything else:

append_file '/config/preinitializer.rb', %{
require File.expand_path(File.join(File.dirname(__FILE__), "..", "gems", "environment"))
}

Second, rails’ initialization process is hijacked to require the correct bundler environment:

gsub_file 'config/environment.rb', "require File.join(File.dirname(__FILE__), 'boot')", %{
require File.join(File.dirname(__FILE__), 'boot')

# Hijack rails initializer to load the bundler gem environment before loading the rails environment.

Rails::Initializer.module_eval do
  alias load_environment_without_bundler load_environment

  def load_environment
    Bundler.require_env configuration.environment
    load_environment_without_bundler
  end
end
}

And that’s it. The project is now fully bundled. More gems can be added to the Gemfile and pulled into the project with script/bundle.

Tip: The case for from_param

There’s one small method I add to every new rails project I work on:

module Tomafro::FromParam
  def from_param(param)
    self.first :conditions => { primary_key => param }
  end
end

ActiveRecord::Base.extend(Tomafro::FromParam)

In my controllers, where you might use Model.find(params[:id]) or Model.find_by_id(params[:id]), I use Model.from_param(params[:id]) instead.

All three methods have almost the same behaviour, the only difference being the handling of missing records: find raises ActiveRecord::RecordNotFound, while find_by_id and from_param return nil. So why use from_param over the others?
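
To make the difference concrete (assuming no record has id 999):

Model.find(999)        # raises ActiveRecord::RecordNotFound
Model.find_by_id(999)  #=> nil
Model.from_param(999)  #=> nil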

The answer comes when you want to change to_param, the method rails uses to turn a record into a parameter. It’s a good principle (though often broken) not to expose database ids in urls. An example might be to use a user’s nickname rather than their id in user urls, so /users/12452 becomes /users/tomafro. In rails this is easy to achieve, by overriding the to_param method:

class User < ActiveRecord::Base
  def to_param
    self.nickname
  end
end

Rails will automatically use this method when generating routes, so user_path(@user) will return /users/tomafro as we’d like. If I was using find or find_by_id in my controllers, I’d then have to go through each one and change it to find_by_nickname. Luckily though, I’ve used from_param, so whenever I override to_param I just have to remember to provide an equivalent implementation for from_param, and my controllers will work without modification:

class User < ActiveRecord::Base
  def self.from_param(param)
    self.first :conditions => {:nickname => param}
  end

  def to_param
    self.nickname
  end
end

I’ve been doing this for years, but it’s hardly a new principle: provide a from method for every to method. There’s even an old ticket on trac asking for it, but it’s been considered too trivial to add.

I disagree: for me the value comes from having the method from the start, not adding it when you need it. Luckily it’s easy to add to my own projects.

Quickly list missing foreign key indexes

Run this code in a rails console to list foreign keys which aren’t indexed.

c = ActiveRecord::Base.connection
c.tables.collect do |t|
  columns = c.columns(t).collect(&:name).select {|x| x.ends_with?("_id") || x.ends_with?("_type")}
  indexed_columns = c.indexes(t).collect(&:columns).flatten.uniq
  unindexed = columns - indexed_columns
  unless unindexed.empty?
    puts "#{t}: #{unindexed.join(", ")}"
  end
end

This list will look something like this:

attachments: parent_id, asset_id
domain_names: organisation_id
event_memberships: user_id, event_id
events: editor_id
group_actions: user_id, group_id
groups: user_id
icons: parent_id
invitations: sender_id
legacy_actions: item_upon_id
news_items: author_id
organisations: midas_id
pages: author_id
pending_event_memberships: invitation_id, event_id
resources: user_id, resourceable_id
subscriptions: subscribable_id, user_id
taggings: tag_id, taggable_id, user_id

For each column in the list, ask yourself why you don’t need an index.
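
If you can’t justify its absence, adding the index is a short migration. For example, for the attachments table above:

class AddMissingIndexesToAttachments < ActiveRecord::Migration
  def self.up
    add_index :attachments, :parent_id
    add_index :attachments, :asset_id
  end

  def self.down
    remove_index :attachments, :asset_id
    remove_index :attachments, :parent_id
  end
end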

Update: Andrew Coleman has added output in migration format. If you want to play around with it further, here’s the original code on gist.

Tip: Open new tab in OS X Terminal

Another simple shell function, this time just for OS X.

Usage is simple: tab <command> opens a new tab in Terminal, and runs the given command in the current working directory. For example tab script/server would open a new tab and run script/server.

tab () {
  osascript 2>/dev/null <<EOF
    tell application "System Events"
      tell process "Terminal" to keystroke "t" using command down
    end tell
    tell application "Terminal"
      activate
      do script with command "cd $PWD; $*" in window 1
    end tell
EOF
}

Using indexes in rails: Choosing additional indexes

The first part in this series of posts looked at adding indexes to foreign keys, to improve the performance when navigating rails associations. But many queries involve data other than just foreign keys. With the judicious use of indexes, we can improve these too.

Let’s take the conversations table used in the first article, and add a column to hold the language, and some timestamps. Here’s the full schema:

create_table conversations do |table|
  table.string   :subject, :null => false
  table.integer  :user_id, :null => false
  table.integer  :subject_id
  table.string   :subject_type
  table.string   :language_code
  table.datetime :created_at
  table.datetime :updated_at
end

We want to split conversations by their languages, so we’ll add a named_scope to the Conversation class:

class Conversation
  belongs_to :user
  belongs_to :subject, :polymorphic => true

  named_scope :in_language, lambda {|language|
    { :conditions => {:language_code => language}}
  }
end

Using Conversation.in_language 'en' will now get us all conversations in English. As we did for foreign keys, we can see how long the query takes and read the explain plan.

mysql> SELECT * FROM conversations WHERE language_code = 'en';
90791 rows in set (3.94 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE language_code = 'en';
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ALL  | NULL          | NULL    | NULL  | 1000111 | Using where |
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

Adding an index to the language_code column should improve the query performance, so let’s do that and see the effect on our query:

mysql> SELECT * FROM conversations WHERE language_code = 'en';
90791 rows in set (3.02 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE language_code = 'en';
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | lang_code_ix  | 3       | const |   98345 | Using where |
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

So the query now uses the index, and the time taken has gone from almost 4 seconds to just over 3. That’s not nearly as big a performance gain as before, but why? The answer is in the number of rows returned: 90791. The index helps the database find the relevant rows quickly. However, it still has to read those rows, and reading over 90,000 rows will always take a significant amount of time.

In a real app we’re unlikely to want to read all these rows at once, so let’s do another quick comparison limiting the query to the first 100 rows:

Without the index:

mysql> SELECT * FROM conversations WHERE language_code = 'en' LIMIT 100;
100 rows in set (1.32 sec)

And with the index:

mysql> SELECT * FROM conversations WHERE language_code = 'en' LIMIT 100;
100 rows in set (0.01 sec)

Much better.

Choosing between indexes

We’ve seen that by using an index and limiting the number of results we can quickly get the ‘first’ 100 English conversations. But in this case ‘first’ doesn’t really mean anything. When no order clause is specified, MySQL may appear to order its results by id, but this is just a coincidence and shouldn’t be relied on. Let’s instead order our results by created_at to get the 100 most recent conversations.

mysql> SELECT * FROM conversations WHERE language_code = 'en' ORDER BY created_at DESC LIMIT 100;
100 rows in set (4.42 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE language_code = 'en' ORDER BY created_at DESC LIMIT 100;
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| select_type | type | key           | key_len | ref   | rows    | Extra                       |
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| SIMPLE      | ref  | lang_code_ix  | 3       | const |   98345 | Using where; using filesort |
+-------------+------+---------------+---------+-------+---------+-----------------------------+
1 row in set (0.00 sec)

Even though this query uses our index and only returns 100 rows, it has still taken almost 4.5 seconds! The reason for this terrible performance is hinted at in the extra information in the explain plan: using filesort. The database is reading all rows that match the condition (all 90791 of them), then using a filesort to order them before returning the first 100.

If we add an index on created_at and do the query again we get:

mysql> SELECT * FROM conversations WHERE language_code = 'en' ORDER BY created_at DESC LIMIT 100;
100 rows in set (4.39 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE language_code = 'en' ORDER BY created_at DESC LIMIT 100;
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| select_type | type | key           | key_len | ref   | rows    | Extra                       |
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| SIMPLE      | ref  | lang_code_ix  | 3       | const |   98345 | Using where; using filesort |
+-------------+------+---------------+---------+-------+---------+-----------------------------+
1 row in set (0.00 sec)

It’s pretty much exactly the same: still almost 4.5 seconds. This is because MySQL will generally use only one index per table in a query. It has to choose between the index on language_code and the one on created_at, and in this case it chooses the language code index. We can force it to use our other index for comparison:

mysql> SELECT * FROM conversations
       USE INDEX(created_ix) WHERE language_code = 'en'
       ORDER BY created_at DESC LIMIT 100;
100 rows in set (0.02 sec)

mysql> EXPLAIN SELECT * FROM conversations
       USE INDEX(created_ix) WHERE language_code = 'en'
       ORDER BY created_at DESC LIMIT 100;
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| select_type | type | key           | key_len | ref   | rows    | Extra                       |
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| SIMPLE      | ref  | created_ix    | 8       | const | 9903411 | Using where                 |
+-------------+------+---------------+---------+-------+---------+-----------------------------+
1 row in set (0.00 sec)

Using a trick stolen from Pratik Naik (in the comments of his article) we can force the use of a particular index in rails with a special named scope, and perform our query:

Conversation.named_scope :use_index, lambda {|index|
  {:from => "#{quoted_table_name} USE INDEX(#{index})"}
}

Conversation.in_language('en').use_index('created_ix').all(:order => 'created_at desc', :limit => 100)

But there is another way: indexing multiple columns.

Using compound indexes

A compound index indexes across two or more columns. When defining a compound index, the order of the columns is significant, as the database reduces the set of candidate rows by comparing the columns in turn. So an index created with add_index :conversations, [:language_code, :created_at] will compare language_code first, then created_at.

Because of this, we need to take some care in choosing the order of our columns. Where a query uses only equality conditions, the general rule is to put the most selective column first, that is, the column with the most distinct values. Our query is a little different: it has an equality condition on language_code and an ORDER BY on created_at. For a single index to help with both, the equality column must come first; that way all the ‘en’ entries sit together in the index, already sorted by created_at. So we’ll add the following:

add_index :conversations, [:language_code, :created_at], :name => 'lang_and_ca_ix'

If we explain the query again, without forcing any index this time, we find MySQL now chooses the compound index on its own, and the query is efficient:

mysql> EXPLAIN SELECT * FROM conversations
       WHERE language_code = 'en'
       ORDER BY created_at DESC LIMIT 100;
+-------------+------+----------------+---------+-------+---------+-----------------------------+
| select_type | type | key            | key_len | ref   | rows    | Extra                       |
+-------------+------+----------------+---------+-------+---------+-----------------------------+
| SIMPLE      | ref  | lang_and_ca_ix |      48 | const |  640231 | Using where                 |
+-------------+------+----------------+---------+-------+---------+-----------------------------+
1 row in set (0.00 sec)

A technique for choosing index column order

Sometimes it’s hard to know which order your columns should be in an index, but there’s an easy way to get a rough idea. Rewrite the query, removing all conditions, and selecting count(distinct column_to_index) for each column. So for our query, we’d do the following:

mysql> SELECT count(distinct language_code), count(distinct created_at)
       FROM conversations;
+-------------------------------+----------------------------+
| count(distinct language_code) | count(distinct created_at) |
+-------------------------------+----------------------------+
|                            21 |                     584089 |
+-------------------------------+----------------------------+
1 row in set (1.90 sec)

From this it’s clear that created_at has far more distinct values, making it much the more selective column. Note though that selectivity is only a rough guide, and as we saw above, a column used for ordering behaves differently from one used in an equality condition. When deciding on indexes, there are no hard and fast rules. Instead, we need to measure and analyse the queries used in our particular app, to ensure we are using the best indexes.
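If you run this check often, it’s easy to wrap up in a helper. Here’s a rough sketch (hypothetical code, not part of the app; note that the MySQL adapter returns the counts as strings):

module IndexHelper
  # Count the distinct values in each candidate column, as a rough
  # measure of its selectivity for equality comparisons.
  def self.selectivity(model, *columns)
    selects = columns.collect { |c| "count(distinct #{c}) AS #{c}" }.join(', ')
    model.connection.select_one("SELECT #{selects} FROM #{model.table_name}")
  end
end

IndexHelper.selectivity(Conversation, :language_code, :created_at)
#=> {"language_code" => "21", "created_at" => "584089"}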

The next (and last) article in the series will go through some more advanced techniques, including when not to add an index, and how to spot redundant indexes.

ZSH Completion for gem and gem open

I’ve been trying to get my head around the ZSH completion system. It’s not easy, but I’m slowly making progress.

Here’s my first semi-successful attempt. It provides command completion for gem (including installed commands) and gem name completion for specific gem commands (update, and open from Adam Sanderson).

So typing gem <tab> gives a list of possible commands, whilst gem open <tab> will complete with the names of the currently installed gems.

#compdef gem

_gem_commands () {
  # Cache the list of gem commands between completions
  if [[ -z $gem_commands ]] ; then
    gem_commands=$(gem help commands | grep '^    [a-z]' | cut -d " " -f 5)
  fi

  # This seems unnecessary, but if I try to set gem_commands to an array, it falls over.

  typeset -a gem_command_array
  gem_command_array=($(echo $gem_commands))
  compadd $gem_command_array
}

_installed_gems () {
  # Cache the list of installed gems in the same way
  if [[ -z $installed_gems ]] ; then
    installed_gems=($(gem list | grep '^[A-Za-z]' | cut -d " " -f 1))
  fi

  typeset -a installed_gem_array
  installed_gem_array=($(echo $installed_gems))
  compadd $installed_gem_array
}

# Complete command names as the first argument; for commands that
# take a gem name, complete with the installed gems
if (( CURRENT == 2 )); then
  _gem_commands
else
  if [[ $words[2] == open || $words[2] == update ]] ; then
    _installed_gems
  fi
fi

As it’s a first attempt, it’s a long way from perfect. I’ve put it on gist, for other people to play with, and I’d appreciate any advice or improvements. Specifically I’d like to know how to avoid the use of both gem_command_array and gem_commands. I’d also like to know a better way to check if the given command is in the list [open, update].

Please fork the gist, or tweet me with your suggestions.

Using indexes in rails: Index your associations

Many rails developers are great at building applications but have limited experience of database design. As a consequence, projects often have half-baked indexing strategies, and suffer bad performance as a result.

To try and improve this I’ve planned a series of posts on indexes, targeted at rails developers. In this first post I’ll introduce indexes and show how to index your associations. Later posts will cover choosing additional indexes to improve query performance, and how to avoid redundant and duplicate indexes.

A brief overview of database indexes

Wikipedia states that ‘a database index is a data structure that improves the speed of operations on a database table’. Unfortunately, this improvement comes at a cost.

For every index on a table, there is a penalty both when inserting and updating rows. Indexes also take up space on disk and in memory, which can itself affect performance. Finally, having too many indexes can cause the database to choose between them poorly, actually harming performance rather than improving it.

So while indexing is important, we shouldn’t just throw indexes at our slow queries: we need to choose carefully how to index our data.

Indexing simple associations

By far the most common performance problem I’ve encountered in rails projects is a lack of indexes on foreign keys. There’s no real excuse for this - not indexing foreign keys can cripple your app.

Take the following schema:

create_table :users do |table|
  table.string :login
end

create_table :conversations do |table|
  table.string  :subject, :null => false
  table.integer :user_id, :null => false
end

We can use this to map a one-to-many relationship between users and conversations, with user_id as the foreign key.

Here are the models to do that:

class User < ActiveRecord::Base
  has_many :conversations
end

class Conversation < ActiveRecord::Base
  belongs_to :user
end

With these models, to find all conversations for a particular user we’d use user.conversations, which in turn uses SQL like this:

SELECT * FROM conversations WHERE user_id = 41;

I can run this query on a test database which I’ve randomly populated with 1,000,000 rows, to see how long it takes. Note, I’ve cut out the actual results as they are unimportant:

mysql> SELECT * FROM conversations WHERE user_id = 41;
12 rows in set (1.42 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE user_id = 41;
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ALL  | NULL          | NULL    | NULL  | 1001111 | Using where |
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

Although the query is simple, it took 1.42 seconds. The key column shows the key or index that MySQL decided to use; in this case NULL, as there are no indexes. The rows column is also relevant: it shows MySQL expects to examine around 1,000,000 rows; that’s a lot of data being loaded and compared.

What a difference just an index makes

If we then add an index on user_id:

add_index :conversations, :user_id, :name => 'user_id_ix'

And do the same select:

mysql> SELECT * FROM conversations WHERE user_id = 41;
12 rows in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE user_id = 41;
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | user_id_ix    | 5       | const |     108 | Using where |
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

The difference is remarkable: from over 1.4 seconds to about a hundredth of a second. Unless you have a cast-iron reason not to, index your foreign keys.

Indexing polymorphic associations

So for simple associations, we add an index on the foreign key column. For polymorphic associations the foreign key is made up of two columns, one for the id and one for the type. Let’s add another association to our models to illustrate this.

add_column :conversations, :subject_id, :integer
add_column :conversations, :subject_type, :string

create_table :artworks do |table|
  table.string :title
end

class Artwork < ActiveRecord::Base
  has_one :conversation, :as => :subject
end

class Conversation < ActiveRecord::Base
  belongs_to :subject, :polymorphic => true
end

Here we’ve added an association between Artwork and Conversation, where an artwork can be the subject of a conversation. From an artwork, we can find the related conversation (if any) with artwork.conversation which will use the following SQL:

SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';

Again the query takes around 1.4 seconds without any indexes. Now though, we have a choice of what to index: subject_type on its own, subject_id on its own, or both together.

Let’s try each in turn, and measure the performance.

First, an index on just subject_type:

mysql> SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
12 rows in set (0.31 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | sub_type_ix   | 5       | const |   89511 | Using where |
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

An index on just subject_id:

mysql> SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
12 rows in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | sub_id_ix     | 5       | const |     204 | Using where |
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

An index on subject_id, subject_type:

mysql> SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
12 rows in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
+-------------+------+--------------------+---------+-------+---------+-------------+
| select_type | type | key                | key_len | ref   | rows    | Extra       |
+-------------+------+--------------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | sub_id_and_type_ix | 5       | const |       5 | Using where |
+-------------+------+--------------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

So the index on subject_type alone compared ~90,000 rows in 0.31 seconds, the one on subject_id alone compared ~200 rows in 0.01 seconds, and the compound index on subject_id, subject_type compared just 5 rows, also in 0.01 seconds. The compound index is the clear winner, so we should add it like so:

add_index :conversations, [:subject_id, :subject_type], :name => 'sub_id_and_type_ix'

Wrapping up

This should give a basic overview of indexes and the performance improvements they can bring. Hopefully I’ve shown that foreign keys should always be indexed, and how to index them. The next article (which I hope to publish later this week) will explain how to reason about indexes, and how to identify additional indexes to add beyond those on foreign keys.

The cost of explicit returns in ruby

Yesterday Pratik Naik wrote a post on one of his pet hates: explicit returns in ruby. I agree 100% with Pratik; I can’t stand them either.

In one of the comments, Chris Wanstrath posted this microbenchmark showing that using explicit returns incurs a surprising performance hit. It seemed crazy to me, so I thought I’d investigate further, running the benchmark on jruby and ruby 1.9 for comparison.
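The benchmark itself boils down to timing a method that uses an explicit return against an identical one that doesn’t. A minimal reconstruction (a sketch, not Chris’s exact script) looks something like this:

require 'benchmark'

def explicit
  return nil
end

def implicit
  nil
end

n = 1_000_000

Benchmark.bm(9) do |b|
  b.report('explicit') { n.times { explicit } }
  b.report('implicit') { n.times { implicit } }
end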

ruby 1.8.7 (2008-08-11 patchlevel 72)

              user     system      total        real
explicit  3.420000   0.020000   3.440000 (  3.478501)
implicit  2.220000   0.000000   2.220000 (  2.236413)

jruby 1.1.6 (ruby 1.8.6 patchlevel 114)

              user     system      total        real
explicit  2.611000   0.000000   2.611000 (  2.611001)
implicit  2.541000   0.000000   2.541000 (  2.541385)

ruby 1.9.1p0 (2009-01-30 revision 21907)

              user     system      total        real
explicit  1.580000   0.010000   1.590000 (  1.614273)
implicit  1.520000   0.000000   1.520000 (  1.537492)

So neither jruby nor ruby 1.9 incurs this penalty.

Is this good news or not? If, like me, you consider explicit returns a code smell, I guess it doesn’t matter either way.

Kernel specific ZSH dotfiles

I work on a number of different machines, OS X based for development and Linux based for hosting. I’ve added various aliases and other commands to my shell, and use a github repository to share this configuration between these machines.

This works well, but while most commands behave the same across Linux and OS X, there are some nasty differences. One example is ls, which takes different arguments on each system; the default ls alias I use on OS X doesn’t work on Linux. So how can we accommodate these differences, without losing all the shared configuration?

The answer is really simple. Create kernel specific configuration files, and use a case statement to load the correct one. The main obstacle was finding a way to distinguish between each kernel. In the end I found the $system_name environment variable, which is set on both OS X and the servers I use.

Here’s the code:

case $system_name in
  Darwin*)
    source ~/.houseshare/zsh/kernel/darwin.zsh
    ;;
  *)
    source ~/.houseshare/zsh/kernel/linux.zsh
    ;;
esac
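
To illustrate what goes in the kernel-specific files, here’s the sort of thing that differs (a sketch; -G is the BSD colour flag, --color=auto its GNU equivalent):

# ~/.houseshare/zsh/kernel/darwin.zsh
alias ls='ls -G'

# ~/.houseshare/zsh/kernel/linux.zsh
alias ls='ls --color=auto'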

As I said, simple.

dscl - the easy way to add hosts on OS X

As a web developer, I often want several host names pointing at my local machine whilst developing and testing applications. I may want to use Apache virtual hosts to serve multiple apps at once, or use subdomains to distinguish different accounts within a single application.

Previously, to set these host names up, I would manually edit /etc/hosts, adding entries like:

127.0.0.1      twitter-killer.localhost
127.0.0.1      my-url-shortener-is-better-than-yours.localhost
127.0.0.1      yet-another-half-baked-idea.localhost

This worked well on one level, but on another it just seemed wrong. It’s very manual, prone to error and pretty hard to script. Recently though, thanks to a hint from Chris Parsons, I’ve found another way: using dscl.

dscl, or Directory Service command line utility to give it its full name, has many uses. For a web developer, the most relevant is probably the ability to create, list and delete local directory entries under /Local/Default/Hosts in the directory (not the file system). These act like lines in /etc/hosts, but can be manipulated easily from the command line.

To add an entry (easily the most interesting command) use:

sudo dscl localhost -create /Local/Default/Hosts/twitter-killer.localhost IPAddress 127.0.0.1

Once entries have been added, listing them is just as simple:

sudo dscl localhost -list /Local/Default/Hosts IPAddress

This produces a list in the form:

twitter-killer.localhost                         127.0.0.1
my-url-shortener-is-better-than-yours.localhost  127.0.0.1
yet-another-half-baked-idea.localhost            127.0.0.1

Finally, you can remove entries with:

sudo dscl localhost -delete /Local/Default/Hosts/twitter-killer.localhost

Once you’ve mastered these commands, the possibilities are endless. Here’s a rake task to add all subdomains used in a project:

class Application
  def self.subdomains
    Client.all.collect {|client| client.subdomain }
  end
end

namespace :subdomains do
  # The tasks use the Client model, so they need the rails environment
  task :add => :environment do
    Application.subdomains.each do |subdomain|
      `sudo dscl localhost -create /Local/Default/Hosts/#{subdomain}.app.localhost IPAddress 127.0.0.1`
    end
  end

  task :remove => :environment do
    Application.subdomains.each do |subdomain|
      `sudo dscl localhost -delete /Local/Default/Hosts/#{subdomain}.app.localhost`
    end
  end
end
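
With these in place, pointing every client subdomain at the local machine is a one-liner:

$ rake subdomains:add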

Or if you’re using passenger for development, you can use a tool like James Adam’s hostess to automatically set up the host entry and virtual host entry for a site in one simple command.

Pimp my script/console

For a long time I’ve had an .irbrc file and a .railsrc file, setting up some simple helper methods in my irb and script/console sessions. Today though, I wanted to add some more helpers specific to the project I’m working on. Specifically, I wanted to be able to use my machinist blueprints, to help me play around with some models.

Adding machinist isn’t as simple as just requiring my blueprints though — they require my ActiveRecord models, which aren’t available when .irbrc is loaded. Luckily the solution is simple — just add a couple of lines to the configuration in environment.rb:

Rails::Initializer.run do |config|
  if defined?(IRB)
    config.gem 'faker'
    config.gem 'notahat-machinist', :lib => 'machinist', :source => "http://gems.github.com"
    IRB.conf[:IRB_RC] = Proc.new { require File.join(RAILS_ROOT, "config", "console") }
  end
end

The key part is the line starting IRB.conf[:IRB_RC], which tells irb to run the given proc when the session starts. Luckily, this happens after rails has finished initializing. I’ve set it to require config/console.rb, to which I can add all sorts of configuration and helpers, knowing it will only be loaded in script/console sessions where I want these shortcuts, not in my general code. Here it is:

require File.expand_path(File.dirname(__FILE__) + "/../spec/blueprints.rb")

def tomafro
  Account.find_by_login("tomafro")
end

def bbi
  Client.find_by_name("Big Bad Industries")
end
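
With this in place, the blueprints and shortcuts are available as soon as the console starts. A session might look something like this (illustrative output, assuming a Conversation blueprint is defined in spec/blueprints.rb):

$ script/console
>> bbi
=> #<Client id: 2, name: "Big Bad Industries">
>> Conversation.make(:user => tomafro)
=> #<Conversation id: 1, ...>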

Read ActiveRecord columns directly from the class

Sometimes you want to read just a single column from a collection of records, without the overhead of instantiating each and every one. You could just execute raw SQL, but it’s a shame to do away with the nice type conversion ActiveRecord provides. It’d also be a pity to get rid of find scoping, amongst other goodness.

Enter Tomafro::ColumnReader:

module Tomafro::ColumnReader
  def column_reader(column_name, options = {})
    # Default the method name to the pluralized column, e.g. 'name' => names
    name = options.delete(:as) || column_name.to_s.pluralize
    column = columns_hash[column_name.to_s]

    self.module_eval %{
      def self.#{name}(options = {})
        merged = options.merge(:select => '#{column_name}')
        # Run the finder SQL without instantiating any models, then cast
        # each raw value using the column's own type conversion code
        connection.select_all(construct_finder_sql(merged)).collect do |value|
          v = value.values.first
          #{column.type_cast_code('v')}
        end
      end
    }
  end
end
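
To make the metaprogramming concrete: for a string column like name below, the eval defines a method roughly equivalent to this (the cast line is generated by the column’s type_cast_code, so it varies by column type):

def self.names(options = {})
  merged = options.merge(:select => 'name')
  connection.select_all(construct_finder_sql(merged)).collect do |value|
    v = value.values.first
    v # a string needs no conversion; an integer column would call v.to_i
  end
end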

Once you’ve extended ActiveRecord::Base with it, usage is simple. In your models, declare which columns you want access to:

ActiveRecord::Base.extend Tomafro::ColumnReader

class Animal < ActiveRecord::Base
  column_reader 'id'
  column_reader 'name'

  named_scope :dangerous, :conditions => {:carnivorous => true}
end

Once you’ve done this, you can access values directly from the class, respecting scope, limits and other finder options.

Animal.names
#=> ['Lion', 'Tiger', 'Zebra', 'Gazelle']

Animal.names :limit => 1
#=> ['Lion'] (Normal finder options supported)

Animal.dangerous.names
#=> ['Lion', 'Tiger'] (Scoping respected)

Animal.ids
#=> [1, 2, 3] (Values cast correctly)

Using Rack Middleware for good and evil

So we all know that Rack is awesome, and that we can use Rack middleware for all sorts of things: debugging, caching and a whole host more.

What all these have in common (apart from maybe Rack::Evil) is that they’re all helpful. They all make writing Rack applications easier. Not my middleware though.

Introducing Rack::Shuffler

module Rack
  class Shuffler
    def initialize(app)
      @app = app
      @responses = []
    end

    def call(env)
      # Collect every response the app gives...
      @responses << @app.call(env)
      # ...then answer the current request with one picked at random
      @responses[rand(@responses.size)]
    ensure
      # Cap the pool at 100 responses, evicting a random victim
      @responses.delete_at(rand(@responses.size)) if @responses.size > 100
    end
  end
end
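
Wiring it in takes a couple of lines in config.ru. A sketch, assuming the class above is saved on the load path as rack/shuffler.rb, with MyApp standing in for the victim application:

# config.ru
require 'rack/shuffler'

use Rack::Shuffler
run MyApp.new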

I suggest you add it to a colleague’s app late on a Friday afternoon, and see how long it takes to drive them to insanity.

Automatching rails paths in cucumber

If you’re using cucumber as part of your testing, you probably have a paths.rb file that looks something like this:


module NavigationHelpers
  def path_to(page_name)
    case page_name

    when /the home page/
      root_path
    when /the new client page/
      new_client_path
    when /the clients page/
      clients_path
    # Add more page name => path mappings here
    else
      raise "Can't find mapping from \"#{page_name}\" to a path.\n" +
      "Now, go and add a mapping in features/support/paths.rb"
    end
  end
end

World(NavigationHelpers)

This lets us use nice descriptive names in our scenarios, but it starts to become a pain as we add more and more paths. So how can we make it better?

By automatically matching some rails paths. Here’s the code:


module NavigationHelpers
  def path_to(page_name)
    case page_name

    when /the home page/
      root_path
    # Add more page name => path mappings here
    else
      if path = match_rails_path_for(page_name)
        path
      else
        raise "Can't find mapping from \"#{page_name}\" to a path.\n" +
        "Now, go and add a mapping in features/support/paths.rb"
      end
    end
  end

  def match_rails_path_for(page_name)
    if page_name.match(/the (.*) page/)
      return send "#{$1.gsub(" ", "_")}_path" rescue nil
    end
  end
end

World(NavigationHelpers)

What it does is pretty simple. Given the page name the clients page (with no other matches defined), it will try to send clients_path. If the call succeeds it returns the result; otherwise it returns nil, and we fall back to raising the usual error.

Not the biggest improvement in the world, but it’s made my cucumber tests just a little bit easier to write.

Adam Sanderson's open_gem

The latest version of rubygems (1.3.2) now has an interface to add commands. Making great use of this feature, Adam Sanderson has written open_gem, a simple but amazingly useful tool.

You use it like this:

$ gem open activerecord

This opens the activerecord gem in your favourite editor (taken from the $GEM_OPEN_EDITOR or $EDITOR environment variable). If there are multiple versions of the gem installed, it will show a menu, letting you choose which version you require.

$ gem open activerecord
Open which gem?
 1. activerecord 2.1.0
 2. activerecord 2.3.2
>

open_gem itself is a gem, and can be installed with:

$ gem install open_gem

To get it working, you need to have $EDITOR set to something sensible:

$ export EDITOR=mate

If you’re running on OS X and use TextMate, you may have already set $EDITOR to mate -w, which lets you use TextMate as the editor for git commit messages and much more. However, the -w flag doesn’t work with open_gem, so set the $GEM_OPEN_EDITOR variable, and open_gem will use that instead:

$ export GEM_OPEN_EDITOR=mate

You should now be good to go. If you want to see how it works, just use it on itself!

$ gem open open_gem