An updated rails template for gem bundler

A few months ago I wrote a rails template for gem bundler. Since then, bundler has changed a lot, and my template no longer works. Here then is an updated version, based on this gist from Andre Arko. Using it, you should be able to get a rails 2.3.5 project working with bundler in less than 5 minutes.

The first step is to install the latest bundler. At the time of writing, this was 0.9.9.

gem install bundler

Now you should be able to run the template, either on a new project, or on an existing rails 2.3.5 project.

rails -m http://github.com/tomafro/dotfiles/raw/master/resources/rails/bundler.rb <project>

On a fresh project, that should be all you need to do. On an existing that used an older version of bundler, you'll need to remove the old hooks in config/preinitializer.rb and config/environment.rb, and the gems folder.

Explaining the template, step by step

The first step creates the project Gemfile, with rails available in all environments, and ruby-debug included in development. If the project has other gems, they should be added here, rather than using rails own config.gem mechanism.

file 'Gemfile', %{
source 'http://rubygems.org'

gem 'rails', '#{Rails::VERSION::STRING}'

group :development do
  gem 'ruby-debug'
end
}.strip

The next step is get bundler to load correctly. This is done in two stages. First, in config\preinitializer.rb bundler needs to be setup. This adds all the bundled gems to the ruby load path, but doesn't initialise them.

  
append_file '/config/preinitializer.rb', %{
begin
  # Require the preresolved locked set of gems.
  require File.expand_path('../../.bundle/environment', __FILE__)
rescue LoadError
  # Fallback on doing the resolve at runtime.
  require "rubygems"
  require "bundler"
  if Bundler::VERSION <= "0.9.5"
    raise RuntimeError, "Bundler incompatible.\n" +
      "Your bundler version is incompatible with Rails 2.3 and an unlocked bundle.\n" +
      "Run `gem install bundler` to upgrade or `bundle lock` to lock."
  else
    Bundler.setup
  end
end
}.strip

Second, the rails boot process is modified to start the bundler environment. This 'requires' all gems in the bundle, letting them run initialisation code.

gsub_file 'config/boot.rb', "Rails.boot!", %{
  
class Rails::Boot
 def run
   load_initializer
   extend_environment
   Rails::Initializer.run(:set_load_path)
 end

 def extend_environment
   Rails::Initializer.class_eval do
     old_load = instance_method(:load_environment)
     define_method(:load_environment) do
       Bundler.require :default, Rails.env
       old_load.bind(self).call
     end
   end
 end
end

Rails.boot!
}

All that's left now is a little cleaning up. The .bundle folder should never be checked into the code repository as it holds machine-local configuration, so it's added to .gitignore. Finally, bundle install is run to fetch the bundled gems.

append_file '/.gitignore', %{
/.bundle
}

run 'bundle install'

And that's it. I hope you find it useful.

Rails 3 direct column reader

Whilst trying to get my head around arel and it's relationship to ActiveRecord in rails 3, I've updated the simple ColumnReader class I introduced last year. It lets you read the (correctly cast) column values for an ActiveRecord class, without the overhead of instantiating each object.

Here's the updated code:

module ColumnReader
  def column_reader(column_name, options = {})
    name = options.delete(:as) || column_name.to_s.pluralize
    column = columns_hash[column_name.to_s]
  
    self.module_eval %{
      def self.#{name}
        query = scoped.arel.project(arel_table[:#{column_name}])
        connection.select_all(query.to_sql).collect do |value| 
          v = value.values.first
          #{column.type_cast_code('v')}
        end
      end
    }
  end

  ActiveRecord::Base.extend(self)
end

The code isn't that different, though using scoped over construct_finder_sql feels a lot nicer. If you've got suggestions for improvement gist away.

Usage is similar to before, only using the new rails 3 syntax:

class Animal < ActiveRecord::Base
  column_reader 'id'
  column_reader 'name'  
 
  named_scope :dangerous, :conditions => {:carnivorous => true} 
end

Animal.names 
#=> ['Lion', 'Tiger', 'Zebra', 'Gazelle']
 
Animal.limit(1).names 
#=> ['Lion'] (Normal finder options supported)
 
Animal.dangerous.names 
#=> ['Lion', 'Tiger'] (Scoping respected)
 
Animal.ids
#=> [1, 2, 3] (Values cast correctly)

I'm still not entirely convinced of the value of this helper, so if you find a good use tweet me. Enjoy!

How to easily use Rails 3 now

Update 10th February 2010:

The instructions below were useful earlier in the development cycle. Now the beta gem has been released, the process is much easier:
gem uninstall bundler
gem install tzinfo builder memcache-client rack rack-test rack-mount 
gem install erubis mail text-format thor bundler i18n
gem install rails --pre

Now that rails 3 is getting closer to release, I wanted to start playing around with it. I've seen a few articles on getting it up and running, but they all seemed a little bit complicated. To use rails 2.3.5 before its release, I just built the gems myself and installed them. It turns out you can easily do the same with rails 3.

First, install rails main dependencies:

gem install rake rack bundler
gem install arel --version 0.2.pre

Next, get the latest rails code from github, and install the rails gems. There may be a few errors you can safely ignore:

git clone git://github.com/rails/rails.git
cd rails
rake install

And bang, you can start your first rails 3 project:

rails ~/apps/playground/rails3 

Your existing projects shouldn't be affected as rails is installed as a prerelease gem, but to be safe I'd recommend a tool like rvm to switch to a clean set of gems.

Building rails gems from the 2-3-stable branch

For the latest application I've been working on, I wanted to use Michael Koziarski's rails_xss plugin, to turn default escaping on in my erb templates. I'm also using wycats gem bundler to manage gems and their dependencies, including rails.

This posed a problem: xss_rails requires changes made in rails 2-3-stable branch, but not yet released in a gem (though they will be included in 2.3.5).

The solution was obvious: build my own gems, and get bundler to use them. Luckily, rails makes this an easy process.

First, clone rails from github, and change to the 2-3-stable branch:

git clone git://github.com/rails/rails.git
cd rails
git co -b 2-3-stable origin/2-3-stable

Next, we need to build the gems. Rails currently doesn't seem to have a Raketask to build all its constituent projects (though I'm planning a patch to include one), so you have to build each one in turn:

cd actionmailer
rake gem PKG_BUILD=1
cd ../actionpack
rake gem PKG_BUILD=1
cd ../activerecord
rake gem PKG_BUILD=1
cd ../activeresource
rake gem PKG_BUILD=1
cd ../activesupport
rake gem PKG_BUILD=1
cd ../railties
rake gem PKG_BUILD=1
cd ..

The key is the PKG_BUILD variable. It appends a suffix to the rails version, so rather than building 2.3.4 (the version I checked out), it will build 2.3.4.1. If I decided to update my gems, I'd use PKG_BUILD=2, then 3 and so on.

Finally, once all these gems are built, it's simply a case of finding them and using them. For gem bundler, this means placing them in the cache and updating the Gemfile to look for rails '2.3.4.1'. The gems are all built in pkg folders in their respective subprojects, so to copy them all somewhere else you can run:

cp **/pkg/*.gem <project-folder>/gems/cache

A rails template for gem bundler

Update 28th February 2010:

Bundler has changed a lot since I first wrote this template, so I've written a new version. Please use the updated version rather than the one below.

Following Nick Quaranto's article 'Gem Bundler is the Future', I was inspired to try out bundler on my latest rails project. Previously, I've found rails' own gem management a massive headache. In contrast, using bundler has been a pleasure.

Getting it set up how I wanted took a fair bit of experimentation, so to make things easier both for me and the wider community, I've made a rails template to do the hard work.

Give it a try by running the following. You should be up and running in a couple of minutes:

rails -m http://github.com/tomafro/dotfiles/raw/master/resources/rails/bundler.rb <project>

That will give you a bundled project, ready for you to add your own gems. Here's what each step of the template actually does:

Gem bundler is itself a gem. It can't be used to manage itself, so to ensure that all environments use the same version, the first step is to install it right into the project:

inside 'gems/bundler' do  
  run 'git init'
  run 'git pull --depth 1 git://github.com/wycats/bundler.git' 
  run 'rm -rf .git .gitignore'
end

Just having bundler installed is no good without any way to run it; a script is needed. Once this is installed the local bundler can be run with script/bundle <options>:

file 'script/bundle', %{
#!/usr/bin/env ruby
path = File.expand_path(File.join(File.dirname(__FILE__), "..", "gems/bundler/lib"))
$LOAD_PATH.unshift path
require 'rubygems'
require 'rubygems/command'
require 'bundler'
require 'bundler/commands/bundle_command'
Gem::Commands::BundleCommand.new.invoke(*ARGV)
}.strip

run 'chmod +x script/bundle'

Bundler uses Gemfiles to declare which gems are required in each environment. This simple Gemfile includes rails in all environments, and ruby-debug in all environments other than production:

file 'Gemfile', %{
clear_sources
source 'http://gemcutter.org'

disable_system_gems

bundle_path 'gems'

gem 'rails', '#{Rails::VERSION::STRING}'
gem 'ruby-debug', :except => 'production'
}.strip

Most of the files bundler will place in the gem path can be regenerated; they shouldn't be added to the project repository. The only things that should be added are the .gem files themselves, and the local copy of bundler. All the rest should be ignored:

append_file '.gitignore', %{
gems/*
!gems/cache
!gems/bundler}

The bundle script needs to be run for the first time:

run 'script/bundle'

Finally rails needs to be modified to ensure the bundler environment is loaded. This is done it two parts. First, a preinitializer is added to load the bundler's environment file before anything else:

append_file '/config/preinitializer.rb', %{
require File.expand_path(File.join(File.dirname(__FILE__), "..", "gems", "environment"))
}

Second, rails initialization process is hijacked to require the correct bundler environment:

gsub_file 'config/environment.rb', "require File.join(File.dirname(__FILE__), 'boot')", %{
require File.join(File.dirname(__FILE__), 'boot')

# Hijack rails initializer to load the bundler gem environment before loading the rails environment.

Rails::Initializer.module_eval do
  alias load_environment_without_bundler load_environment
  
  def load_environment
    Bundler.require_env configuration.environment
    load_environment_without_bundler
  end
end
}

And that's it. The project is now fully bundled. More gems can be added to the Gemfile and pulled into the project with script/bundle.

Tip: The case for from_param

There's one small method I add to every new rails project I work on:

module Tomafro::FromParam
  def from_param(param)
    self.first :conditions => { primary_key => param }
  end
end

ActiveRecord::Base.extend(Tomafro::FromParam)

In my controllers, where you might use Model.find(params[:id]) or Model.find_by_id(params[:id), I use Model.from_param(params[:id]) instead.

All three methods have almost the same behaviour, the only difference being the handling of missing records. find throws a RecordNotFound, while find_by_id and from_param return nil. So why use from_param over the others?

The answer comes when you want to change to_param, the method rails uses to turn a record into a parameter. It's a good principal (though often broken) not to expose database ids in urls. An example might be to use a users nickname rather than their id in user urls, so /users/12452 becomes /users/tomafro. In rails this is easy to achieve, by overriding the to_param method:

class User < ActiveRecord::Base
  def to_param
    self.nickname
  end
end

Rails will automatically use this method when generating routes, so users_path(@user) will return /users/tomafro as we'd like. If I was using find or find_by_id in my controllers, I'd then have to go through each one and change it to find_by_nickname. Luckily though, I've used from_param, so whenever I override to_param I just have to remember to provide an equivalent implementation for from_param, and my controllers will work without modification:

class User < ActiveRecord::Base
  def self.from_param(param)
    self.first :conditions => {:nickname => param}
  end
  
  def to_param
    self.nickname
  end
end

I've been doing this for years, but it's hardly a new principle, to provide a from method for every to method. There's even an old ticket on trac asking for it, but it's been considered too trivial to add.

I disagree - for me the value comes from having the method from the start, not when you need it. Luckily it's easy to add to my own projects.

Quickly list missing foreign key indexes

Run this code in a rails console to list foreign keys which aren't indexed.

c = ActiveRecord::Base.connection
c.tables.collect do |t|  
  columns = c.columns(t).collect(&:name).select {|x| x.ends_with?("_id" || x.ends_with("_type"))}
  indexed_columns = c.indexes(t).collect(&:columns).flatten.uniq
  unindexed = columns - indexed_columns
  unless unindexed.empty?
    puts "#{t}: #{unindexed.join(", ")}"
  end
end

This list will look something like this:

attachments: parent_id, asset_id
domain_names: organisation_id
event_memberships: user_id, event_id
events: editor_id
group_actions: user_id, group_id
groups: user_id
icons: parent_id
invitations: sender_id
legacy_actions: item_upon_id
news_items: author_id
organisations: midas_id
pages: author_id
pending_event_memberships: invitation_id, event_id
resources: user_id, resourceable_id
subscriptions: subscribable_id, user_id
taggings: tag_id, taggable_id, user_id

For each column in the list, ask yourself why you don't need an index.

Update: Andrew Coleman has added output in migration format. If you want to play around with it further, here's the original code on gist.

Using indexes in rails: Choosing additional indexes

The first part in this series of posts looked at adding indexes to foreign keys, to improve the performance when navigating rails associations. But many queries involve data other than just foreign keys. With the judicious use of indexes, we can improve these too.

Let's take the conversations table used in the first article, and add a column to hold the language, and some timestamps. Here's the full schema:

create_table conversations do |table|
  table.string   :subject, :null => false
  table.integer  :user_id, :null => false
  table.integer  :subject_id
  table.string   :subject_type
  table.string   :language_code
  table.datetime :created_at
  table.datetime :updated_at
end

We want to split conversations by their languages, so we'll add a named_scope to the Conversation class:

class Conversation
  belongs_to :user
  belongs_to :subject, :polymorphic => true
  
  named_scope :in_language, lambda {|language| 
    { :conditions => {:language_code => language}}
  }
end

Using Conversation.in_language 'en' will now get us all conversations in English. Like we did for foreign keys, we can see how long the query takes and read the explain plan.

mysql> SELECT * FROM conversations WHERE language_code = 'en';
90791 rows in set (3.94 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE language_code = 'en';
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | NULL          | NULL    | NULL  | 1000111 | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

Adding an index to the language_code column should improve the query performance, so let's do that and see the effect on our query:

mysql> SELECT * FROM conversations WHERE language_code = 'en';
90791 rows in set (3.02 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE language_code = 'en';
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | lang_code_ix  | 3       | const |   98345 | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

So the query now uses the index, and the time taken has gone from almost 4 seconds to just over 3. That's not nearly as big a performance gain as before, but why? The answer is in the number of rows returned: 90791. The index helps the database find the relevant rows quickly. However, it still has to read those rows, and reading over 90,000 rows will always take a significant amount of time.

In a real app we're unlikely to want to read all these rows at once, so let's do another quick comparison limiting the query to the first 100 rows:

Without the index:  

mysql> SELECT * FROM conversations WHERE language_code = 'en' LIMIT 100;
100 rows in set (1.32 sec)

And with the index:

mysql> SELECT * FROM conversations WHERE language_code = 'en' LIMIT 100;
100 rows in set (0.01 sec)

Much better.

Choosing between indexes

We've seen that by using an index and limiting the number of results we can quickly get the 'first' 100 English conversations. But in this case 'first' doesn't really mean anything. When no order clause is specified, MySQL may appear to order its results by id, but this is just a coincidence and shouldn't be relied on. Let's instead order our results by created_at to get the 100 most recent conversations.

mysql> SELECT * FROM conversations WHERE language_code = 'en' ORDER BY created_at DESC;
100 rows in set (4.42 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE language_code = 'en' ORDER BY created_at DESC;
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| select_type | type | key           | key_len | ref   | rows    | Extra                       |
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| SIMPLE      | ref  | lang_code_ix  | 3       | const |   98345 | Using where; using filesort | 
+-------------+------+---------------+---------+-------+---------+-----------------------------+
1 row in set (0.00 sec)

Even though this query uses our index and only returns 100 rows, it has still taken almost 4.5 seconds! The reason for this terrible performance is hinted in the extra information in the explain plan: using filesort. The database is reading all rows that match the condition (all 90791 of them), then using a filesort to order them before returning the first 100.

If we add an index on created_at and do the query again we get:

mysql> SELECT * FROM conversations WHERE language_code = 'en' ORDER BY created_at DESC;
100 rows in set (4.39 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE language_code = 'en' ORDER BY created_at DESC;
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| select_type | type | key           | key_len | ref   | rows    | Extra                       |
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| SIMPLE      | ref  | lang_code_ix  | 3       | const |   98345 | Using where; using filesort | 
+-------------+------+---------------+---------+-------+---------+-----------------------------+
1 row in set (0.00 sec)

It's pretty much exactly the same - still almost 4.5 seconds. This is because MySQL can only use one index per table in a query. It has to choose between the index on language_code and the one on created_at, and in this case chooses the language code index. We can force it to use our other index for comparison:

mysql> SELECT * FROM conversations 
       USE INDEX(created_ix) WHERE language_code = 'en' 
       ORDER BY created_at DESC LIMIT 100;
100 rows in set (0.02 sec)

mysql> EXPLAIN SELECT * FROM conversations 
       USE INDEX(created_ix) WHERE language_code = 'en' 
       ORDER BY created_at DESC LIMIT 100;
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| select_type | type | key           | key_len | ref   | rows    | Extra                       |
+-------------+------+---------------+---------+-------+---------+-----------------------------+
| SIMPLE      | ref  | created_ix    | 8       | const | 9903411 | Using where                 | 
+-------------+------+---------------+---------+-------+---------+-----------------------------+
1 row in set (0.00 sec)

Using a trick stolen from Pratik Naik (in the comments of his article) we can force the use of a particular index in rails with a special named scope, and perform our query:

Conversation.named_scope :use_index, lambda {|index| 
  {:from => "#{quoted_table_name} USE INDEX(#{index})"}
}

Conversation.in_language('en').use_index('created_ix').all(:order => 'created_at desc')

But there is also another way - using indexing multiple columns.

Using compound indexes

A compound index indexes across two or more columns. When defining a compound index, the order of the columns is significant, as the database reduces the set of candidate rows by comparing the columns in turn. So an index created with add_index :conversations, [language_code, created_at] will compare created_at first, then language_code.

Because of this, we need to take some care in choosing the order of our columns. In general, the rule is to specify the most selective column first. That is, the column with the most unique values. So for our query, we'll add the following:

add_index :conversations, [created_at, language_code]

If we explain the query without forcing the index we find it is still efficient:

mysql> EXPLAIN SELECT * FROM conversations 
       WHERE language_code = 'en' 
       ORDER BY created_at DESC LIMIT 100;
+-------------+------+----------------+---------+-------+---------+-----------------------------+
| select_type | type | key            | key_len | ref   | rows    | Extra                       |
+-------------+------+----------------+---------+-------+---------+-----------------------------+
| SIMPLE      | ref  | lang_and_ca_ix |      48 | const |  640231 | Using where                 | 
+-------------+------+----------------+---------+-------+---------+-----------------------------+
1 row in set (0.00 sec)

A technique for choosing index column order

Sometimes it's hard to know which order your columns should be in an index, but there's an easy way to get a rough idea. Rewrite the query, removing all conditions, and selecting count(distinct column_to_index) for each column. So for our query, we'd do the following:

mysql> SELECT count(distinct language_code), count(distinct created_at)
       FROM conversations;
+-------------------------------+----------------------------+
| count(distinct language_code) | count(distinct created_at) |
+-------------------------------+----------------------------+
|                            21 |                     584089 | 
+-------------------------------+----------------------------+
1 row in set (1.90 sec)

From this it's clear that there are more distinct created_at values, so we probably want to index this column first. Note though that I said probably. When deciding on indexes, there are no hard and fast rules. Instead, we need to measure and analyse the queries used in our particular app, to ensure we are using the best indexes.

The next (and last) article in the series will go through some more advanced techniques, including when not to add an index, and how to spot redundant indexes.

Using indexes in rails: Index your associations

Many rails developers are great at building applications but have limited experience in database design. As a consequence, projects often have half-baked indexing strategies, and as a result suffer bad performance.

To try and improve this I've planned a series of posts on indexes, targetted at rails developers. In this first post I'll introduce indexes and how to index your associations, then I'll write about choosing additional indexes to improve query performance, and finally how to avoid redundant and duplicate indexes.

A brief overview of database indexes

Wikipedia states that 'a database index is a data structure that improves the speed of operations on a database table'. Unfortunately, this improvement comes at a cost.

For every index on a table, there is a penalty both when inserting and updating rows. Indexes also take up space on disk and in memory, which can affect the efficiency of queries. Finally, having too many indexes can cause databases to choose between them poorly, actually harming performance rather than improving it.

So while indexing is important, we shouldn't just throw indexes at our slow queries: we need to choose carefully how to index our data.

Indexing simple associations

By far the most common performance problem I've encountered in rails projects is a lack of indexes on foreign keys. There's no real excuse for this - not indexing foreign keys can cripple your app.

Take the following schema:

create_table users do |table|
  table.string :login
end

create_table conversations do |table|
  table.string  :subject, :null => false
  table.integer :user_id, :null => false
end

We can use this to map a one-to-many relationship between users and conversations, where user_id as the foreign key.

Here are the models to do that:

class User < ActiveRecord::Base
  has_many :conversations
end

class Conversation < ActiveRecord::Base
  belongs_to :user
end

With these models, to find all conversations for a particular user we'd use user.conversations, which in turns uses sql like this:

SELECT * FROM conversations WHERE user_id = 41;

I can run this query on a test database which I've randomly populated with 1,000,000 rows, to see how long it takes. Note, I've cut out the actual results as they are unimportant:

mysql> SELECT * FROM conversations WHERE user_id = 41;
12 rows in set (1.42 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE user_id = 41;
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ALL  | NULL          | NULL    | NULL  | 1001111 | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

Although the query is simple, it took 1.42 seconds. The key column show the key or index that MySQL decided to use, in this case NULL as there are no indexes. The rows column is also relevant. It shows that MySQL will need to look at around 1,000,000 rows; that's a lot of data being loaded and compared.

What a difference just an index makes

If we then add an index on user_id:

add_index :conversations, :user_id, :name => 'user_id_ix'

And do the same select:

mysql> SELECT * FROM conversations WHERE user_id = 41;
12 rows in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE user_id = 41;
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | used_id_ix    | 5       | const |  108    | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

The difference is remarkable. From over 1.4 seconds to about 1 hundredth. Unless you have a cast-iron reason not to, index your foreign keys.

Indexing polymorphic associations

So for simple associations, we can add an index on the foreign_key column. For polymorphic associations the foreign key is made up of two columns, one for the id and one for the type. Let's add another association to our models to illustrate this.

add_column :conversations, :subject_id, :integer
add_column :conversations, :subject_type, :string

create_table :artworks do |table|
  table.string :title
end

class Artwork < ActiveRecord::Base
  has_one :conversation, :as => :subject
end

class Conversation < ActiveRecord::Base
  belongs_to :subject, :polymorphic => true
end

Here we've added an association between Artwork and Conversation, where an artwork can be the subject of a conversation. From an artwork, we can find the related conversation (if any) with artwork.conversation which will use the following SQL:

SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';

Again the query takes around 1.4 seconds without any indexes. Now though, we have a choice on what to index. We can index either subject_type on its own, subject_id on its own, or both together.

Let's try each in turn, and measure the performance.

First, an index on just subject_type:

  
mysql> SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
12 rows in set (0.31 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork'
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | sub_type_ix   | 5       | const | 89511   | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

An index on just subject_id:

  
mysql> SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
12 rows in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork'
+-------------+------+---------------+---------+-------+---------+-------------+
| select_type | type | key           | key_len | ref   | rows    | Extra       |
+-------------+------+---------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | sub_id_ix     | 5       | const | 204     | Using where | 
+-------------+------+---------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

An index on subject_id, subject_type:

mysql> SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork';
12 rows in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM conversations WHERE subject_id = 196 and subject_type = 'Artwork'
+-------------+------+--------------------+---------+-------+---------+-------------+
| select_type | type | key                | key_len | ref   | rows    | Extra       |
+-------------+------+--------------------+---------+-------+---------+-------------+
| SIMPLE      | ref  | sub_id_and_type_ix | 5       | const | 5       | Using where | 
+-------------+------+--------------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

So subject_type compared ~90,000 rows in 0.31 seconds, subject_id compared ~200 rows in 0.01 seconds and subject_id, subject_type compared 4 rows also in 0.01 seconds. We should add an index to subject_id, subject_type as so:

add_index :conversations, [:subject_id, :subject_type]

Wrapping up

This should give a basic overview of indexes and the performance improvements they can give. Hopefully I've shown that foreign_keys should always be indexed, and how to index them. The next article (which I hope to publish later this week) will explain more about how to reason about indexes, and how to identify additional indexes (beyond those on foreign keys) to add.

Pimp my script/console

For a long time I've had an .irbrc file and a .railsrc file, setting up some simple helpers methods in my irb and script/console sessions. Today though, I wanted to add some more helpers specific to the project I'm working on. Specifically, I wanted to be able to use my machinist blueprints, to help me play around with some models.

Adding machinist isn't as simple as just requiring my blueprints though -- they require my ActiveRecord models, which aren't available when .irbrc is loaded. Luckily the solution is simple -- just add a couple of lines to the configuration in environment.rb:

Rails::Initializer.run do |config|
  if defined?(IRB)
    config.gem 'faker'
    config.gem 'notahat-machinist', :lib => 'machinist', :source => "http://gems.github.com"
    IRB.conf[:IRB_RC] = Proc.new { require File.join(RAILS_ROOT, "config", "console") }
  end
end

The key part is the line starting IRB.conf[:IRB_RC], which tells irb to run the given when the session starts. Luckily, this happens after rails has finished initializing. I've set it to require config/console.rb, to which I can add all sorts of configuration and helpers, knowing it will only get loaded in script/console sessions where I want these shortcuts, not in my general code. Here it is:

require File.expand_path(File.dirname(__FILE__) + "/../spec/blueprints.rb")

def tomafro
  Account.find_by_login("tomafro")
end

def bbi
  Client.find_by_name("Big Bad Industries")
end

Read ActiveRecord columns directly from the class

Sometimes you want to read just a single column from a collection of records, without the overhead of instantiating each and every one. You could just execute raw SQL, but it's a shame to do away with the nice type conversion ActiveRecord provides. It'd also be a pity to get rid of find scoping, amongst other goodness.

Enter Tomafro::ColumnReader:

module Tomafro::ColumnReader
  def column_reader(column_name, options = {})
    name = options.delete(:as) || column_name.to_s.pluralize
    column = columns_hash[column_name.to_s]
    
    self.module_eval %{
      def self.#{name}(options = {})
        merged = options.merge(:select => '#{column_name}')
        connection.select_all(construct_finder_sql(merged)).collect do |value| 
          v = value.values.first
          #{column.type_cast_code('v')}
        end
      end
    }
  end
end

Once you've extended ActiveRecord::Base with it, usage is simple. In your models, declare which columns you want access to:

ActiveRecord::Base.extend Tomafro::ColumnReader
 
class Animal < ActiveRecord::Base
  column_reader 'id'
  column_reader 'name'  
 
  named_scope :dangerous, :conditions => {:carnivorous => true} 
end

Once you've done this, you can access values directly from the class, respecting scope, limits and other finder options.

Animal.names 
#=> ['Lion', 'Tiger', 'Zebra', 'Gazelle']
 
Animal.names :limit => 1 
#=> ['Lion'] (Normal finder options supported)
 
Animal.dangerous.names 
#=> ['Lion', 'Tiger'] (Scoping respected)
 
Animal.ids
#=> [1, 2, 3] (Values cast correctly)

Automatching rails paths in cucumber

If you're using cucumber as part of your testing, you probably have a paths.rb file that looks something like this:

module NavigationHelpers
  def path_to(page_name)
    case page_name
    
    when /the home page/
      root_path
    when /the new client page/
      new_client_path
    when /the clients page/
      clients_path    
    # Add more page name => path mappings here
    else
      raise "Can't find mapping from \"#{page_name}\" to a path.\n" +
      "Now, go and add a mapping in features/support/paths.rb"
    end
  end
end

World(NavigationHelpers)

This let's us use nice descriptive names in our scenarios, but it starts to become a pain when we add more and more paths. So how can we make it better?

By automatically matching some rails paths. Here's the code:

module NavigationHelpers
  def path_to(page_name)
    case page_name
    
    when /the home page/
      root_path   
    # Add more page name => path mappings here
    else
      if path = match_rails_path_for(page_name) 
        path
      else 
        raise "Can't find mapping from \"#{page_name}\" to a path.\n" +
        "Now, go and add a mapping in features/support/paths.rb"
      end
    end
  end

  def match_rails_path_for(page_name)
    if page_name.match(/the (.*) page/)
      return send "#{$1.gsub(" ", "_")}_path" rescue nil
    end
  end
end

World(NavigationHelpers)

What it does is pretty simple. Given a page name the clients page (with no other matches defined) it will try and send clients_path. If successful, then it returns the result, otherwise nil.

Not the biggest improvement in the world, but it's made my cucumber tests just a little bit easier to write.