Infinite sequences in ruby

One feature of harmonia is tasks that recur on a schedule, e.g. every Thursday, or on the 30th day of each month. For these tasks we need to know not just when they’ll next occur, but also things like the next 4 occurrences, or all occurrences this month.

To do this we’ve used a technique more common in clojure: using an infinite sequence.

Defining simple infinite sequences

Ruby 1.9 and above let us define infinite sequences using the Enumerator class. A simple example is the sequence of integers:

integers = Enumerator.new do |yielder|
  n = 0
  loop do
    yielder.yield n
    n = n + 1
  end
end

>> integers.take(5)
=> [0, 1, 2, 3, 4]
>> integers.take(10)
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Here’s how this works. The block passed to Enumerator.new defines our sequence. It takes a yielder argument with a special method #yield, used to return elements in the sequence. Whenever #yield is called, execution of the block stops. Execution only restarts if more elements are needed, making the sequence lazy. The Enumerator class handles this stopping and starting execution — we only need to worry about how to generate each element.

Most of the code above is concerned with looping and yielding values, not generating them. We can factor this out, giving us a method that makes it trivial to define new sequences:

def sequence(&generator)
  Enumerator.new do |yielder|
    n = 0
    loop do
      yielder.yield generator.call(n)
      n = n + 1
    end
  end
end

>> integers = sequence {|n| n}
>> integers.take(10)
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

>> squares = sequence {|n| n * n}
>> squares.take(10)
=> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

When using infinite sequences laziness is extremely important, as it’s impossible to generate all members of an infinite list in anything less than infinite time. A sequence that outputs whenever a new value is calculated shows this laziness in action:

integers = sequence {|n| puts "Calculating result #{n}"; n}

>> integers.take(3)
Calculating result 0
Calculating result 1
Calculating result 2
=> [1, 2, 3]

Now we can define sequences, how can we use them? We’ve already seen that we can #take any number of elements from our sequence. We can also use #take_while to take elements until a condition is met, such as finding all square numbers under 250:

>> squares.take_while {|n| n < 250}
=> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]

In fact all Enumerable methods are available, but we have to take care in using them. As our sequences are infinite, any method that iterates over all members has the potential to take an infinite amount of time. For example calling #any? will either return true (if a matching element exists) or never return.

Another big drawback is that when we call Enumerable methods, laziness isn’t preserved. Suppose we want the first 5 odd square numbers. We might try the following:

>> squares.select {|n| n % 2 == 1}.take(5)

Unfortunately this will never return. Even though we only want a finite set of results, the call to #select operates on the full infinite sequence before it returns. #take(5) is never called. The same problem exists with #map, #drop, #reject and more.

Preserving laziness across derived sequences

Without laziness preservation, our sequences seem of limited use. In ruby 2.0 we can use #lazy to make our sequences lazier, but in 1.9 this isn’t available to us. Thankfully we can get around this by generating lazy versions of Enumerable methods ourselves. Let’s take the previous example, finding the first 5 odd square numbers. We hit a roadblock because #select never returned. If instead of using #select we use a new Enumerator to do our selecting, we can work around this:

odd_squares = Enumerator.new do |yielder|
  squares.each do |square|
    yielder.yield square if (square % 2 == 1)
  end
end

>> odd_squares.take(5)
=> [1, 9, 25, 49, 81]
>> odd_squares.take(10)
=> [1, 9, 25, 49, 81, 121, 169, 225, 289, 361]

Our new Enumerator iterates lazily through our original sequence, yielding only odd values. We’ve chained two enumerators together to preserve laziness.

This is all a bit cumbersome as is, but we can turn this into a ‘#select’ method on a new LazyEnumerator class:

class LazyEnumerator < Enumerator
  def select(&block)
    self.class.new do |yielder|
      each do |value|
        yielder.yield value if block.call(value)
      end
    end
  end
end

def lazy_sequence(&generator)
  LazyEnumerator.new do |yielder|
    n = 0
    loop do
      yielder.yield generator.call(n)
      n = n + 1
    end
  end
end

>> lazy_squares = lazy_sequence {|n| n * n}
>> lazy_squares.select {|n| n % 2 == 1}.take(5)
=> [1, 9, 25, 49, 81]

#reject and #map can be chained in a similar way to #select:

class LazyEnumerator < Enumerator
  def reject(&block)
    self.class.new do |yielder|
      each do |value|
        yielder.yield value unless block.call(value)
      end
    end
  end

  def map(&block)
    self.class.new do |yielder|
      each do |value|
        yielder.yield block.call(value)
      end
    end
  end
end

#drop and #drop_while are slightly more complicated, but follow a similar pattern. The main difference being that they need to keep trap of how much to drop:

class LazyEnumerator < Enumerator
  def drop(n)
    self.class.new do |yielder|
      dropped_enough = false
      dropped = 0
      each do |element, index|
        dropped_enough ||= dropped >= n
        yielder.yield element if dropped_enough
        dropped = dropped + 1
      end
    end
  end

  def drop_while(&block)
    self.class.new do |yielder|
      match_found = false
      each do |element|
        match_found ||= !block.call(element)
        yielder.yield element if match_found
      end
    end
  end
end

Together, these methods give us a LazyEnumerator that can be chained in a great number of ways, giving our sequences a lot of power. #take and #drop let us select which members of a sequence we’re interested, while #select, #reject and #map allow us to build new sequences from existing ones:

>> integers = lazy_sequence {|n| n}
>> squares = integers.map {|n| n * n }
>> odd_squares = squares.select {|n| n % 2 == 1}
>> odd_squares.drop(10).take(10)
=> [441, 529, 625, 729, 841, 961, 1089, 1225, 1369, 1521]

Using only a few simple methods, we’ve been able to answer a complicated (though contrived) question, what are the second ten odd square numbers? This particular answer may not be interesting, but the technique of defining and deriving infinite sequences is much more general and useful. This is only a small sample of what you can do with them.

Deploying a rails app from scratch using recap

If you follow our company blog you’ll know that we’re working on Harmonia, our virtual office manager. I thought I’d explain how we use recap to deploy harmonia, to show how easy and fast recap makes application deployment.

Harmonia is a fairly standard rails application. As well as a web front-end, it has two other processes. A queue worker is used to send outgoing emails, whilst the core of the application is the ticker; a process which ‘ticks’ every minute, assigning tasks to team members. We use foreman to declare these processes in the following Procfile:

web: bundle exec unicorn -p $PORT -c unicorn.conf.rb
ticker: bundle exec rails runner script/ticker.rb
worker: bundle exec rake environment resque:work QUEUE=assignments VVERBOSE=1

All of these processes touch application code, so whenever we deploy a new version of the app (which we do frequently) they need to be restarted. Our app also has a database with associated migrations, uses environment variables like DATABASE_URL for configuration, and has a number of gem dependencies managed by bundler.

This is all handled by recap.

Getting started – adding recap to the project

Using recap with a rails project is simple. First add gem 'recap' to the Gemfile and run bundle install. Next run bundle exec recap setup, which will generate a Capfile, guessing values for the git repository and app name. You should check these values and change the server to point to your app server. As an example, the complete Capfile for harmonia is shown below:

require 'recap/recipes/rails'

set :application, 'harmonia'
set :repository, 'git@github.com:freerange/harmonia.git'

server 'bison.harmonia.io', :app

Applications deployed with recap need their own user, owning all files and processes. Assuming we can ssh into our server and are listed as a sudoer, we can create this user automatically running cap bootstrap. This will also add our own ssh user to the application group, allowing it to deploy the application.

Next we can set any environment variables we need for configuration. These are loaded in the application user’s .profile, so are available to all processes started by recap. In harmonia we set our smtp credentials, the server port, some api keys and more, using commands like cap env:set PORT=7000 and cap env:set SMTP_PASSWORD=secret.

The app is now almost ready to deploy. We can prepare it for deployment with cap deploy:setup, which clones the code repository, installs our gem bundle, sets up the database and precompiles our assets.

Finally, running cap deploy will start the app for the first time, launching each process defined in the Procfile with the environment variables we previously set.

Really fast deployments

While recap makes it very easy to get apps up and running the first time, it comes into its doing subsequent deployments. At Go Free Range we like to deploy apps we’re working on very frequently. One thing that helps ensure we do this is making each deployment as fast as it can be.

Using git as recap does is already a very quick way to get code changes onto servers, but recap takes things a step further. By testing to see which files have changed it knows which tasks can be skipped. For example, database migrations won’t be run if db/schema.rb has not changed; the gem bundle won’t be re-installed unless Gemfile.lock has been updated, and foreman process scripts won’t be exported if the Procfile is unchanged. In fact, if these files don’t exist, these tasks will never run at all.

The future

Using recap with Harmonia has made our deployment process very fast and simple. When the main harmonia server became over-burdened and we decided to commission a new machine dedicated to harmonia, recap made that process quick and painless. As well as harmonia, recap is also used to deploy the Go Free Range website, this blog, and a number of other small sites and projects where it has proven itself well. For larger projects, there are some features (such as more control as to what processes run where) that are missing, but I plan to add these in the next release. For all other sites, recap has proven itself a lightweight and capable alternative to the standard Capistrano deployment recipes.

Past, Present and Future

For me and the rest of Go Free Range, the big event this week was the public launch of Inside Government, the government focussed part of GOV.UK. Inside Government aims to replace the majority of government department and agency websites with a single place for all news, policies, publications and consultations. On Wednesday, the first two department websites (for the Department for Transport and the Department for Communities and Local Government) were moved across.

We’ve been working on this project for 14 months, from the initial commit, through the beta launch right up until today. I’ve enjoyed it, but it has been extremely hard work. And although there are hundreds of things I’d still like to change, there are hundreds more I’m proud of.

Harmonia

As well as Inside Government, we’ve also been slowly inviting people to Harmonia, our ‘virtual office manager’. In case you haven’t read the Go Free Range blog, Harmonia is a tool we use to assign all the tasks needed to keep our company running. Rather than use a schedule, each piece of work is assigned in a way designed to be both random and fair.

One variation or another of Harmonia has been helping us for well over a year, but now we’ve built a web app for everyone to use. It’s still a bit rough around the edges, but we’d love it if people signed up and gave us feedback. We’re building this because we want to learn how to build better products, for the future of Go Free Range.

Tip: Bundler with --binstubs

In a previous blog, I wrote how I’d aliased commands such as rake, cap and rspec to run either with or without bundle exec, based on the presence of a Gemfile. I gave up on that a while ago. Instead, I’ve started installing all my bundles like this:

bundle install --path .bundle/gems --binstubs .bundle/bin

I often use features like bundle open <gem> to debug and edit failing gems, so I like to keep each application’s gems isolated. The --path .bundle/gems installs them within an application’s .bundle directory. As well as isolating my gems, it has the added benefit that I can blow away the gemset with rm -rf .bundle

The --binstubs .bundle/bin option installs bundle-aware scripts for each command provided by a bundled gem. For example, a bundle including rake will generate a .bundle/bin/rake script. By adding ./.bundle/bin to the front of my environment PATH, the bundled version of rake will run when I’m in the application folder. I never have to type bundle exec!

Obviously typing that long bundle install command each time is tedious, so I’ve aliased it to bi:

alias bi='bundle install --path .bundle/gems --binstubs .bundle/bin'

I’ve been using these options for a few months, and so far I’m very happy with them.

Working inside government

Since September, I and the other Go Free Range guys have been working for the government. We’ve been helping the able and growing Government Digital Service to build whitehall, our code name for the Inside Government section of gov.uk. Yesterday we removed the password and opened our doors to the public.

For those who don’t know, gov.uk is the UK government’s attempt to not only consolidate government websites to a single domain, but also build these sites in the right way. That means public, open source, continual delivery and more.

While the first part of gov.uk focussed on helping citizens, Inside Government is about the business of government. What policies the government has, how it’s implementing them, not to mention the news and speeches, publications, consultations and other things the government does each day.

It’s only the first release of many, a small glimpse into what will one day be. How it ends up will be shaped by feedback from the people who use it.

Geohash toy: code released

A couple of weeks ago I wrote about a small toy app I’d written to explore geohashes. Now I’ve cleaned the code up a little, upgraded it to rails 3.1 and released it here on github. Enjoy.