Cron Jobs and Rails

Cron

If you’re reading this article, it’s probably because you’ve heard of cron jobs, cron tasks, or crontab. Cron is a piece of software written for *nix-type operating systems to help with the scheduling of recurring tasks. You may want to use cron to schedule certain recurring actions in your Rails application, such as checking each day for expired accounts or clearing out expired sessions from your database.

It’s easy to get started with cron jobs. You can edit your cron tasks using the crontab command:

crontab -e

This opens your crontab in your terminal’s default editor (probably vim or nano). To use a different editor, set the EDITOR variable in front of the crontab command:

EDITOR=nano crontab -e

Remember that your scheduled tasks will run as the user who invoked crontab.

Once you’re in the editor, you can start creating and editing cron tasks. To schedule a task, you need to define when the task will run, then define the command to trigger that task. The scheduling syntax has five parts, separated by spaces (diagram taken from drupal.org):

# +---------------- minute (0 - 59)
# |  +------------- hour (0 - 23)
# |  |  +---------- day of month (1 - 31)
# |  |  |  +------- month (1 - 12)
# |  |  |  |  +---- day of week (0 - 6) (Sunday=0)
# |  |  |  |  |
  *  *  *  *  *  command to be executed

The parts represent, in this order, the minute(s), hour(s), day(s) of the month, month(s), and day(s) of the week on which to run the command. An asterisk (*) matches every value of that part, so * * * * * runs your task every minute of every hour of every day. Besides the asterisk, you can use single numbers, comma-separated lists (0,12), ranges (1-5), step values (*/15, meaning every 15 units), and three-letter day and month names (mon, jan). You can also replace all five parts with a single predefined schedule such as @hourly or @daily, as explained in the Wikipedia article on Cron. Once your schedule is defined, follow it with any valid shell command (cron runs it with /bin/sh by default):

# Every day at midnight and noon, quietly retrieve index.html from example.com
# using wget:
0 0,12 * * * /usr/bin/wget -O - -q -t 1 http://example.com/index.html

# Same task, but just once every day (at midnight)
@daily /usr/bin/wget -O - -q -t 1 http://example.com/index.html
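To make the matching rules concrete, here is a small Ruby sketch of how cron decides whether a schedule fires at a given minute. It’s an illustration under simplifying assumptions, not cron’s actual code: PREDEFINED, FIELD_RANGES, expand_field and cron_match? are names made up for this example, and day and month names aren’t supported. One rule worth knowing: when both the day-of-month and day-of-week fields are restricted, cron runs the command when either one matches.

```ruby
# A minimal sketch of cron's matching rule, not a real scheduler. Supports "*",
# single numbers, ranges (1-5), steps (*/15, 0-30/10), comma-separated lists
# and a few @-shortcuts. Day and month names are not handled.
PREDEFINED = {
  "@yearly"   => "0 0 1 1 *",
  "@annually" => "0 0 1 1 *",
  "@monthly"  => "0 0 1 * *",
  "@weekly"   => "0 0 * * 0",
  "@daily"    => "0 0 * * *",
  "@midnight" => "0 0 * * *",
  "@hourly"   => "0 * * * *"
}.freeze

# Allowed values for each of the five fields, in crontab order.
FIELD_RANGES = [0..59, 0..23, 1..31, 1..12, 0..6].freeze

# Expands one field such as "0,12" or "*/15" into the values it allows.
def expand_field(field, range)
  field.split(",").flat_map do |part|
    base, step = part.split("/", 2)
    step = step ? Integer(step, 10) : 1
    first, last =
      if base == "*"
        [range.first, range.last]
      elsif base.include?("-")
        base.split("-", 2).map { |n| Integer(n, 10) }
      else
        [Integer(base, 10)] * 2
      end
    (first..last).step(step).to_a
  end
end

# True if the schedule fires at the given time (seconds are ignored).
def cron_match?(schedule, time)
  fields = PREDEFINED.fetch(schedule, schedule).split
  raise ArgumentError, "expected 5 fields, got #{fields.size}" unless fields.size == 5

  minute, hour, dom, month, dow = fields.zip(FIELD_RANGES).map { |f, r| expand_field(f, r) }
  dom_ok = dom.include?(time.day)
  dow_ok = dow.include?(time.wday) # Sunday is 0, as in the diagram above
  # If either day field starts with "*", both must match; if both are
  # restricted, cron fires when EITHER of them matches.
  day_ok = (fields[2].start_with?("*") || fields[4].start_with?("*")) ? (dom_ok && dow_ok) : (dom_ok || dow_ok)

  minute.include?(time.min) && hour.include?(time.hour) && month.include?(time.month) && day_ok
end

noon = Time.new(2024, 3, 5, 12, 0) # a Tuesday
puts cron_match?("0 0,12 * * *", noon) # the midnight-and-noon schedule matches
puts cron_match?("@daily", noon)       # @daily only fires at midnight
```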

Running Rails Application Code via Cron

Running system and server tasks with cron is useful, but often you’ll need to run Rails application code on a schedule. You can do this the hard way: set up a controller action (e.g. CronJobsController#some_task) that triggers the application code, then have a cron task send a GET request to that action (e.g. /usr/bin/wget -O - -q -t 1 http://example.com/cron_jobs/some_task). Or you can do it the easy way and run the code directly from cron. Rails provides runners and rake tasks for this:

Rake Tasks from Cron

To run an existing rake task, change into your application’s root directory, then call the rake task:

0 0 * * * cd /my/app/root && /path/to/bundle exec rake some_task

Cron runs jobs with a minimal environment (including a short PATH), so it’s safest to use the full path to bundle. You can find it by typing which bundle from within your application’s root folder.

There’s no need to limit your rake tasks to ones that are already available via Rails or whatever framework and gems you happen to be using. You can write your own rake tasks and use those in your cron jobs as well. You can find more information about how to write your own rake tasks in the Rails Guides.
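As a rough sketch of a custom task, here’s a hypothetical accounts:expire task that deactivates expired accounts. The Account struct and ACCOUNTS list stand in for a real model, and the empty :environment task stands in for the one Rails provides (in a Rails app, depending on :environment is what loads your application before the task runs). In a real app you’d put just the namespace block in a file such as lib/tasks/accounts.rake, which Rails loads automatically, and drop the require/include lines, the stand-ins, and the final invoke.

```ruby
require "date"
require "rake"
include Rake::DSL

# Hypothetical in-memory stand-in for an Account model. In a real app the task
# would query the database instead, e.g. through ActiveRecord.
Account = Struct.new(:email, :expires_on, :active)
ACCOUNTS = [
  Account.new("old@example.com", Date.new(2020, 1, 1), true),
  Account.new("current@example.com", Date.today + 30, true)
].freeze

# Rails defines an :environment task that loads your application before any task
# depending on it runs. This empty stand-in only lets the sketch run outside Rails.
task :environment

namespace :accounts do
  desc "Deactivate accounts whose expiry date has passed"
  task expire: :environment do
    expired = ACCOUNTS.select { |a| a.active && a.expires_on < Date.today }
    expired.each { |a| a.active = false }
    puts "Deactivated #{expired.size} expired account(s)"
  end
end

# Equivalent to running `bundle exec rake accounts:expire` from the command line.
Rake::Task["accounts:expire"].invoke
```

With the task in place, the crontab entry follows the same pattern as before: 0 0 * * * cd /my/app/root && /path/to/bundle exec rake accounts:expire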

Runners

Another way to run code in your Rails app directly from cron is with Rails runners. All you have to do is call rails runner followed by a string of Ruby code:

0 * * * * cd /my/app/root && /path/to/bundle exec rails runner -e production "Model.long_running_method"

The -e production option sets the environment to “production”; change it as needed. Model.long_running_method stands for any class method that can be called from within your Rails application. rails runner first loads your application into memory, then evaluates the Ruby code in the Rails environment. When the code completes, the runner exits.

Debugging Custom Code

To debug a runner or a custom rake task, you can print to STDOUT from within your code (using puts or similar methods). Within crontab, make sure to set the MAILTO variable to your email address so that you’ll receive that output.

MAILTO=email@example.com
* * * * * /usr/bin/some_command

As long as the server is configured to send outgoing email, the output will be emailed to you.
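Since cron mails whatever a job writes to STDOUT or STDERR, it helps to make that output informative. The sketch below is a hypothetical run_job helper (not part of Rails or cron) that timestamps the start and end of a job and reports any exception with its class, message and the top of its backtrace. You could call it from inside a runner or a custom rake task.

```ruby
# Hypothetical helper: wraps a scheduled job so the output cron emails you is
# timestamped, and a failure is reported (class, message, top of backtrace)
# instead of being lost. Returns true on success, false on failure.
def run_job(name, out: $stdout, err: $stderr)
  stamp = -> { Time.now.utc.strftime("%Y-%m-%d %H:%M:%S UTC") }
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  out.puts "[#{stamp.call}] #{name}: starting"
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  out.puts "[#{stamp.call}] #{name}: finished in #{elapsed.round(2)}s"
  true
rescue StandardError => e
  err.puts "[#{stamp.call}] #{name}: failed with #{e.class}: #{e.message}"
  err.puts e.backtrace.first(5).join("\n") if e.backtrace
  false
end

# Example: anything the block prints lands between the timestamped lines.
run_job("nightly cleanup") { puts "cleaned up old sessions" }
```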

The Whenever Gem

One difficulty with cron is that your cron jobs are awkward to store and maintain. The syntax can also be cumbersome for beginners, and it’s easy to make mistakes in your schedules and in your executables’ paths. One way to overcome these difficulties is the whenever gem. Just add gem 'whenever', require: false to your Gemfile and run bundle install. Then, from your application’s root directory, run the following command:

bundle exec wheneverize

This adds a schedule file to your config folder (config/schedule.rb), which is where you set up your scheduled tasks. You can use regular cron syntax or a more idiomatic Ruby DSL. The following examples are taken from the gem’s README.md:

every 3.hours do
  runner "MyModel.some_process"
  rake "my:rake:task"
  command "/usr/bin/my_great_command"
end

every 1.day, :at => '4:30 am' do
  runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
end

every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot
  runner "SomeModel.ladeeda"
end

every :sunday, :at => '12pm' do # Use any day of the week or :weekend, :weekday
  runner "Task.do_something_great"
end

every '0 0 27-31 * *' do
  command "echo 'you can use raw cron syntax too'"
end

# run this task only on servers with the :app role in Capistrano
# (see the Capistrano roles section of the whenever README)
every :day, :at => '12:20am', :roles => [:app] do
  rake "app_server:task"
end

Once you’ve set up your scheduled tasks in config/schedule.rb, you still need to apply them to your crontab. From your terminal, use the whenever command to update your crontab:

# Change to your application's directory.
cd /my/app

# View config/schedule.rb converted to cron syntax
bundle exec whenever

# Update crontab
bundle exec whenever -i

# Overwrite the whole crontab (be careful with this one!)
bundle exec whenever -w

# See all the options for the whenever command
bundle exec whenever -h

There is a Capistrano extension for whenever, so if you deploy your application with Capistrano, make sure you take advantage of it.

Pitfalls of Crontab

While using the whenever gem can fix some of the issues with using cron to manage your application’s recurring tasks, there are several reasons you may want to avoid using cron tasks at all:

  • It’s easy to forget to set up cron tasks on servers.
  • If you are running multiple servers, you may not want them all to run the cron tasks — doing so can cause conflicts or can at least require you to try to handle semaphores/locks in your code. This can get messy fast.
  • Cron’s finest granularity is one minute, so you can’t schedule a task to run several times a minute.
  • If your server goes down, it could miss a critical scheduled task, so it often becomes a necessity to run cron more frequently and have the code account for missed runs.
  • Cron tasks can be difficult to debug. It is common practice to suppress all output, but by doing so you are possibly taking away a lot of good debugging information.
  • Each time a runner or rake task is run on a Rails app, the entire app has to be loaded into memory, which can take a substantial amount of CPU time and a substantial amount of memory. If your cron tasks start overlapping, you can quickly run out of RAM and cause your server to go kaput.
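Overlapping runs on a single server can be softened with a file lock. Here’s a sketch using Ruby’s File#flock; with_exclusive_lock and the lock file name are hypothetical. If a previous run still holds the lock, the new run returns :skipped right away instead of doing the same work twice. Two caveats: it won’t stop the app from being loaded (a runner has already booted by the time this code runs), and it doesn’t help across multiple servers, since flock only coordinates processes on one machine; there you’d need a shared lock, for example in your database or Redis.

```ruby
require "tmpdir"

# Hypothetical helper: runs the block only if no other process (or other open
# handle in this process) holds an exclusive lock on lock_path; otherwise it
# returns :skipped immediately. flock locks coordinate processes on one
# machine only, so this does not prevent duplicate runs across servers.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |file|
    # LOCK_NB makes flock return false instead of waiting for the lock.
    return :skipped unless file.flock(File::LOCK_EX | File::LOCK_NB)

    yield # the lock is released when File.open closes the file
  end
end

LOCK_PATH = File.join(Dir.tmpdir, "expire_accounts.lock")

# A run that starts while another run still holds the lock is skipped.
with_exclusive_lock(LOCK_PATH) do
  puts "first run: working"
  puts "overlapping run: #{with_exclusive_lock(LOCK_PATH) { :ran }}"
end
```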

Despite the above issues, there are cases where a cron task is perfectly acceptable and possibly better than some of the alternatives. When it isn’t absolutely essential that a task run consistently, a cron job is a quick and easy solution that doesn’t require much setup. Combined with a whenever schedule.rb file and a deployment strategy that maintains your cron tasks, it can be a viable strategy.

Alternatives to Crontab

Sidekiq and Sidetiq

In cases where cron tasks don’t quite cut it, there are alternatives. My personal favorite is a combination of Sidekiq and Sidetiq. Understand, though, that these aren’t simple to set up: they require a Redis server and additional code. When you run Sidekiq, your application is loaded into memory once and runs tasks as you schedule them in your code. This is nice because the process stays running and can quickly pick up and process new tasks without reloading the app for each one. When multiple application servers point to a single Redis server (or cluster) endpoint, you can also run multiple Sidekiq instances against that same Redis server and have them pick off tasks as they have availability. This is great for scaling and (mostly) eliminates issues of locking, semaphores, and race conditions.
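The core idea, several workers sharing one queue so each job is handled exactly once, can be illustrated without Redis at all. The sketch below is an in-process analogy using Ruby’s thread-safe Queue, not Sidekiq’s or Sidetiq’s API; process_jobs is a made-up name, and threads stand in for separate Sidekiq processes.

```ruby
# In-process analogy only, NOT Sidekiq's API: several worker threads pull jobs
# from one shared queue, much as several Sidekiq processes pull jobs from one
# Redis server. Queue#pop hands each job to exactly one worker, so no job runs
# twice and the workers need no extra locking of their own.
def process_jobs(jobs, worker_count: 3)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # once the queue is empty, pop returns nil and the workers exit

  results = Queue.new
  workers = Array.new(worker_count) do |worker_id|
    Thread.new do
      while (job = queue.pop)
        results << [worker_id, job.call]
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# Ten small jobs shared among three workers; each result says which worker ran it.
jobs = Array.new(10) { |n| -> { n * n } }
process_jobs(jobs).each { |worker_id, value| puts "worker #{worker_id} -> #{value}" }
```

Adding capacity in this model means adding workers, which mirrors pointing more Sidekiq instances at the same Redis server.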

The Heroku Scheduler

Another alternative to cron tasks is the Heroku Scheduler, if your application is running on Heroku. As is noted in the documentation, the Heroku Scheduler does not guarantee that tasks will be run, so you are encouraged to include a custom clock in your application to make sure that tasks are run. It also has a limit of running at most once every 10 minutes, so you may need to use a worker dyno and create your own looping mechanism in a long running task to make sure tasks are run as frequently as they should be.
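A custom clock can be as simple as a loop in a long-running process. Here’s a minimal sketch of one, assuming you run it in a worker dyno and fill in the task yourself; run_clock is a hypothetical name, not a Heroku or Rails API. It schedules ticks against a monotonic clock so time spent inside the task doesn’t make the schedule drift.

```ruby
# Minimal sketch of a "clock" loop for a long-running worker process: calls the
# block every `interval` seconds. Deadlines are computed from a monotonic clock,
# so time spent inside the task doesn't push later ticks back; if a task overruns
# its interval, the next tick starts immediately. `max_ticks` exists only so the
# example terminates; a real clock process would run until it is stopped.
def run_clock(interval:, max_ticks: nil)
  next_run = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  ticks = 0
  loop do
    yield ticks
    ticks += 1
    break if max_ticks && ticks >= max_ticks

    next_run += interval
    delay = next_run - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep(delay) if delay.positive?
  end
  ticks
end

# Example: three ticks, 0.1 seconds apart.
run_clock(interval: 0.1, max_ticks: 3) { |tick| puts "tick #{tick} at #{Time.now.strftime('%H:%M:%S.%L')}" }
```

Unlike cron, nothing here limits you to one run per minute; the interval can be any number of seconds.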

Conclusion

While cron jobs are useful, and can get the job done in many instances, you should be careful in your decision of how to implement scheduled and delayed tasks. Cron jobs are best for tasks that are not highly critical or essential to your application’s functionality or performance. They should also usually execute in a reasonably short amount of time so that they don’t bog down your server(s). When you have scheduled tasks of a critical nature, or tasks that need to be run more than once per minute, tools such as Heroku’s worker dynos or Sidekiq are very performant and viable solutions.
