JRuby: Asynchronous background job processing via Sidekiq using PhantomJS to create analytic screenshots

In this article I'll demo some code that uses Sidekiq for asynchronous background job processing. Since I like using headless browsers, I decided to create a worker that uses PhantomJS to take a screenshot of each requested page, plus a Sinatra front-end with a Google Analytics-like JS include.

Install dependencies, as necessary:

brew install phantomjs
brew install redis

# and start redis
redis-server /usr/local/etc/redis.conf &

Define (J)Ruby version, file: .ruby-version

jruby-1.7.13

Create a Gemfile for Ruby dependencies. Install via "bundle install".

source 'https://rubygems.org'

gem 'sinatra'
gem 'puma'
gem 'sidekiq'
gem 'selenium-webdriver'

Create the worker class. file: worker.rb

#!/usr/bin/env jruby

# include dependencies
require 'selenium-webdriver'
require 'sidekiq'
require 'date'

class SeleniumPhantomjsWorker

  # set browser agent + custom string (to detect)
  BROWSER_USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36; SeleniumPhantomjsWorker"

  include Sidekiq::Worker

  def perform(url)
    return nil if url.nil?

    # define path to save screenshots
    save_path = './public/' + DateTime.now.strftime('%Y%m%d%H%M%S%L') + '_' + url.gsub(/\W/, '_') + '.png'

    # setup selenium, phantomjs, & browser user agent
    capabilities = Selenium::WebDriver::Remote::Capabilities.phantomjs("phantomjs.page.settings.userAgent" => BROWSER_USER_AGENT)
    driver = Selenium::WebDriver.for :phantomjs, :desired_capabilities => capabilities

    # go to url and take a screenshot; always quit so PhantomJS processes don't pile up
    begin
      driver.navigate.to url
      driver.save_screenshot save_path
    ensure
      driver.quit
    end

    true
  end
end
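
To make the file naming in the worker easier to follow, here is a minimal sketch that pulls the path logic out into a standalone method. The name screenshot_path and its now and dir parameters are hypothetical (they are not part of the worker above); they only exist so the timestamp can be fixed for the example.

```ruby
require 'date'

# Hypothetical helper: builds the screenshot path the same way the worker does,
# but takes the timestamp and directory as arguments so the result is predictable.
def screenshot_path(url, now = DateTime.now, dir = './public/')
  timestamp = now.strftime('%Y%m%d%H%M%S%L') # year..second plus milliseconds
  safe_url  = url.gsub(/\W/, '_')            # replace every non-word character
  "#{dir}#{timestamp}_#{safe_url}.png"
end

puts screenshot_path('http://localhost:4567/page/abc', DateTime.new(2014, 7, 1, 12, 30, 45))
# prints ./public/20140701123045000_http___localhost_4567_page_abc.png
```

Since the whole URL ends up in the file name, very long URLs could run into file system name limits; truncating or hashing the URL part would be one way around that.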

Create a Rack config.ru for the Sidekiq worker admin dashboard. file: worker.config.ru

require 'sidekiq'
require 'sidekiq/web'

run Sidekiq::Web

Start the dashboard via rackup and the worker via the sidekiq command

# dashboard
rackup worker.config.ru -p 9494

# worker
sidekiq -r ./worker.rb

Create the Sinatra web services. file: web_services.rb

require 'sinatra'
require 'sinatra/base'
require './worker.rb'

class WebServices < Sinatra::Base

  # helper method to create a random word
  def random_word
    (0...10).map { ('a'..'z').to_a[rand(26)] }.join
  end

  # redirect user to random page
  get '/' do
    redirect to("/page/#{random_word}")
  end

  # get page request with random word argument (for demo purposes)
  get '/page/:arg' do

    # check if request is coming from worker
    # without this, worker will generate an infinite loop of background jobs
    worker_request = SeleniumPhantomjsWorker::BROWSER_USER_AGENT == request.env['HTTP_USER_AGENT']

    params[:arg] ||= ""
    random_words = (1..10).map { random_word }
    erb :index, locals: { arg: params[:arg], random_words: random_words, worker_request: worker_request }
  end

  # get page request to simulate javascript analytics include (like Google Analytics)
  get '/analytics.js' do

    # call worker async method
    SeleniumPhantomjsWorker.perform_async request.referrer

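    # return val/type: a JS comment, so the browser has nothing to execute
    # (a bare OK would raise a ReferenceError in the page)
    content_type :js
    "/* OK */"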
  end

end
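
The user agent check is what keeps the worker from triggering itself, so it may help to see it in isolation. This is a rough sketch using plain hashes in place of the Rack env; the constant WORKER_USER_AGENT and the method worker_request? are hypothetical names, and the string is copied from worker.rb so the snippet runs on its own.

```ruby
# Copy of SeleniumPhantomjsWorker::BROWSER_USER_AGENT, repeated here so the
# sketch does not need sidekiq or selenium-webdriver to be installed.
WORKER_USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36; SeleniumPhantomjsWorker"

# Hypothetical helper: true only when the request carries the worker's exact user agent.
def worker_request?(env)
  env['HTTP_USER_AGENT'] == WORKER_USER_AGENT
end

# Plain hashes standing in for request.env
puts worker_request?('HTTP_USER_AGENT' => WORKER_USER_AGENT) # request made by the worker
puts worker_request?('HTTP_USER_AGENT' => 'Mozilla/5.0')     # request made by a regular browser
```

When the check is true, the view leaves out the analytics include, so the page PhantomJS renders never requests /analytics.js and no further job gets queued.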

Create a Rack config.ru for Sinatra. file: web_services.config.ru

require './web_services.rb'

run WebServices

Create Sinatra layout view. file: views/layout.erb

<!DOCTYPE html>
<html>
  <head>
    <title>Analytics</title>
  </head>
  <body>

    <%= yield %>

  </body>
</html>

Create Sinatra view for demo page, including JS include. file: views/index.erb

<h1>Page :: <%= arg %></h1>
<ul>
  <% random_words.each do |word| %>
    <li><a href='/page/<%= word %>'>Page :: <%= word %></a></li>
  <% end %>
</ul>

<% unless worker_request %>
<script type="text/javascript">

  var analytics_id = 12345;

  (function() {
    var a = document.createElement('script'); a.type = 'text/javascript'; a.async = true;
    a.src = ('https:' == document.location.protocol ? 'https://' : 'http://') + '127.0.0.1:4567/analytics.js?id=' + analytics_id;
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(a, s);
  })();

</script>
<% end %>
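
To see what the unless block does without starting Sinatra, the same conditional can be rendered with Ruby's standard ERB library. This is only a sketch: TEMPLATE is a cut-down stand-in for views/index.erb, and render_page is a hypothetical helper, not something Sinatra provides.

```ruby
require 'erb'

# Cut-down stand-in for views/index.erb (assumption: only the heading and the include matter here)
TEMPLATE = <<HTML
<h1>Page :: <%= arg %></h1>
<% unless worker_request %>
<script src="/analytics.js?id=12345"></script>
<% end %>
HTML

# Hypothetical helper: renders the template with the method's local variables,
# similar in spirit to passing locals to Sinatra's erb call.
def render_page(arg, worker_request)
  ERB.new(TEMPLATE).result(binding)
end

puts render_page('abc', false) # regular visitor: script tag is present
puts render_page('abc', true)  # worker request: script tag is left out
```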

Start Sinatra web services

rackup web_services.config.ru -p 4567

Browse to Sinatra via http://localhost:4567, and click around on the randomly generated links. Each page request loads analytics.js as a JS include, which queues a job for the worker asynchronously. The worker dumps screenshots into the public folder (create it first if it does not exist, otherwise saving the screenshot fails).
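
For a feel of what "asynchronously" means here: perform_async does not run the worker, it serializes the job as JSON and pushes it to Redis, where the sidekiq process picks it up. The following is a simplified sketch of that round trip using only the standard library. build_job_payload is a hypothetical helper, and the real payload carries more fields (queue and retry settings, for example), so treat the shape as an approximation.

```ruby
require 'json'
require 'securerandom'

# Hypothetical helper: approximates the job hash Sidekiq builds on perform_async.
def build_job_payload(worker_class_name, args)
  {
    'class'       => worker_class_name,    # worker class name, stored as a string
    'args'        => args,                 # arguments later passed to perform
    'jid'         => SecureRandom.hex(12), # random job id
    'enqueued_at' => Time.now.to_f
  }
end

payload = build_job_payload('SeleniumPhantomjsWorker', ['http://localhost:4567/page/abc'])
json = JSON.generate(payload) # roughly what the web process would push to Redis
job  = JSON.parse(json)       # roughly what the worker process would read back

puts job['class']        # prints SeleniumPhantomjsWorker
puts job['args'].inspect # prints ["http://localhost:4567/page/abc"]
```

Because the arguments travel as JSON, it pays to keep them simple (strings, numbers, arrays, hashes), which is why the worker takes the page URL as a plain string.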

Browse to the Sidekiq dashboard at http://localhost:9494 to check out the job queues.

[Screenshot: Sidekiq dashboard]

Next steps: create an admin dashboard that shows the screenshots in real time via a websocket connection...