JRuby: Asynchronous background job processing via Sidekiq using PhantomJS to create analytic screenshots
In this article I’ll demo some code that uses Sidekiq for asynchronous background job processing. Since I like using headless browsers, I decided to create a worker that uses PhantomJS to create a screenshot for each page request, and a Sinatra front end with a Google Analytics-like JS include.
Install dependencies, as necessary:
brew install phantomjs
brew install redis
# and start redis
redis-server /usr/local/etc/redis.conf &
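Before starting the worker, it can help to confirm that Redis is actually accepting connections. The following is a minimal sketch using only Ruby's standard library. The helper name `port_open?` is hypothetical, and the address assumes Redis's default of 127.0.0.1:6379, which your redis.conf may override.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper (not part of the article's code): returns true if
# something accepts TCP connections on host:port within `seconds`.
def port_open?(host, port, seconds = 1)
  Timeout.timeout(seconds) do
    TCPSocket.new(host, port).close
    true
  end
rescue SystemCallError, SocketError, Timeout::Error
  false
end

# Assumes Redis's default address; adjust if redis.conf says otherwise.
puts "redis reachable: #{port_open?('127.0.0.1', 6379)}"
```

This only shows that a process is listening on the port, not that it is Redis, so treat it as a quick sanity check.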
Define (J)Ruby version, file: .ruby-version
jruby-1.7.13
Create a Gemfile for Ruby dependencies. Install via “bundle install”.
source 'https://rubygems.org'
gem 'sinatra'
gem 'puma'
gem 'sidekiq'
gem 'selenium-webdriver'
Create the worker class. file: worker.rb
#!/usr/bin/env jruby
# include dependencies
require 'selenium-webdriver'
require 'sidekiq'
require 'date'
class SeleniumPhantomjsWorker
  # set browser agent + custom string (to detect)
  BROWSER_USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36; SeleniumPhantomjsWorker"
  include Sidekiq::Worker

  def perform(url)
    return nil if url.nil?
    # define path to save screenshots
    save_path = './public/' + DateTime.now.strftime('%Y%m%d%H%M%S%L') + '_' + url.gsub(/\W/, '_') + '.png'
    # set up selenium, phantomjs, & browser user agent
    capabilities = Selenium::WebDriver::Remote::Capabilities.phantomjs("phantomjs.page.settings.userAgent" => BROWSER_USER_AGENT)
    driver = Selenium::WebDriver.for :phantomjs, :desired_capabilities => capabilities
    begin
      # go to url and take a screenshot
      driver.navigate.to url
      driver.save_screenshot save_path
    ensure
      # always quit, so a failed page load does not leave a PhantomJS process behind
      driver.quit
    end
    true
  end
end
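To make the file naming in `perform` concrete, here is a small sketch that isolates the `save_path` expression. The method name `screenshot_path` and its `now` parameter are hypothetical, added only so the result can be shown for a fixed time.

```ruby
require 'date'

# Hypothetical helper: same expression as save_path in the worker,
# with the timestamp passed in so the output is reproducible.
def screenshot_path(url, now = DateTime.now)
  './public/' + now.strftime('%Y%m%d%H%M%S%L') + '_' + url.gsub(/\W/, '_') + '.png'
end

puts screenshot_path('http://localhost:4567/page/abc', DateTime.new(2014, 7, 1, 12, 0, 0))
# prints ./public/20140701120000000_http___localhost_4567_page_abc.png
```

Every non-word character becomes an underscore, so two different URLs can map to the same suffix (for example `/a-b` and `/a_b`). The millisecond timestamp prefix is what keeps file names distinct in practice.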
Create a Rack config.ru for the Sidekiq worker admin dashboard. file: worker.config.ru
require 'sidekiq'
require 'sidekiq/web'
run Sidekiq::Web
Start the dashboard via rackup and the worker via sidekiq
# dashboard
rackup worker.config.ru -p 9494
# worker
sidekiq -r ./worker.rb
Create the Sinatra web services. file: web_services.rb
require 'sinatra'
require 'sinatra/base'
require './worker.rb'
class WebServices < Sinatra::Base
  # helper method to create a random word
  def random_word
    (0...10).map { ('a'..'z').to_a[rand(26)] }.join
  end

  # redirect user to random page
  get '/' do
    redirect to("/page/#{random_word}")
  end

  # get page request with random word argument (for demo purposes)
  get '/page/:arg' do
    # check if request is coming from worker
    # without this, worker will generate an infinite loop of background jobs
    worker_request = SeleniumPhantomjsWorker::BROWSER_USER_AGENT == request.env['HTTP_USER_AGENT']
    params[:arg] ||= ""
    random_words = (1..10).map { random_word }
    erb :index, locals: { arg: params[:arg], random_words: random_words, worker_request: worker_request }
  end

  # get page request to simulate javascript analytics include (like Google Analytics)
  get '/analytics.js' do
    # call worker async method
    SeleniumPhantomjsWorker.perform_async request.referrer
    # return val/type: a JS comment, so the browser has nothing to execute
    # (a bare OK would be evaluated as an undefined identifier)
    content_type :js
    "/* OK */"
  end
end
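The loop guard in the `/page/:arg` route comes down to an exact string comparison on the `User-Agent` header. Here is a minimal sketch with a plain hash standing in for `request.env`. The method name `worker_request?` is hypothetical, and the agent string is a shortened stand-in for `SeleniumPhantomjsWorker::BROWSER_USER_AGENT`.

```ruby
# Hypothetical helper: true only when the User-Agent header matches the
# worker's agent string exactly. `env` mimics a Rack environment hash.
def worker_request?(env, worker_agent)
  env['HTTP_USER_AGENT'] == worker_agent
end

# Shortened stand-in for the worker's real user agent string.
agent = 'Mozilla/5.0 (demo); SeleniumPhantomjsWorker'

puts worker_request?({ 'HTTP_USER_AGENT' => agent }, agent)                        # prints true
puts worker_request?({ 'HTTP_USER_AGENT' => 'Mozilla/5.0 (some browser)' }, agent) # prints false
puts worker_request?({}, agent)                                                    # prints false
```

Because the match is exact, any change to the agent string on the worker side has to be mirrored wherever the comparison happens, which is why the route reads the constant from the worker class rather than repeating the string.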
Create a Rack config.ru for Sinatra. file: web_services.config.ru
require './web_services.rb'
run WebServices
Create Sinatra layout view. file: views/layout.erb
<!DOCTYPE html>
<html>
  <head>
    <title>Analytics</title>
  </head>
  <body>
    <%= yield %>
  </body>
</html>
Create Sinatra view for demo page, including JS include. file: views/index.erb
<h1>Page :: <%= arg %></h1>
<ul>
  <% random_words.each do |word| %>
    <li><a href='/page/<%= word %>'>Page :: <%= word %></a></li>
  <% end %>
</ul>
<% unless worker_request %>
  <script type="text/javascript">
    var analytics_id = 12345;
    (function() {
      var a = document.createElement('script'); a.type = 'text/javascript'; a.async = true;
      a.src = ('https:' == document.location.protocol ? 'https://' : 'http://') + '127.0.0.1:4567/analytics.js?id=' + analytics_id;
      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(a, s);
    })();
  </script>
<% end %>
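The `unless worker_request` guard can be checked outside Sinatra with Ruby's standard `erb` library. This is a sketch with a cut-down template, not the full view, and the `render_snippet` method is hypothetical.

```ruby
require 'erb'

# Hypothetical helper: renders a cut-down version of views/index.erb.
# `arg` and `worker_request` are locals the template reads via binding.
def render_snippet(arg, worker_request)
  template = <<-HTML
<h1>Page :: <%= arg %></h1>
<% unless worker_request %>
<script src="http://127.0.0.1:4567/analytics.js?id=12345"></script>
<% end %>
  HTML
  ERB.new(template).result(binding)
end

puts render_snippet('abc', false).include?('analytics.js') # prints true
puts render_snippet('abc', true).include?('analytics.js')  # prints false
```

When the worker's own PhantomJS request renders the page, the script tag is left out, so the screenshot request never calls analytics.js and no further job is enqueued.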
Start Sinatra web services
rackup web_services.config.ru -p 4567
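Browse to Sinatra via http://localhost:4567, and click around on the randomly generated links. Each page loads analytics.js as a JS include, which asynchronously enqueues a worker job for the referring page. The worker saves a screenshot of that page into the public folder, relative to the directory where sidekiq was started, so make sure that folder exists.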
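To see what the worker has written so far, the sketch below lists screenshots newest first, relying on the 17-digit timestamp prefix that `perform` puts at the front of each file name. The method name `list_screenshots` is hypothetical, and the `./public` default assumes the worker's save path.

```ruby
# Hypothetical helper: returns screenshot paths, newest first.
# The timestamp prefix is fixed-width and zero-padded, so sorting the
# prefix as a string gives chronological order.
def list_screenshots(dir = './public')
  stamped = Dir.glob(File.join(dir, '*.png')).select do |path|
    File.basename(path) =~ /\A\d{17}_/ # skip files not named by the worker
  end
  stamped.sort_by { |path| File.basename(path)[0, 17] }.reverse
end

list_screenshots.each { |path| puts path }
```

Run it from the directory where the worker was started. It prints nothing if no screenshots exist yet.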
Browse to the Sidekiq dashboard at http://localhost:9494 to check out the job queues.
Next steps: create an admin dashboard that shows the screenshots in real time via a websocket connection…