Fetching/parsing iTunes RSS feeds with Nokogiri, storing the data in MongoDB, and displaying price sparks via Sinatra
Recently I’ve been tracking price drops in iTunes apps, so I thought I’d roll my own feed parser in Ruby and share the code. In this post I’ll demonstrate the following: MongoDB for document storage, Mongoid as the Ruby library for MongoDB, Curb for fetching the feed, Nokogiri/Nori for XML parsing, Sinatra for a simple web server, and Google Charts for a price spark image.
Installed MongoDB via Homebrew
# install
brew install mongodb
# started service
mongod
Created a new project Gemfile, file: Gemfile
source 'https://rubygems.org'
gem 'mongoid', '~> 3.0'
gem 'curb'
gem 'nokogiri'
gem 'nori'
gem 'sinatra'
gem 'googlecharts'
Installed gems
bundle
Created a Mongoid config file, file: mongoid.yml
development:
  sessions:
    default:
      database: itunes_feeds
      hosts:
        - localhost:27017
Created a mongo include file to define class structure for mongo objects, file: mongo.rb
require 'mongoid'

# load mongo conf
Mongoid.load!('mongoid.yml', :development)

class FeedItem
  include Mongoid::Document
  embeds_many :feedItemPrices
end

class FeedItemPrice
  include Mongoid::Document
  embedded_in :feedItem
end
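To make the embedding concrete, here is a rough sketch in plain Ruby of the shape one stored feed item might take, with its price records nested inside the parent. The field names (itunes_id, created, price) come from the fetcher code in this post. The sample values, and the assumption that the embedded records sit under a feedItemPrices key, are for illustration only and are not copied from a real database.

```ruby
# Illustrative only: a plain Hash standing in for one stored FeedItem document.
# The values are made up, and the embedded key name is an assumption.
feed_item = {
  'itunes_id' => 'https://example.com/app/id123',
  'feedItemPrices' => [
    { 'created' => Time.utc(2013, 1, 1), 'price' => '1.99' },
    { 'created' => Time.utc(2013, 1, 2), 'price' => '0.99' }
  ]
}

# Prices are stored as strings here, so convert them before charting or comparing.
prices = feed_item['feedItemPrices'].map { |fip| fip['price'].to_f }
puts prices.inspect
```

Keeping the price history inside the parent means one read returns everything needed to draw a chart for an app.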
Created a simple class to fetch, parse, and store the iTunes feed data, file: itunes_feed_fetcher.rb
require 'rubygems'
require 'curb'
require 'nokogiri'
require 'nori'
require './mongo.rb'
class ItunesFeedFetcher
  def initialize
    @feed_count = 300
    @feed_url = "https://itunes.apple.com/us/rss/toppaidapplications/limit=#{@feed_count}/xml"
    @feed_items_new = 0
    @feed_items_updated = 0
  end

  def fetch
    # curl feed url
    curld = Curl::Easy.perform @feed_url
    # convert rss feed xml to hash
    nori = Nori.new(:parser => :nokogiri)
    @feed_data = nori.parse curld.body_str
  end

  def process
    return nil if @feed_data.nil?
    @feed_data['feed']['entry'].each do |entry|
      # get itunes id
      itunes_id = entry['id']
      # check if entry exists in database
      existing = FeedItem.where(itunes_id: itunes_id).first
      if !existing.nil?
        # check if entry has been updated
        if entry['updated'].utc >= existing['updated'].utc
          # todo: update entry details
        end
        # get entry price, minus dollar sign
        entry_price = entry['im:price'].scan(/[0-9.]+/).first
        # create feed item price record
        fip = existing.feedItemPrices.create({created: Time.now, price: entry_price})
        fip.save
        @feed_items_updated += 1
      else
        # get entry price, minus dollar sign
        entry_price = entry['im:price'].scan(/[0-9.]+/).first
        # remove entry price, will be embedded instead
        entry.delete 'im:price'
        # set itunes id to entry
        entry['itunes_id'] = itunes_id
        # create new feed item record
        fi = FeedItem.new entry
        fi.save
        # create feed item price record
        fip = fi.feedItemPrices.create({created: Time.now, price: entry_price})
        fip.save
        @feed_items_new += 1
      end
    end
  end

  def report
    "New: #{@feed_items_new}<br/>Updated: #{@feed_items_updated}"
  end
end
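The price extraction appears twice in the class above, so it could be pulled into a small helper. The following is a minimal sketch, not part of the original class: extract_price is a hypothetical name, and the decisions to return a Float and to return nil when the label has no digits are assumptions made for the example.

```ruby
# Hypothetical helper: pulls the numeric part out of a price label such as "$0.99".
# Returns a Float, or nil when the label contains no digits (for example "Free").
def extract_price(label)
  match = label.to_s.scan(/[0-9.]+/).first
  match.nil? ? nil : match.to_f
end

puts extract_price('$0.99').inspect
puts extract_price('Free').inspect
```

Returning nil rather than 0.0 keeps "no price found" distinguishable from a genuine zero price.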
Created a simple Sinatra website with two URLs (“/” and “/fetch”), file: sinatra.rb
#!/usr/bin/env ruby
require 'rubygems'
require 'sinatra'
require 'gchart'
require './mongo.rb'
require './itunes_feed_fetcher.rb'
get '/' do
  output = '<table>'
  FeedItem.each do |fi|
    output += "<tr>"
    output += "<td><img src='#{fi['im:image'][0]}' /></td>"
    output += "<td><b>#{fi['im:name']}</b></td>"
    #output += "<td>#{fi['content']}</td>"
    # collect feed item prices
    prices = fi.feedItemPrices.collect {|fip| fip['price'].to_f}
    # create google chart image url
    chart_url = Gchart.sparkline(:data => prices, :size => '120x40', :line_colors => '0077CC')
    output += "<td><img src='#{chart_url}' /></td>"
    output += "</tr>"
  end
  output += "</table>"
  output
end

get '/fetch' do
  iff = ItunesFeedFetcher.new
  iff.fetch
  iff.process
  iff.report
end
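The “/” route interpolates feed values straight into HTML, so an app name containing & or < could break the markup. One possible hardening, sketched here with Ruby’s standard ERB::Util.html_escape, is to escape each value before building the row. The render_row helper and the sample values are hypothetical and are not part of the original app.

```ruby
require 'erb'

# Hypothetical helper: builds one table row, escaping each value before interpolation.
def render_row(image_url, name, chart_url)
  esc = ->(value) { ERB::Util.html_escape(value) }
  "<tr>" \
    "<td><img src='#{esc.call(image_url)}' /></td>" \
    "<td><b>#{esc.call(name)}</b></td>" \
    "<td><img src='#{esc.call(chart_url)}' /></td>" \
    "</tr>"
end

row = render_row('http://example.com/icon.png', 'Tom & Jerry <HD>', 'http://example.com/chart.png')
puts row
```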
I then started the Sinatra app:
./sinatra.rb
Browsing to http://localhost:4567/fetch fetches, parses, and stores the data in MongoDB. Sample output:
New: 0
Updated: 300
Browsing to http://localhost:4567/ shows the feed items with a price spark image. As you can see, I randomized the price spark data to make it more interesting.
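Since the point of collecting the history is to spot price drops, a next step could be a check over the stored prices. This is only a sketch: price_dropped? is a hypothetical helper, and it assumes the prices arrive as an array of floats ordered oldest to newest, like the one collected for the chart.

```ruby
# Hypothetical helper: true when the latest price is lower than the one before it.
# Assumes prices are ordered oldest to newest.
def price_dropped?(prices)
  return false if prices.size < 2
  prices[-1] < prices[-2]
end

puts price_dropped?([1.99, 1.99, 0.99])
puts price_dropped?([0.99, 1.99])
```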