Fetching and parsing iTunes RSS feeds with Nokogiri, storing the data in MongoDB, and displaying price sparklines with Sinatra
Recently I’ve been tracking price drops on iTunes apps, so I thought I’d roll my own feed parser in Ruby and share the code. This post uses MongoDB for document storage, Mongoid as the Ruby object-document mapper, Curb (Ruby bindings for libcurl) for fetching the feed, Nokogiri/Nori for parsing the XML, Sinatra as a simple web server, and Google Charts for a price sparkline image.
Installed MongoDB via Homebrew
# install
brew install mongodb
# started service
mongod
Created a new project Gemfile, file: Gemfile
source 'https://rubygems.org'
gem 'mongoid', '~> 3.0'
gem 'curb'
gem 'nokogiri'
gem 'nori'
gem 'sinatra'
gem 'googlecharts'
Installed the gems
bundle
Created a Mongoid config file, file: mongoid.yml
development:
  sessions:
    default:
      database: itunes_feeds
      hosts:
        - localhost:27017
Created a Mongoid include file that defines the document classes, file: mongo.rb
require 'mongoid'
# load mongo conf
Mongoid.load!('mongoid.yml', :development)
class FeedItem
  include Mongoid::Document
  embeds_many :feedItemPrices
end
class FeedItemPrice
  include Mongoid::Document
  embedded_in :feedItem
end
Created a simple class to fetch, parse, and store the iTunes feed data, file: itunes_feed_fetcher.rb
require 'rubygems'
require 'curb'
require 'nokogiri'
require 'nori'
require './mongo.rb'
class ItunesFeedFetcher
  def initialize
    @feed_count = 300
    @feed_url = "https://itunes.apple.com/us/rss/toppaidapplications/limit=#{@feed_count}/xml"
    @feed_items_new = 0
    @feed_items_updated = 0
  end
  def fetch
    # curl feed url
    curld = Curl::Easy.perform @feed_url
    # convert rss feed xml to hash
    nori = Nori.new(:parser => :nokogiri)
    @feed_data = nori.parse curld.body_str
  end
  def process
    return nil if @feed_data.nil?
    @feed_data['feed']['entry'].each do |entry|
      # get itunes id
      itunes_id = entry['id']
      # check if entry exists in database
      existing = FeedItem.where(itunes_id: itunes_id).first
      if !existing.nil?
        # check if entry has been updated
        if entry['updated'].utc >= existing['updated'].utc
          # todo: update entry details
        end
        # get entry price, minus dollar sign
        entry_price = entry['im:price'].scan(/[0-9.]+/).first
        # create feed item price record (create on an embedded relation also saves it)
        existing.feedItemPrices.create({created: Time.now, price: entry_price})
        @feed_items_updated += 1
      else
        # get entry price, minus dollar sign
        entry_price = entry['im:price'].scan(/[0-9.]+/).first
        # remove entry price, will be embedded instead
        entry.delete 'im:price'
        # set itunes id to entry
        entry['itunes_id'] = itunes_id
        # create new feed item record
        fi = FeedItem.new entry
        fi.save
        # create feed item price record (create on an embedded relation also saves it)
        fi.feedItemPrices.create({created: Time.now, price: entry_price})
        @feed_items_new += 1
      end
    end
  end
  def report
    "New: #{@feed_items_new}<br/>Updated: #{@feed_items_updated}"
  end
end
Created a simple Sinatra app with two routes (“/” and “/fetch”), file: sinatra.rb
#!/usr/bin/env ruby
require 'rubygems'
require 'sinatra'
require 'gchart'
require './mongo.rb'
require './itunes_feed_fetcher.rb'
get '/' do
  output = '<table>'
  FeedItem.each do |fi|
    output += "<tr>"
    output += "<td><img src='#{fi['im:image'][0]}' /></td>"
    output += "<td><b>#{fi['im:name']}</b></td>"
    #output += "<td>#{fi['content']}</td>"
    # collect feed item prices
    prices = fi.feedItemPrices.collect {|fip| fip['price'].to_f}
    # create google chart image url
    chart_url = Gchart.sparkline(:data => prices, :size => '120x40', :line_colors => '0077CC')
    output += "<td><img src='#{chart_url}' /></td>"
    output += "</tr>"
  end
  output += "</table>"
  output
end
get '/fetch' do
  iff = ItunesFeedFetcher.new
  iff.fetch
  iff.process
  iff.report
end
I then started the Sinatra app (make sinatra.rb executable first, or run it with ruby sinatra.rb):
./sinatra.rb
Browsing to http://localhost:4567/fetch fetches, parses, and stores the feed data in MongoDB. Sample output:
New: 0
Updated: 300
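Stripped of the Curb, Nori, and Mongoid calls, process is an upsert keyed on the iTunes id: an app that hasn't been seen before gets a new FeedItem plus its first price record, while a known app only gets another price record appended. That is why a second run reports everything as updated. The sketch below models just that logic in plain Ruby, with a Hash standing in for the MongoDB collection. The PriceTracker class, its method names, and the sample entries are hypothetical; only the price regex comes from the post.

```ruby
# Minimal, stdlib-only sketch of the upsert logic in ItunesFeedFetcher#process.
# Assumption: a Hash keyed by iTunes id stands in for the FeedItem collection,
# and each stored item keeps an array of price records (like feedItemPrices).
class PriceTracker
  attr_reader :new_count, :updated_count, :items

  def initialize
    @items = {}          # itunes_id => { 'name' => ..., 'prices' => [...] }
    @new_count = 0
    @updated_count = 0
  end

  # Same regex as the post ("$4.99" -> "4.99"), converted to a Float here.
  # A label with no digits yields nil, and nil.to_f is 0.0.
  def self.parse_price(label)
    label.to_s.scan(/[0-9.]+/).first.to_f
  end

  def process(entries, now = Time.now)
    entries.each do |entry|
      itunes_id = entry['id']
      record = { 'created' => now, 'price' => self.class.parse_price(entry['im:price']) }
      if (item = @items[itunes_id])
        item['prices'] << record   # known app: append another price snapshot
        @updated_count += 1
      else
        @items[itunes_id] = { 'name' => entry['im:name'], 'prices' => [record] }
        @new_count += 1
      end
    end
  end

  def report
    "New: #{@new_count}<br/>Updated: #{@updated_count}"
  end
end

# Usage with two hypothetical feed entries, processed twice:
entries = [
  { 'id' => 'app-1', 'im:name' => 'Example One', 'im:price' => '$4.99' },
  { 'id' => 'app-2', 'im:name' => 'Example Two', 'im:price' => '$0.99' }
]
tracker = PriceTracker.new
tracker.process(entries)
tracker.process(entries)
puts tracker.report   # prints "New: 2<br/>Updated: 2"
```

One caveat the sketch inherits from the post's regex: it keeps only the first run of digits and dots, so a price written with a thousands separator, such as "$1,099.99", would come out as 1.0.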
Browsing to http://localhost:4567/ lists the feed items, each with its icon, name, and a price sparkline image. I randomized the price data to make the sparklines more interesting.
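Since the point of all this was spotting price drops, the same per-app price history that feeds each sparkline (the prices array built in the "/" route) can also be scanned for drops. Here is a small stdlib-only sketch; the price_drops helper and the sample history are hypothetical and not part of the app above.

```ruby
# Hypothetical helper (not part of the app above): given one app's price
# history in chronological order, such as the Float array the '/' route
# builds from feedItemPrices, return a hash for every step where the price fell.
def price_drops(prices)
  drops = []
  # each_cons(2) walks consecutive pairs; with_index(1) numbers each pair by
  # the position of its later (lower) element in the prices array.
  prices.each_cons(2).with_index(1) do |(before, after), at|
    next unless after < before
    # Assumes non-negative prices, so before > after >= 0 means before > 0.
    percent = ((before - after) / before * 100).round(1)
    drops << { at: at, from: before, to: after, percent: percent }
  end
  drops
end

# Usage with a made-up price history:
history = [4.99, 4.99, 2.99, 2.99, 0.99, 1.99]
price_drops(history).each do |d|
  puts format('Drop at snapshot %d: $%.2f -> $%.2f (%.1f%%)', d[:at], d[:from], d[:to], d[:percent])
end
# prints:
# Drop at snapshot 2: $4.99 -> $2.99 (40.1%)
# Drop at snapshot 4: $2.99 -> $0.99 (66.9%)
```

In the "/" route, a non-empty result for the most recent pair could, for example, be used to highlight apps whose latest snapshot is a drop.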
