Fetching/parsing iTunes RSS feeds with Nokogiri, storing the data in MongoDB, and displaying price sparks via Sinatra

Recently I've been tracking price drops in iTunes apps, so I thought I'd roll my own feed parser in Ruby and share the code. In this post I'll demonstrate the following: MongoDB for document storage, Mongoid as the Ruby object-document mapper, Curb for fetching the feed, Nokogiri/Nori for XML parsing, Sinatra for a simple web server, and Google Charts for a price sparkline image.

Installed MongoDB via Homebrew

# install
brew install mongodb

# started service
mongod

Created a new project Gemfile, file: Gemfile

source 'https://rubygems.org'

gem 'mongoid', '~> 3.0'
gem 'curb'
gem 'nokogiri'
gem 'nori'
gem 'sinatra'
gem 'googlecharts'

Installed gems

bundle

Created a Mongoid config file, file: mongoid.yml

development:
  sessions:
    default:
      database: itunes_feeds
      hosts:
        - localhost:27017

Created a Mongo include file that loads the config and defines the document classes, file: mongo.rb

require 'mongoid'

# load mongo conf
Mongoid.load!('mongoid.yml', :development)

class FeedItem
  include Mongoid::Document
  embeds_many :feedItemPrices
end

class FeedItemPrice
  include Mongoid::Document
  embedded_in :feedItem
end
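
Because the prices are embedded, each price record lives inside its parent feed item document instead of in a separate collection. Here is a rough sketch of that shape using plain Ruby hashes. The values are made-up placeholders, and I'm assuming the embedded array is stored under the relation name (feedItemPrices); itunes_id, created, and price are the fields the fetcher writes.

```ruby
require 'json'
require 'time'

# Hypothetical example of one stored feed item with two embedded prices.
feed_item = {
  'itunes_id' => 'example-id',   # placeholder, not a real iTunes id
  'im:name'   => 'Example App',  # placeholder app name
  'feedItemPrices' => [
    { 'created' => Time.utc(2013, 1, 1).iso8601, 'price' => '1.99' },
    { 'created' => Time.utc(2013, 1, 2).iso8601, 'price' => '0.99' }
  ]
}

puts JSON.pretty_generate(feed_item)

# The price history for a chart is just the embedded array, converted to floats.
prices = feed_item['feedItemPrices'].map { |p| p['price'].to_f }
```

One read of the parent document returns the whole price history, which is what the price spark needs.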

Created a simple class to fetch, parse, and store the iTunes feed data, file: itunes_feed_fetcher.rb

require 'rubygems'
require 'curb'
require 'nokogiri'
require 'nori'

require './mongo.rb'

class ItunesFeedFetcher

  def initialize
    @feed_count = 300
    @feed_url = "https://itunes.apple.com/us/rss/toppaidapplications/limit=#{@feed_count}/xml"
    @feed_items_new = 0
    @feed_items_updated = 0
  end

  def fetch

    # curl feed url
    curld = Curl::Easy.perform @feed_url

    # convert rss feed xml to hash
    nori = Nori.new(:parser => :nokogiri)
    @feed_data = nori.parse curld.body_str

  end

  def process

    return nil if @feed_data.nil?

    @feed_data['feed']['entry'].each do |entry|

      # get itunes id
      itunes_id = entry['id']

      # check if entry exists in database
      existing = FeedItem.where(itunes_id: itunes_id).first

      if !existing.nil?

        # check if entry has been updated
        if entry['updated'].utc >= existing['updated'].utc
          # todo: update entry details
        end

        # get entry price, minus dollar sign
        entry_price = entry['im:price'].scan(/[0-9.]+/).first

        # create feed item price record (create also persists it)
        existing.feedItemPrices.create(created: Time.now, price: entry_price)

        @feed_items_updated += 1

      else

        # get entry price, minus dollar sign
        entry_price = entry['im:price'].scan(/[0-9.]+/).first

        # remove entry price, will be embedded instead
        entry.delete 'im:price'

        # set itunes id to entry
        entry['itunes_id'] = itunes_id

        # create new feed item record
        fi = FeedItem.new entry
        fi.save

        # create feed item price record (create also persists it)
        fi.feedItemPrices.create(created: Time.now, price: entry_price)

        @feed_items_new += 1

      end

    end

  end

  def report
    "New: #{@feed_items_new}<br/>Updated: #{@feed_items_updated}"
  end

end
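
The price extraction in process is worth a closer look, since it decides what ends up in the chart. Here is a small standalone sketch of the same scan call. The parse_price helper name is mine and is not part of the class above.

```ruby
# Extract the first run of digits and dots from a price label such as "$0.99".
# Returns nil when the label contains no digits (for example "Free").
def parse_price(label)
  label.to_s.scan(/[0-9.]+/).first
end

paid      = parse_price('$0.99')      # "0.99"
free      = parse_price('Free')       # nil
thousands = parse_price('$1,299.99')  # "1", because the comma splits the match
```

A label with no digits gives nil, which to_f later turns into 0.0. A price with a thousands separator would be cut off at the comma, so stripping commas first may be worth it if the feed ever includes such prices.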

Created a simple Sinatra website with two URLs ("/" and "/fetch"), file: sinatra.rb

#!/usr/bin/env ruby

require 'rubygems'
require 'sinatra'
require 'gchart'

require './mongo.rb'
require './itunes_feed_fetcher.rb'

get '/' do

  output = '<table>'
  FeedItem.each do |fi|
    output += "<tr>"
    output += "<td><img src='#{fi['im:image'][0]}' /></td>"
    output += "<td><b>#{fi['im:name']}</b></td>"
    #output += "<td>#{fi['content']}</td>"

    # collect feed item prices
    prices = fi.feedItemPrices.collect {|fip| fip['price'].to_f}

    # create google chart image url
    chart_url = Gchart.sparkline(:data => prices, :size => '120x40', :line_colors => '0077CC')
    output += "<td><img src='#{chart_url}' /></td>"

    output += "</tr>"
  end
  output += "</table>"
  output

end

get '/fetch' do
  iff = ItunesFeedFetcher.new
  iff.fetch
  iff.process
  iff.report
end
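
The "/" route interpolates feed values straight into HTML, so an app name containing & or < could break the markup. One possible way to build a row with escaping is sketched below, using ERB::Util from the standard library. The feed_row helper and its arguments are hypothetical stand-ins for the values read from a FeedItem.

```ruby
require 'erb'

# Build one table row, escaping each value before it is placed in the HTML.
def feed_row(name, image_url, chart_url)
  [
    '<tr>',
    "<td><img src='#{ERB::Util.html_escape(image_url)}' /></td>",
    "<td><b>#{ERB::Util.html_escape(name)}</b></td>",
    "<td><img src='#{ERB::Util.html_escape(chart_url)}' /></td>",
    '</tr>'
  ].join
end

# Placeholder values, not taken from the real feed.
row = feed_row('Cut & Paste <Pro>',
               'http://example.com/icon.png',
               'http://example.com/chart.png')
puts row
```

Collecting the pieces in an array and joining them once also avoids the repeated string concatenation.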

I then started the Sinatra app:

./sinatra.rb

Browsing to http://localhost:4567/fetch fetches, parses, and stores the data in MongoDB. Sample output:
New: 0
Updated: 300
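
These counts suggest this was not the first fetch: every entry already existed, so each was counted as updated and gained one more price record. The counting logic can be sketched without MongoDB by using a plain Hash as a stand-in for the collection. PriceTracker and the :id/:price keys are hypothetical names for illustration only.

```ruby
# In-memory sketch of the new/updated counting. This is not the Mongoid code.
class PriceTracker
  attr_reader :new_count, :updated_count

  # store: Hash mapping an item id to its list of recorded prices
  def initialize(store)
    @store = store
    @new_count = 0
    @updated_count = 0
  end

  # entries: array of hashes with :id and :price keys
  def process(entries)
    entries.each do |entry|
      if @store.key?(entry[:id])
        @updated_count += 1
      else
        @store[entry[:id]] = []
        @new_count += 1
      end
      # every run appends one price point, whether the item is new or not
      @store[entry[:id]] << entry[:price]
    end
  end
end

store = {}
entries = [{ id: 'a', price: '0.99' }, { id: 'b', price: '1.99' }]

first = PriceTracker.new(store)
first.process(entries)   # first run: both entries are new

second = PriceTracker.new(store)
second.process(entries)  # second run: both entries already exist
```

Each fetch adds a price point for every item, so the history grows by one record per run even when the price has not changed.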

Browsing to http://localhost:4567/ shows the feed items, each with a price spark image. As you can see, I randomized the price spark data to make it more interesting.

[Screenshot: itunes feeds]

Source code on GitHub