Setting up a Ruby on Rails project with faceted Solr search integration using Sunspot and acts-as-taggable-on

Created by Eric.London on 2012-04-10
Please note: the content on this page originates from ericlondon.com.
In this article I'll show how to set up a Rails project with faceted Solr search integration. The code uses the sunspot gem for Solr integration and acts-as-taggable-on for tagging and search facets.

RVM/Rails Setup

$ mkdir solrfacets

# create rvm gemset
$ echo "rvm use --create ruby-1.9.2@solrfacets" > solrfacets/.rvmrc

$ cd solrfacets

# install rails
$ gem install rails

# create new rails project
$ rails new .

# version control
$ git init
$ git add .
$ git commit -am "new rails project"


Add gems

# file: Gemfile, added:
gem 'acts-as-taggable-on'
gem 'sunspot_rails'
gem 'sunspot_solr', :groups => [:development, :test]

# installing gems
$ bundle


Create default scaffolding for a Post model

$ rails generate scaffold Post title:string content:text


Add tags property to Post model

# file: app/models/post.rb

 class Post < ActiveRecord::Base
   attr_accessible :content, :title
+  acts_as_taggable_on :tags
 end


Run acts-as-taggable-on migration

$ rails generate acts_as_taggable_on:migration


Setup/create database

$ rake db:migrate


Part 2, Random Data

The model is now set up to create Posts with a title, content, and an array of tags. For demonstration purposes, I created a rake task that populates the content attribute with lorem ipsum text and the tags with random words from /usr/share/dict/words.

Modified the Post model to make :tag_list mass-assignable

# file: app/models/post.rb

 class Post < ActiveRecord::Base
-  attr_accessible :content, :title
+  attr_accessible :content, :title, :tag_list
   acts_as_taggable_on :tags
 end


Added lorem gem

# file: Gemfile
gem 'lorem', :groups => [:development]

# installing
$ bundle


Created a rake task that creates 20 Posts, each tagged with 20 words drawn at random from a pool of 100 dictionary words

# file: lib/tasks/create_random_posts_and_tags.rake

namespace :db do
  desc "Create random posts and tags."
  task :create_random_posts_and_tags => :environment do
    
    # count the number of lines in the dictionary
    dict_word_count = `wc -l /usr/share/dict/words | awk '{print $1}'`.to_i
    
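    # get 100 random words for the facets
    # (sed line numbers start at 1, so add 1 to the zero-based random index;
    # use strip rather than strip!, which returns nil when nothing changes)
    facet_words = 100.times.map{ `sed "#{Random.rand(dict_word_count) + 1}q;d" /usr/share/dict/words`.strip }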
    
    # create 20 random posts
    (1..20).each do |i|

      post = Post.create!({
        :title => "Post #{i}",
        :content => Lorem::Base.new('paragraphs', 1).output,
        :tag_list => 20.times.map{ facet_words[rand(facet_words.size)] },
      })
      
    end
    
  end
end
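
The task above shells out to wc, awk, and sed, which ties it to Unix-like systems. As a possible alternative, here is a minimal sketch that picks the words in plain Ruby. The helper name random_dictionary_words is hypothetical and not part of the original task. It uses Array#sample instead of sed, so the words it returns are distinct, whereas the sed loop can pick the same line twice.

```ruby
# Hypothetical helper (not in the original rake task): pick `count` random
# words from a newline-delimited word list using only Ruby.
def random_dictionary_words(path, count)
  # read every line, trim whitespace, and drop blank lines
  words = File.readlines(path).map(&:strip).reject(&:empty?)
  # Array#sample(n) returns up to n distinct elements chosen at random
  words.sample(count)
end

# Possible usage inside the rake task (assumes the dictionary file exists):
#   facet_words = random_dictionary_words('/usr/share/dict/words', 100)
```

This reads the whole word list into memory, which is usually acceptable for a dictionary file in a one-off development task.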


Executed rake task to create posts

$ rake db:create_random_posts_and_tags


Part 3, Solr Sunspot

Generate default configuration

$ rails generate sunspot_rails:install


Add code to index the Post data. I added ":stored => true" to each field for two reasons: 1. to avoid querying Active Record on the search results page; and 2. to enable match highlighting.

# file: app/models/post.rb

 class Post < ActiveRecord::Base
   attr_accessible :content, :title, :tag_list
   acts_as_taggable_on :tags
+
+  searchable :auto_index => true, :auto_remove => true do
+    string :title, :stored => true
+    text :content, :stored => true
+    string :tag_list, :multiple => true, :stored => true
+  end
+
 end


Setup Solr development server via Jetty

# start solr
$ rake sunspot:solr:start 

# index data
$ rake sunspot:solr:reindex


At this point, you should be able to query Solr directly and verify the structure of the indexed data. Example URL: http://localhost:8982/solr/select/?q=*:*
[Screenshot: querying Solr directly]
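
If you would rather build that URL from Ruby than type it by hand, the sketch below shows one way. The host, port, and path are taken from the example URL; the method name solr_select_url and the default of 10 rows are illustrative assumptions. "rows" is Solr's standard parameter for limiting the number of returned documents.

```ruby
require 'uri'

# Hypothetical helper: build a Solr select URL for the development server.
# Host, port, and path mirror the example URL; `rows` limits the result count.
def solr_select_url(query, rows = 10)
  # encode_www_form escapes characters that are unsafe in a query string
  params = URI.encode_www_form('q' => query, 'rows' => rows)
  "http://localhost:8982/solr/select/?#{params}"
end

url = solr_select_url('*:*', 5)
```

With the Solr server running, the resulting string can be opened in a browser or fetched with Net::HTTP.get(URI(url)).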

Add a new Search controller

$ rails generate controller Search search


Revised the generated search route to be a named route

# file: config/routes.rb

-  get "search/search" 
+  get 'search' => 'search#search', :as => 'search'


Define the search controller method. The controller passes two instance variables to the view: @search and @hits. @hits contains the stored field values, so the view can render results straight from Solr instead of querying Active Record.

# file: app/controllers/search_controller.rb

class SearchController < ApplicationController
  def search

    # only search if keyword has been entered
    if params[:keywords].blank?
      @hits = []
    else
      @search = Post.search do
        fulltext params[:keywords] do
          highlight :content
        end
        facet :tag_list
        paginate :per_page => 10
        
        # tags, AND'd
        if params[:tag].present?
          all_of do
            # Array() also accepts a single tag passed as a plain string
            Array(params[:tag]).each do |tag|
              with(:tag_list, tag)
            end
          end
        end
        
      end
      @hits = @search.hits
      
    end    
  end
end
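
To make the AND'd tag filter and the facet counts more concrete, here is a plain-Ruby sketch of what they compute. It does not use Sunspot or Solr, which do this work on the search server; the posts array and both method names are hypothetical stand-ins for illustration only.

```ruby
# Plain-Ruby illustration only: Sunspot/Solr do this work server-side.
# Keep the posts that carry every one of the selected tags (AND semantics).
def filter_by_all_tags(posts, selected_tags)
  posts.select do |post|
    selected_tags.all? { |tag| post[:tag_list].include?(tag) }
  end
end

# Count how many of the given posts carry each tag, highest count first,
# with ties broken alphabetically.
def tag_facet_counts(posts)
  counts = Hash.new(0)
  posts.each { |post| post[:tag_list].uniq.each { |tag| counts[tag] += 1 } }
  counts.sort_by { |tag, count| [-count, tag] }
end

# Hypothetical in-memory stand-in for the indexed documents
posts = [
  { :title => 'Post 1', :tag_list => %w[apple banana] },
  { :title => 'Post 2', :tag_list => %w[apple cherry] },
  { :title => 'Post 3', :tag_list => %w[apple banana cherry] }
]

matches = filter_by_all_tags(posts, %w[apple banana])  # Post 1 and Post 3
facets  = tag_facet_counts(matches)                    # counts within the matches
```

Because the facet counts are computed over the filtered matches, each additional selected tag can only keep or shrink the result set and the counts.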


Define the search view. It contains three sections: the search form, the search results (@hits, with match highlighting), and the facet links. The selected facets are passed as an array so the user can select several at once.

# file: app/views/search/search.html.erb

<h1>Search#search</h1>

<!-- FORM: -->
<%= form_tag search_path, :method => :get do %>
  <%= text_field_tag :keywords, params[:keywords] %>
  <%= submit_tag "Search", :name => nil %>
<% end %>

<!-- SEARCH RESULTS: -->
<% if @hits.any? %>
  <h2>Search Results</h2>
  <ul>
    <% @hits.each do |hit| %>
      <li>
        <%= link_to hit.stored(:title), post_path(hit.primary_key) %><br/>
        <% hit.highlights(:content).each do |highlight| %>          
          <%= highlight.format { |word| "*#{word}*" } %>
        <% end %>
      </li>
    <% end %>  
  </ul>
<% end %>

<!-- FACETS HTML: -->
<%
facets_html = ''
if not @search.nil?
  
  # check for existing tags in query string
  existing_tag_facets = []
  if params[:tag].present?
    existing_tag_facets = params[:tag]
  end

  facet_links_off = ''
  facet_links_on = ''

  @search.facet(:tag_list).rows.each_with_index do |facet, index|
    break if index == 10
    
    # check if facet is selected
    if (params[:tag].kind_of?(Array) and params[:tag].include? facet.value)
      tag_facets = existing_tag_facets - [facet.value]      
      facet_links_on << "<li>#{link_to facet.value, :keywords => params[:keywords], :tag => tag_facets} (-)</li>"
    elsif @hits.size > 1
      tag_facets = existing_tag_facets + [facet.value]
      facet_links_off << "<li>#{link_to facet.value, :keywords => params[:keywords], :tag => tag_facets} (#{facet.count})</li>"
    end

  end

  facets_html << "<strong>Filter by tags</strong>"
  if facet_links_on.size > 0
    facets_html << "<ul class='search_facets_on'>#{facet_links_on}</ul>"
  end
  if facet_links_off.size > 0
    facets_html << "<ul class='search_facets_off'>#{facet_links_off}</ul>"
  end

end
%>
<%= raw facets_html %>
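
The facet links in the view add a tag to the selection or remove it again. A minimal sketch of that toggle logic, pulled out of the template into plain Ruby, might look like this. The helper names toggle_tag and search_query_string are hypothetical; the query string uses repeated "tag[]" keys, which Rails parses into an array for params[:tag].

```ruby
require 'uri'

# Hypothetical helper: return the tag list a facet link should carry.
# A selected tag is removed from the list; an unselected tag is appended.
def toggle_tag(selected_tags, tag)
  selected_tags.include?(tag) ? selected_tags - [tag] : selected_tags + [tag]
end

# Hypothetical helper: build the query string for a facet link, with one
# "tag[]" pair per selected tag.
def search_query_string(keywords, tags)
  pairs = [['keywords', keywords]] + tags.map { |tag| ['tag[]', tag] }
  URI.encode_www_form(pairs)
end
```

In the real view, link_to builds the URL itself from the :keywords and :tag options, so search_query_string is only there to show what ends up in the address bar.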


Browsing to http://localhost:3000/search now shows the search form. Searching for "lorem" returned the following result. Note the asterisks around the keyword "lorem" in the results. The tag facets are shown below the results with their associated result counts.
[Screenshot: Solr search results with facets]
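
The asterisks come from the format block in the view. If you prefer HTML markup, keep in mind that Rails escapes strings in views, so the tags have to be added after escaping. The following is a minimal sketch of that escape-then-replace idea in plain Ruby; the helper name highlight_to_html and the marker constants are assumptions chosen for illustration.

```ruby
require 'erb'

# Placeholder markers wrapped around each match by the format block, e.g.
#   highlight.format { |word| "#{START_MARK}#{word}#{END_MARK}" }
START_MARK = '***START***'
END_MARK   = '***END***'

# Hypothetical helper: escape the whole highlighted string first, then swap
# the markers for <b> tags so that only our own tags are emitted as HTML.
def highlight_to_html(marked)
  escaped = ERB::Util.html_escape(marked).to_s
  escaped.gsub(START_MARK, '<b>').gsub(END_MARK, '</b>')
end
```

In a Rails view the returned string would still need to be passed through raw (or marked html_safe) to be rendered as markup, which is safe here only because everything except the inserted tags has already been escaped.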

After clicking on two tags, the facet counts and associated results decrease. The facet links can also be deselected. Great.
[Screenshot: Solr search results with facets selected]

Comments

 
  • very useful. Any ideas on how to get full highlighting in html
    Created by Anonymous on 2012-04-15
    Eric. This was very useful and easy to follow.

    Have you tried and had any success getting the highlighting to work within the html to allow for strong/strong or em/em or other character formatting like yellow highlights?

    btw.. It probably deserves a note that the rake db:create_random_posts_and_tags doesn't work on Windows. After downloading and installing the GNU Unix utilities for Windows, the sed call was a problem.

    .. facet_words = 100.times.map{ `sed $(echo #{Random.rand(dict_word_count)})"q;d" /usr/share/dict/words`.strip! }

    • work-around
      Created by Eric.London on 2012-04-15
      I encountered the same situation. I wanted to surround the highlight with an html tag (em, strong, bold, etc), and it was automatically escaped. One work-around is to carefully use the raw method:

      
      <% hit.highlights(:content).each do |highlight| %>          
        <%=
          phrase = h highlight.format { |word| "***START***#{word}***END***" }
          phrase.gsub!('***START***', '<b>')
          phrase.gsub!('***END***', '</b>')
          raw phrase
        %>
      <% end %>
      
      • awesome!
        Created by Anonymous on 2012-04-15
        This works great. I added yellow and bold and it works for my needs. Thanks for the quick reply!

  • Great Example
    Created by hjoe on 2012-05-30
    Eric,

    Thanks for the great example. I've been looking for an example like this forever. I can't quite get it working. For some reason when I implement it none of my "tags" show up for filtering.

    Here's a question I started on Stack Overflow
    http://stackoverflow.com/questions/10560583/select-multiple-facets-or-filter-data-simultaneously
    • coming soon
      Created by Eric London on 2012-06-27
      I have a working example on my photo gallery: http://pics.ericlondon.com/search?utf8=%E2%9C%93&keywords=eric

      Hopefully, I can document soon!