Drupal 7: Geospatial Apache Solr searching in Drupal 7 using the Search API module (Ubuntu version)

In this tutorial, I'll share the notes and code I used to set up geospatial Apache Solr searching in Drupal 7 using the Search API module. For this tutorial I created a minimal Ubuntu Server virtual machine. All commands should be run as a user with permission to modify the files involved, or prefixed with "sudo".

The first thing I do with a fresh virtual machine is check for package upgrades.

$ apt-get update
$ apt-get upgrade

I find it cumbersome to type in a virtual machine window, so I install openssh-server on the VM and connect over SSH from my Mac. If you plan to do the same, you'll need to find your virtual machine's IP address using ifconfig. For this tutorial I added a local DNS entry (/etc/hosts) pointing "drupal7.vm" to my VM's IP.

$ apt-get install openssh-server

Install the LAMP stack. The following packages will install Apache httpd as a dependency.

$ apt-get install php5 php5-cli php5-common php5-curl php5-gd php5-mysql php-pear mysql-server

At this point, browsing to your VM/server's IP address will give you the standard Apache welcome message: It works! This is the default web page for this server. The web server software is running but no content has been added, yet.

Install version control.

$ apt-get install git-core

Create a MySQL database for Drupal 7.

$ mysql -u youruser -p
mysql> create database drupal7;
mysql> grant all privileges on drupal7.* to 'drupal7'@'localhost' identified by 'somepassword';
mysql> exit

Install drush via PEAR.

$ pear upgrade-all
$ pear channel-discover pear.drush.org
$ pear install drush/drush

Verify that drush is installed.

$ which drush
/usr/bin/drush
$ drush --version
drush version 4.5

Create an Apache vhost directory

$ mkdir -p /var/www/vhosts

Download Drupal via drush

$ cd /var/www/vhosts
$ drush dl drupal
# rename folder (as necessary)
$ mv drupal-7.10 drupal7

Put the Drupal file system under Git version control

$ cd drupal7
$ git init
$ git add .
$ git commit -am "initial commit of drupal7"

Install Drupal via drush

$ drush site-install standard --db-url=mysql://drupal7:somepassword@localhost/drupal7

Add Apache2 vhost

$ cd /etc/apache2/sites-available

# create new file, called "drupal7" with contents:
<VirtualHost *:80>
  ServerName drupal7.vm
  DocumentRoot /var/www/vhosts/drupal7
  ErrorLog /var/log/apache2/drupal7-error_log
  CustomLog /var/log/apache2/drupal7-access_log combined
  <Directory /var/www/vhosts/drupal7>
    AllowOverride All
  </Directory>
</VirtualHost>

# create symlink
$ cd ../sites-enabled
$ ln -s ../sites-available/drupal7 001-drupal7.conf

# enable apache2 mod_rewrite module
$ a2enmod rewrite

# restart apache2
$ /etc/init.d/apache2 restart

At this point browsing to your VM/server's hostname should show a Drupal installation.

Part 2, Tomcat/Solr

Install the Java JDK and Tomcat 6

$ apt-get install openjdk-6-jdk tomcat6 tomcat6-admin tomcat6-common tomcat6-user

Browsing to your VM/server's hostname on port 8080 (ex: http://drupal7.vm:8080) will show the generic Tomcat welcome message:

It works !

If you're seeing this page via a web browser, it means you've setup Tomcat successfully. Congratulations!

Installing Solr in Tomcat

$ mkdir ~/downloads
$ cd ~/downloads
# Download the latest stable version of Apache Solr from:
# http://www.apache.org/dyn/closer.cgi/lucene/solr/
# example:
$ wget http://www.motorlogy.com/apache//lucene/solr/3.5.0/apache-solr-3.5.0.tgz
$ tar -xzf apache-solr-3.5.0.tgz

Copy/rename the Java WAR file into the Tomcat webapps directory

$ cp ~/downloads/apache-solr-3.5.0/dist/apache-solr-3.5.0.war /var/lib/tomcat6/webapps/solr.war

Note: once the WAR file is in the Tomcat webapps folder, Tomcat unpacks it automatically into this directory (while running, or on its next restart): /var/lib/tomcat6/webapps/solr

Copy solr files

$ cp -r ~/downloads/apache-solr-3.5.0/example/solr/ /var/lib/tomcat6/solr/

Create a Catalina context file that links the WAR file to the Solr home directory

$ cd /etc/tomcat6/Catalina/localhost

# create new file: "solr.xml", with the contents:

<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="/var/lib/tomcat6/webapps/solr.war" debug="0" privileged="true" allowLinking="true" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/var/lib/tomcat6/solr" override="true" />
</Context>

Set up Tomcat admin user(s)

# edit file: /etc/tomcat6/tomcat-users.xml, ensure similar contents exist:

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="admin"/>
  <role rolename="manager"/>
  <user username="eric" password="supersecretpassword" roles="admin,manager"/>
</tomcat-users>

Update webapps WEB-INF/web.xml file

# edit file: /var/lib/tomcat6/webapps/solr/WEB-INF/web.xml, update "solr/home" section to reflect solr path:

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/var/lib/tomcat6/solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

Download the Search API Drupal modules (search_api_solr ships with the Solr schema.xml and solrconfig.xml files it needs), and copy those files into the Solr conf directory

$ mkdir -p /var/www/vhosts/drupal7/sites/all/modules/contrib
$ cd /var/www/vhosts/drupal7/sites/all/modules/contrib
$ drush dl search_api search_api_solr
$ cp /var/www/vhosts/drupal7/sites/all/modules/contrib/search_api_solr/solrconfig.xml /var/lib/tomcat6/solr/conf/
$ cp /var/www/vhosts/drupal7/sites/all/modules/contrib/search_api_solr/schema.xml /var/lib/tomcat6/solr/conf/

Reset Tomcat file ownership, and restart Tomcat

$ cd /var/lib
$ chown -R tomcat6:tomcat6 tomcat6
$ /etc/init.d/tomcat6 restart

You should now be able to browse to the Solr admin page, e.g. http://drupal7.vm:8080/solr/admin/

[Screenshot: Solr admin page]

If things aren't working at this point, check the Tomcat log for SEVERE entries: /var/log/tomcat6/catalina.out

In addition, the solr web application should be listed in the Tomcat Web Application Manager, e.g. http://drupal7.vm:8080/manager/html

Part 3, Drupal code

Getting the solr-php-client library from code.google.com

$ mkdir -p /var/www/vhosts/drupal7/sites/all/libraries
$ cd /var/www/vhosts/drupal7/sites/all/libraries

# URL: http://code.google.com/p/solr-php-client/downloads/list
# File: SolrPhpClient.r60.2011-05-04.tgz
$ wget http://solr-php-client.googlecode.com/files/SolrPhpClient.r60.2011-05-04.tgz
$ tar -xzf SolrPhpClient.r60.2011-05-04.tgz

Downloading and installing contrib drupal modules

$ cd /var/www/vhosts/drupal7
$ drush dl entity views ctools facetapi
$ drush en search_api search_api_views search_api_solr search_api_facetapi entity views views_ui ctools facetapi

Optionally, I also install devel and admin_menu, and disable overlay and toolbar:

$ drush dl devel admin_menu
$ drush en devel admin_menu
$ drush dis overlay toolbar

Add the tomcat/solr server to Search API configuration:

- URL: /admin/config/search/search_api
- click on "+ Add Server"
- server name: Solr 3.5.0
- Service class: Solr service
  - Solr host: localhost
  - Solr port: 8080
  - Solr path: /solr
- click Create Server

You should see two confirmation messages: "The server was successfully created." and "The Solr server could be reached (latency: # ms)." If not, make sure the Tomcat service is running and Solr is reachable at the URL you specified.
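If the server form cannot reach Solr, it can help to test the connection outside Drupal. The following is a minimal PHP sketch, not part of Search API: the function names are hypothetical, and it assumes the solrconfig.xml copied from search_api_solr exposes the usual /admin/ping handler (the one SolrPhpClient's ping call uses).

```php
<?php
/**
 * Hypothetical helper: builds the Solr ping URL from the same host, port
 * and path entered in the Search API server form.
 */
function mymodule_solr_ping_url($host, $port, $path) {
  // Normalize the path so '/solr', 'solr' and '/solr/' all behave the same.
  $path = '/' . trim($path, '/');
  return sprintf('http://%s:%d%s/admin/ping', $host, $port, $path);
}

/**
 * Hypothetical helper: returns TRUE if the ping URL answers with a 2xx
 * status within $timeout seconds, FALSE otherwise.
 */
function mymodule_solr_is_reachable($url, $timeout = 2) {
  $context = stream_context_create(array(
    'http' => array('timeout' => $timeout),
  ));
  // file_get_contents() returns FALSE on connection errors and, by default,
  // on 4xx/5xx responses; @ suppresses the warning it emits in those cases.
  $body = @file_get_contents($url, FALSE, $context);
  return $body !== FALSE;
}
```

For example, running `var_dump(mymodule_solr_is_reachable(mymodule_solr_ping_url('localhost', 8080, '/solr')));` with the PHP CLI on the VM should print bool(true) when Tomcat and Solr are up.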

At this point Solr is ready to receive and index data, but there is nothing to index yet. For this tutorial, I decided to build on user profiles and store latitude and longitude using the geolocation field module.

$ drush dl geolocation
$ drush en geolocation

Adding some user profile fields:

- URL: /admin/config/people/accounts/fields
- First Name | field_name_first | Text
- Last Name | field_name_last | Text
- Geolocation | field_geolocation | Geolocation | Latitude/Longitude

I then added a bunch of users with latitude/longitude coordinates (URL: /admin/people/create). Note: I used the Google Geocoding API to fetch the coordinates: http://code.google.com/apis/maps/documentation/geocoding/
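Looking up coordinates by hand gets tedious with more than a few users. Below is a minimal PHP sketch of doing it in code. Assumptions: it targets the current Geocoding API JSON endpoint, which now requires an API key, it reads only the first result's geometry.location, and the function names are hypothetical.

```php
<?php
/**
 * Hypothetical helper: builds a Geocoding API request URL (JSON output).
 * Assumes the current endpoint, which requires an API key.
 */
function mymodule_geocode_url($address, $api_key) {
  return 'https://maps.googleapis.com/maps/api/geocode/json?'
    . http_build_query(array('address' => $address, 'key' => $api_key));
}

/**
 * Hypothetical helper: extracts array('lat' => ..., 'lng' => ...) from the
 * first result of a Geocoding API JSON response, or NULL if there is none.
 */
function mymodule_geocode_parse($json) {
  $data = json_decode($json, TRUE);
  // Anything other than status "OK" (e.g. ZERO_RESULTS) means no coordinates.
  if (!is_array($data) || !isset($data['status']) || $data['status'] !== 'OK') {
    return NULL;
  }
  if (empty($data['results'][0]['geometry']['location'])) {
    return NULL;
  }
  $location = $data['results'][0]['geometry']['location'];
  return array(
    'lat' => (float) $location['lat'],
    'lng' => (float) $location['lng'],
  );
}
```

Fetching is left to whatever HTTP client you prefer, e.g. `mymodule_geocode_parse(file_get_contents(mymodule_geocode_url('Nashua, NH', $key)))`; the returned lat/lng can then be stored in field_geolocation.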

Adding the search api index:

- URL: /admin/config/search/search_api
- click "+ Add index"
- Index name: People
- Item type: User
- Server: Solr 3.5.0
- click: Create Index

On the next admin page, you can select which fields to index. For this tutorial, I chose: User ID, Name, Email, URL, First Name, and Last Name. Unfortunately, at the time of writing, the geolocation lat/lng fields are not exposed to the Entity API. I assume this is a temporary problem, and there are numerous patches in the geolocation issue queue. @see (for example):
- Property Info callback for Entity API: http://drupal.org/node/1366642
- Fix for Search API not picking up the entity to index it's fields: http://drupal.org/node/1320564

I copied code directly from the issue queue, made some modifications, and created a custom module to expose the geolocation field data to the Entity API module. In addition, I added a new property "lat_lon" that concatenates lat and lng with a comma, the "lat,lon" format Solr expects. @see: http://wiki.apache.org/solr/SpatialSearch

<?php
/**
 * Implements hook_field_info_alter().
 *
 * Points the Entity API at a custom property info callback for
 * geolocation_latlng fields.
 */
function MYMODULE_field_info_alter(&$info) {
  if (isset($info['geolocation_latlng'])) {
    $info['geolocation_latlng']['property_type'] = 'geolocation';
    $info['geolocation_latlng']['property_callbacks'] = array('MYMODULE_geolocation_property_info_callback');
  }
}

/**
 * Property info callback: describes a geolocation field and its columns.
 */
function MYMODULE_geolocation_property_info_callback(&$info, $entity_type, $field, $instance, $field_type) {
  $name = $field['field_name'];
  $property = &$info[$entity_type]['bundles'][$instance['bundle']]['properties'][$name];

  $property['type'] = ($field['cardinality'] != 1) ? 'list<geolocation>' : 'geolocation';
  $property['getter callback'] = 'entity_metadata_field_verbatim_get';
  $property['setter callback'] = 'entity_metadata_field_verbatim_set';
  $property['auto creation'] = 'MYMODULE_geolocation_default_values';
  $property['property info'] = MYMODULE_geolocation_data_property_info($name);

  unset($property['query callback']);
}

/**
 * Auto creation callback: empty values for a new geolocation item.
 */
function MYMODULE_geolocation_default_values() {
  return array(
    'lat' => '',
    'lng' => '',
    'lat_sin' => '',
    'lat_cos' => '',
    'lat_rad' => '',
    'lat_lon' => '',
  );
}

/**
 * Returns property info for the geolocation columns plus "lat_lon".
 */
function MYMODULE_geolocation_data_property_info($name = NULL) {
  // Basic property information for the geolocation field.
  $properties = array(
    'lat' => array('label' => t('Latitude')),
    'lng' => array('label' => t('Longitude')),
    'lat_sin' => array('label' => t('Sine of Latitude')),
    'lat_cos' => array('label' => t('Cosine of Latitude')),
    'lat_rad' => array('label' => t('Radian Latitude')),
    'lat_lon' => array('label' => t('Latitude,Longitude')),
  );

  // Add the default values for each of the geolocation field properties.
  foreach ($properties as $key => &$value) {
    $value += array(
      'description' => !empty($name) ? t('!label of field %name', array('!label' => $value['label'], '%name' => $name)) : '',
      'type' => 'text',
    );
    if ($key == 'lat_lon') {
      // Computed "lat,lng" string, the format Solr's LatLonType expects.
      $value['getter callback'] = '_MYMODULE_geolocation_entity_property_verbatim_get';
      $value['setter callback'] = '_MYMODULE_geolocation_entity_property_verbatim_set';
    }
    else {
      // Stored columns are read and written as-is.
      $value['getter callback'] = 'entity_property_verbatim_get';
      $value['setter callback'] = 'entity_property_verbatim_set';
    }
  }
  // Break the reference left over from the foreach loop.
  unset($value);

  return $properties;
}

/**
 * Getter for "lat_lon": concatenates lat and lng with a comma.
 */
function _MYMODULE_geolocation_entity_property_verbatim_get($data, array $options, $name, $type, $info) {
  // Skip empty coordinates so Solr never receives a bare ",".
  if (is_array($data) && isset($data['lat'], $data['lng']) && $data['lat'] !== '' && $data['lng'] !== '') {
    return $data['lat'] . ',' . $data['lng'];
  }
  return '';
}

/**
 * Setter for "lat_lon": not implemented, so the property is effectively
 * read-only (it is derived from lat and lng).
 */
function _MYMODULE_geolocation_entity_property_verbatim_set(&$data, $name, $value, $langcode, $type, $info) {
  // TODO
  return;
}
?>

I added this code to a custom module, renamed the functions as necessary (MYMODULE is a placeholder for your module's machine name), and enabled the module. Then update the Search API index to add the new fields:

- URL: /admin/config/search/search_api/index/people/fields
- Expand "Add Related Fields"
- Choose Geolocation, click Add fields

This makes the following fields available to the index:

  • Geolocation » Latitude
  • Geolocation » Longitude
  • Geolocation » Sine of Latitude
  • Geolocation » Cosine of Latitude
  • Geolocation » Radian Latitude
  • Geolocation » Latitude,Longitude

Enable "Geolocation » Latitude,Longitude" and save changes.

Index the content, URL: /admin/config/search/search_api/index/people/status. Click: Index now. Note: if you had already indexed the content, you'll probably need to clear the index first. In my environment, I got the following confirmation message:

Successfully indexed 7 items.

I find it very helpful to check Solr's XML response directly after making changes to the index or schema. The following URL queries Solr for all results and returns all fields, e.g. http://drupal7.vm:8080/solr/select/?q=&fl=*

A sample XML document from the response:

<doc>
  <str name="f_ss_search_api_language"/>
  <str name="f_ss_url">http://drupal7.vm/user/3</str>
  <str name="id">people-3</str>
  <str name="index_id">people</str>
  <long name="is_uid">3</long>
  <str name="item_id">3</str>
  <arr name="spell">
    <str>nashua</str>
    <str>nashua@example.com</str>
    <str>nashua</str>
    <str>nashua</str>
    <str>42.933692,-72.278141</str>
  </arr>
  <str name="ss_search_api_id">3</str>
  <str name="ss_search_api_language"/>
  <str name="ss_url">http://drupal7.vm/user/3</str>
  <arr name="t_field_geolocation:lat_lon">
    <str>42.933692,-72.278141</str>
  </arr>
  <arr name="t_field_name_first">
    <str>nashua</str>
  </arr>
  <arr name="t_field_name_last">
    <str>nashua</str>
  </arr>
  <arr name="t_mail">
    <str>nashua@example.com</str>
  </arr>
  <arr name="t_name">
    <str>nashua</str>
  </arr>
</doc>
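Checking responses in the browser works fine, but the same check can be scripted. Below is a minimal PHP sketch, assuming Solr's standard JSON response writer (wt=json) and plain file_get_contents() for HTTP; the function names are hypothetical. Local-params filters such as {!geofilt ...} contain spaces and braces, which http_build_query() encodes for you.

```php
<?php
/**
 * Hypothetical helper: builds a Solr select URL from query parameters.
 * Values must be scalars: http_build_query() would turn an array value
 * into fq[0]=..., which Solr does not understand.
 */
function mymodule_solr_select_url($base, array $params) {
  return rtrim($base, '/') . '/select/?' . http_build_query($params);
}

/**
 * Hypothetical helper: runs the query and returns the decoded documents
 * (an empty array if the request fails).
 */
function mymodule_solr_select_docs($base, array $params) {
  // Ask for JSON unless the caller chose another writer.
  $params += array('wt' => 'json');
  $json = @file_get_contents(mymodule_solr_select_url($base, $params));
  if ($json === FALSE) {
    return array();
  }
  $data = json_decode($json, TRUE);
  // Solr's JSON writer nests the results under response.docs.
  return isset($data['response']['docs']) ? $data['response']['docs'] : array();
}
```

For example, `mymodule_solr_select_docs('http://drupal7.vm:8080/solr', array('q' => '', 'fl' => '*'))` should return the same documents as the browser URL above, as PHP arrays keyed by field name.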

Take note of the field name in the following XML; it is used in the next file edit.

<arr name="t_field_geolocation:lat_lon">
  <str>42.933692,-72.278141</str>
</arr>

Update the Solr schema.xml configuration to add the geospatial field types and the field definition.

# Edit file: /var/lib/tomcat6/solr/conf/schema.xml

# Just prior to the closing "</types>" tag, I inserted: (around line 287)

    <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
    <fieldType name="geohash" class="solr.GeoHashField"/>

# And, just after the opening "<fields>" tag, I inserted:

    <field name="t_field_geolocation:lat_lon" type="location" indexed="true" stored="true"/>
    <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>

Restart Tomcat

$ /etc/init.d/tomcat6 restart

Since the schema and solr data types have been updated, the content will have to be re-indexed:

- URL: /admin/config/search/search_api/index/people/status
- click: Clear index
- click: Index now

Re-running the Solr query above now shows the updated XML (note: the value is no longer an array):

<str name="t_field_geolocation:lat_lon">42.933692,-72.278141</str>

Verify that native Solr geospatial searching works using the following query syntax, e.g.:
http://drupal7.vm:8080/solr/select/?q=&fl=*&fq={!geofilt sfield=t_field_geolocation:lat_lon pt=42.933692,-72.278141 d=100}
With a distance (d) of 100 kilometers and the Nashua, NH coordinates, I get 2 results: Nashua and Portsmouth. Awesome.
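geofilt keeps documents whose great-circle distance from pt is at most d kilometers. When a result count looks surprising, you can cross-check the distances yourself. The sketch below uses the haversine formula with an assumed Earth radius of 6371.0087714 km (intended to match Solr's mean radius); small rounding differences from Solr are possible, and the function names are hypothetical.

```php
<?php
/**
 * Hypothetical helper: haversine great-circle distance in kilometers.
 * Assumes a spherical Earth with a mean radius of 6371.0087714 km.
 */
function mymodule_distance_km($lat1, $lng1, $lat2, $lng2) {
  $radius = 6371.0087714;
  $dlat = deg2rad($lat2 - $lat1);
  $dlng = deg2rad($lng2 - $lng1);
  $a = sin($dlat / 2) * sin($dlat / 2)
    + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dlng / 2) * sin($dlng / 2);
  // min() guards against floating-point values slightly above 1.
  return 2 * $radius * asin(min(1.0, sqrt($a)));
}

/**
 * Hypothetical helper mirroring the geofilt test: is the point within $d km
 * of the center point?
 */
function mymodule_within_km($lat, $lng, $center_lat, $center_lng, $d) {
  return mymodule_distance_km($center_lat, $center_lng, $lat, $lng) <= $d;
}
```

For example, `mymodule_distance_km(42.933692, -72.278141, $lat, $lng)` with each user's stored coordinates shows which users a d=100 filter around the point above should keep.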

Create a solr integrated view:

- URL: /admin/structure/views/add
- View name: People
- Show: People
- Create a Page [checked]
- Path: people
- Continue & edit

Note: at this point, you have full rein over the view configuration. For this tutorial, I set the format to Grid and added some fields:

  • Geolocation: Latitude,Longitude (indexed)
  • Indexed User: Email
  • Indexed User: First Name
  • Indexed User: Last Name
  • Indexed User: Name

Save the view when edits are complete.

Browsing to the view will show something like this, e.g. http://drupal7.vm/people

[Screenshot: People view]

The next chunk of custom code modifies the solr query executed and adds geospatial filtering. @see: hook_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query)

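<?php
/**
 * Implements hook_search_api_solr_query_alter().
 *
 * Adds a geofilt filter query to searches on the "people" index.
 */
function MYMODULE_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  // Only alter queries against the People index.
  if ($query->getIndex()->machine_name != 'people') {
    return;
  }

  // Hardcoded center point (the Nashua, NH coordinates) and radius in km.
  $lat = 42.933692;
  $lng = -72.278141;
  $distance = 100;

  $call_args['params']['fq'][] = "{!geofilt sfield=t_field_geolocation:lat_lon pt={$lat},{$lng} d={$distance}}";
}
?>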

The above code will limit the view's results using the hardcoded coordinates.

[Screenshot: People View 2]

Clearly it works, but there are some loose ends to tie up:

  • automatically fetch a user's coordinates to store in the geolocation field
  • add a search form to the people view page so the user can search for a location (instead of hardcoded coordinates, blah)
  • translate the user's location search input to coordinates using an API
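For the search-form loose end, one way to drop the hardcoded values is to build the filter from submitted parameters, validating them first so that only numbers reach the local-params string. This is a minimal sketch; the helper name and the lat/lng/d parameter names are hypothetical, not part of Search API.

```php
<?php
/**
 * Hypothetical helper: builds a geofilt filter query from untrusted input
 * (for example $_GET values from a search form). Returns NULL when input is
 * missing or out of range, so the caller can skip adding a filter.
 */
function mymodule_geofilt_fq(array $input, $field = 't_field_geolocation:lat_lon') {
  foreach (array('lat', 'lng', 'd') as $key) {
    if (!isset($input[$key]) || !is_numeric($input[$key])) {
      return NULL;
    }
  }
  $lat = (float) $input['lat'];
  $lng = (float) $input['lng'];
  $d = (float) $input['d'];
  if ($lat < -90 || $lat > 90 || $lng < -180 || $lng > 180 || $d <= 0) {
    return NULL;
  }
  // %F formats the floats locale-independently, so only numbers (never
  // user-supplied text) end up inside the local params.
  return sprintf('{!geofilt sfield=%s pt=%F,%F d=%F}', $field, $lat, $lng, $d);
}
```

Inside MYMODULE_search_api_solr_query_alter(), something like `if ($fq = mymodule_geofilt_fq($_GET)) { $call_args['params']['fq'][] = $fq; }` could then replace the hardcoded line.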

Hopefully I can find more time to elaborate on this tutorial in the near future! Cheers.

User comments:

Created by Martin on 2012-08-10:

Great tutorial that helped me get on track with SOLR. About the hook for proximity filtering, you can instead set the option "spatial" on your search query like this:

<?php
$query->setOption('spatial', array(
  'lat' => '61.8',
  'lng' => '12.916666999999961',
  'radius' => '100',
  'field' => 'field_geofield:latlon',
  'radius_measure' => 'km',
));
?>