Software engineer, data guy, Open Source enthusiast, New Hampshire resident, husband, father. Fan of guitars, hiking, photography, homebrewing, sarcasm.
Drupal 6: Using Tika Java library to index WebFM file attachments as Apache Solr documents
In this quick snippet, I’ll show some code that uses the Tika Java library to extract the content of WebFM file attachments and index it as Apache Solr documents. The code is designed to work with the Apache Solr Search Integration Drupal module and piggybacks off of the Apache Solr Attachments module. Out of the box, the Apache Solr Attachments module can index CCK file fields and node file attachments, but the WebFM module stores its files in its own custom tables, so those files are never indexed. This code assumes you have already integrated the Tika library with your Solr installation; please review the Tika Getting Started documentation for more information.
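When Tika is integrated with Solr (Solr Cell's ExtractingRequestHandler), extracting text from a file comes down to posting the file to that handler. The sketch below only builds the request URL. It assumes the handler is mounted at `/update/extract`, as in Solr's example `solrconfig.xml`, and it uses `extractOnly=true` so that Solr returns the extracted text instead of indexing it. The caller can then place that text in its own document. The function name `build_extract_url` is hypothetical and not part of either Drupal module.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, filename, extract_only=True):
    """Build a Solr Cell (ExtractingRequestHandler) URL for one file.

    Assumption: the handler lives at /update/extract, as in Solr's
    example solrconfig.xml. Adjust the path for your installation.
    """
    params = {
        # Gives Tika a hint about the file type, based on the file name.
        "resource.name": filename,
        # Ask for a JSON response instead of the default XML.
        "wt": "json",
    }
    if extract_only:
        # Return the extracted content without adding it to the index,
        # and return it as plain text rather than XHTML.
        params["extractOnly"] = "true"
        params["extractFormat"] = "text"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


print(build_extract_url("http://localhost:8983/solr/", "report 2010.pdf"))
```

The file itself would then be sent in the body of a POST to that URL (for example as a multipart form upload, as the Solr Cell documentation shows with `curl -F`). Keeping extraction separate from indexing mirrors the idea in this post: get the text first, then decide how to build the Solr document.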
After implementing this code, whenever nodes were set to be indexed by Solr, their WebFM file attachments were processed separately: the content of each attachment was extracted and added as a new Solr document.
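The per-node flow described above (one new Solr document per WebFM attachment, linked back to its parent node) can be sketched as follows. This is a language-neutral illustration under stated assumptions, not the module's PHP code. The `WebfmFile` record, the `extract_text` callable (for example, a wrapper around a Solr Cell `extractOnly` request), the ID scheme and all field names (`id`, `nid`, `fid`, `title`, `body`) are hypothetical. The actual fields depend on your Solr schema and the Apache Solr Search Integration module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WebfmFile:
    # Hypothetical record for one WebFM attachment of a node.
    fid: int
    name: str
    path: str


def attachment_documents(
    nid: int,
    files: List[WebfmFile],
    extract_text: Callable[[str], str],
    site_hash: str,
) -> List[Dict[str, object]]:
    """Build one Solr document (as a dict) per WebFM attachment of a node."""
    docs = []
    for f in files:
        text = extract_text(f.path)
        if not text.strip():
            # Skip files that yielded no text (e.g. images, empty files).
            continue
        docs.append({
            # Hypothetical ID scheme: unique per file and parent node.
            "id": f"{site_hash}/webfm_file/{f.fid}-{nid}",
            "nid": nid,  # link back to the parent node
            "fid": f.fid,
            "title": f.name,
            "body": text,
        })
    return docs


# Usage with a stub extractor in place of a real Tika/Solr call.
files = [WebfmFile(3, "a.pdf", "/files/a.pdf"), WebfmFile(4, "blank.doc", "/files/blank.doc")]
stub = lambda path: "quarterly results" if path.endswith("a.pdf") else "  "
docs = attachment_documents(7, files, stub, "abc123")
print(docs)
```

Passing the extractor in as a function keeps the document-building logic independent of how Tika is reached (a local Tika jar or Solr's extract handler), which also makes it easy to test with a stub.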