JRuby: Bulk index Rails model data into Elasticsearch via Sidekiq (Redis queue)
In this post, I’ll share some code to bulk index Rails models into Elasticsearch using Sidekiq. Sidekiq uses a Redis queue to run jobs asynchronously. I used JRuby to take advantage of all my CPU cores.
I installed MySQL, Elasticsearch, and Redis via Homebrew. Install as necessary:
I set up a basic Rails project with a single model.
Before bulk indexing the data into Elasticsearch, I needed to populate the Thing model with data (~1 million records). In line with this post, I decided to use Sidekiq to populate the model.
Added the Sidekiq gem, plus Sinatra for stand-alone monitoring. Edited file: Gemfile, added:
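The Gemfile additions would look roughly like the fragment below. Only the two gem names come from the text above; leaving out version constraints is an assumption, and you may want to pin versions that match your JRuby and Rails setup.

```ruby
# Gemfile (fragment)
gem 'sidekiq'   # background job processing backed by Redis
gem 'sinatra'   # used here for the stand-alone monitoring app
```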
Installed new gems:
Created a Sidekiq/Sinatra stand-alone monitor app via rake task. new file: lib/tasks/sidekiq.rake
Started Sidekiq/Sinatra monitor. Accessible at: http://localhost:9494
Created a new folder for Sidekiq workers: app/workers
Added a new Sidekiq worker to create Things. new file: app/workers/thing_creator_worker.rb
Added a rake task to queue the jobs to create Things. Edit file: lib/tasks/sidekiq.rake
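As a rough sketch of the queuing logic (not necessarily the exact task used here), the target row count can be split into per-job batch sizes so that each Sidekiq job creates one manageable chunk of Things. The helper name `batch_sizes` and the batch size of 1,000 are hypothetical; `ThingCreatorWorker` is assumed from the worker file name.

```ruby
# Hypothetical helper: split a target row count into per-job batch sizes.
def batch_sizes(total, per_job)
  full_batches, remainder = total.divmod(per_job)
  sizes = Array.new(full_batches, per_job)
  sizes << remainder if remainder > 0   # last, smaller batch if needed
  sizes
end

# Inside the rake task, each size would become one queued job, e.g.:
#   batch_sizes(1_000_000, 1_000).each do |n|
#     ThingCreatorWorker.perform_async(n)   # class name assumed from the file name
#   end
```

Queuing a count per job, rather than one job per record, keeps the Redis queue small while still giving every worker thread plenty to do.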
Executed the rake task:
Start a Sidekiq worker. Add the --verbose flag to help troubleshoot issues. I started a few workers until all my CPU cores were pegged at 100%.
Progress can be seen via Sidekiq/Sinatra monitor: http://localhost:9494
When complete, verify the results via the Rails console:
At this point we’re ready to bulk index data into Elasticsearch. Add gems to Gemfile:
Install gems:
Integrate Thing model with Elasticsearch. edit file: app/models/thing.rb
Created a Sidekiq indexer worker. new file: app/workers/thing_indexer_worker.rb
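Whatever client gem the worker uses, a bulk index request to Elasticsearch's `_bulk` endpoint comes down to a newline-delimited body: one action line, then one document line, per record. The sketch below builds that body with only the standard library. The helper name `bulk_body`, the index name `things`, and the `id`/`name` fields are hypothetical, and older Elasticsearch versions also expected a `_type` in the action line, which is omitted here.

```ruby
require 'json'

# Hypothetical helper: build a newline-delimited bulk request body.
# docs is an array of hashes with string keys (e.g. what as_json returns).
def bulk_body(index_name, docs)
  lines = docs.flat_map do |doc|
    [
      # Action line: tells Elasticsearch where to index the next document.
      JSON.generate('index' => { '_index' => index_name, '_id' => doc.fetch('id').to_s }),
      # Source line: the document itself.
      JSON.generate(doc)
    ]
  end
  lines.join("\n") + "\n"   # the bulk API requires a trailing newline
end
```

For example, `bulk_body('things', [{ 'id' => 1, 'name' => 'a' }])` yields two lines, ready to be sent in a single request rather than one request per record.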
Created another rake task to load all the Thing [primary keys] and index them into Elasticsearch. edit file: lib/tasks/sidekiq.rake
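One way to picture this step, as a sketch under assumptions rather than the exact task: load the primary keys, group them into fixed-size batches, and queue one indexing job per batch. The helper name `id_batches` and the default batch size are hypothetical; `ThingIndexerWorker` is assumed from the worker file name.

```ruby
# Hypothetical helper: group primary keys into fixed-size batches,
# one batch per indexing job.
def id_batches(ids, batch_size = 1_000)
  ids.each_slice(batch_size).to_a
end

# Inside the rake task this might be used as:
#   id_batches(Thing.pluck(:id)).each do |batch|
#     ThingIndexerWorker.perform_async(batch)   # class name assumed from the file name
#   end
```

Passing only primary keys keeps the job payload in Redis small; each worker then loads its own records before indexing them.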
Execute the rake task to index the model data into Elasticsearch:
[re-]Start Sidekiq workers:
Verify the results in Elasticsearch via the Rails console:
Or via cURL:
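The same check can be scripted in plain Ruby against Elasticsearch's `_count` endpoint. This is a minimal sketch that assumes a node on the default `http://localhost:9200` and a hypothetical index named `things`; the helper names are illustrative.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the count URL for an index (host and index name are assumptions).
def count_uri(index_name, host = 'http://localhost:9200')
  URI("#{host}/#{index_name}/_count")
end

# Extract the document count from a _count response body.
def parse_count(response_body)
  JSON.parse(response_body).fetch('count')
end

# Against a running node:
#   puts parse_count(Net::HTTP.get(count_uri('things')))
```

The number returned should match the record count reported by the Rails console once all indexing jobs have finished.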
Screenshot of Sinatra/Sidekiq monitor with all CPUs pegged: