JRuby: Using Celluloid concurrency library to utilize full CPU multithreading and convert a large JSON file to CSV
In this tutorial I’ll demo some JRuby code that uses Celluloid, a fantastic concurrency library, to spread work across every CPU core while converting a large JSON file to CSV. Celluloid really shines on Ruby implementations that are not limited by a global interpreter lock (GIL), such as JRuby and Rubinius.
I created a Gemfile. Run “bundle install” to install the gems it lists.
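A minimal sketch of what such a Gemfile might contain, assuming Celluloid is the only dependency that does not already ship with JRuby (the JSON and CSV libraries are part of the standard distribution). The original file's exact contents and any version pins are not known, so treat this as an assumption:

```ruby
# Gemfile (assumed contents; no version constraint is implied by the post)
source "https://rubygems.org"

gem "celluloid"
```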
I executed the following script to create a 1GB file consisting of JSON hashes with a randomized order of defined keys.
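A generator along these lines could look roughly like the sketch below. The field names, the sample values, and the `generate` and `random_record` helpers are all hypothetical, since the post's actual keys are not shown; the only properties taken from the text are that each line is a JSON hash and that the key order is randomized per record.

```ruby
require "json"

# Hypothetical field names; the post's real keys are not shown.
FIELDS = %w[id name email city score].freeze

# Build one record, then re-insert its keys in a shuffled order.
# Ruby hashes keep insertion order, so the JSON output order varies per line.
def random_record(rng)
  record = {
    "id"    => rng.rand(1_000_000),
    "name"  => "user#{rng.rand(10_000)}",
    "email" => "user#{rng.rand(10_000)}@example.com",
    "city"  => %w[Concord Manchester Nashua].sample(random: rng),
    "score" => rng.rand(100)
  }
  FIELDS.shuffle(random: rng).each_with_object({}) { |key, h| h[key] = record[key] }
end

# Write one JSON hash per line to `path` until roughly `target_bytes` are written.
def generate(path, target_bytes, rng = Random.new)
  File.open(path, "w") do |file|
    written = 0
    while written < target_bytes
      line = JSON.generate(random_record(rng))
      file.puts(line)
      written += line.bytesize + 1 # +1 for the newline
    end
  end
  path
end

# For a file of about 1 GB: generate("large.json", 1024**3)
```

The file name `large.json` in the last comment is a placeholder, not the name used in the original post.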
I used the Linux “split” command to chunk the 1GB file into files of 50,000 lines each, prefixed with “_split_”.
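For readers without “split” at hand, the same chunking can be sketched in Ruby. This is not what the post ran: the `split_file` helper is hypothetical, and it names chunks with numeric suffixes (`_split_0000`, `_split_0001`, ...) where “split” uses alphabetic ones (`_split_aa`, `_split_ab`, ...) by default.

```ruby
# Split `source` into files of at most `lines_per_chunk` lines each.
# Only one chunk is held in memory at a time.
# Returns the chunk paths in order.
def split_file(source, lines_per_chunk: 50_000, prefix: "_split_")
  paths = []
  File.foreach(source).each_slice(lines_per_chunk).with_index do |lines, index|
    path = format("%s%04d", prefix, index)
    File.write(path, lines.join)
    paths << path
  end
  paths
end

# Example: split_file("large.json") # "large.json" is a placeholder name
```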
The following class includes the Celluloid library and implements methods to read a file, parse the JSON, convert the data to an array of known fields, and write to CSV.
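The conversion steps described here (read a file, parse each JSON line, map it onto a known field order, write CSV) can be sketched with the standard library alone. The class name `JsonToCsv`, the field list, and the `.csv` output naming are assumptions for illustration; the class in the post additionally has `include Celluloid` at the top, which is left out here so the conversion logic can be read on its own.

```ruby
require "json"
require "csv"

class JsonToCsv
  # Hypothetical column order; the post's actual field list is not shown.
  FIELDS = %w[id name email city score].freeze

  # Read one JSON-lines file and write "<path>.csv" with a fixed column order.
  # Returns the path of the CSV file that was written.
  def convert(path)
    out_path = "#{path}.csv"
    CSV.open(out_path, "w") do |csv|
      File.foreach(path) do |line|
        next if line.strip.empty? # tolerate blank lines
        csv << to_row(JSON.parse(line))
      end
    end
    out_path
  end

  # Map a hash with arbitrary key order onto the known field order.
  # Missing keys become empty CSV cells.
  def to_row(record)
    FIELDS.map { |field| record[field] }
  end
end
```

Because each row is built from `FIELDS` rather than from the hash's own key order, the randomized ordering in the input has no effect on the CSV columns.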
As you can see in the above class, there is no mention of fibers, threads, or java.util.concurrent; just: include Celluloid. In this example, I chose to implement a Celluloid pool and use its future method to queue tasks.
Run the above script and you’ll see full CPU core utilization in Java VisualVM (jvisualvm) and Activity Monitor. Get your money’s worth out of your multi-core system.