Creating a Batch API operation to parse a large CSV file

Avatar-eric-london
Created by Eric.London on 2010-01-06
Tags:
New Comment
 
Please note: the content on this page orginates from ericlondon.com.
In this code snippet, I'll show how you can parse a (large) CSV file using Drupal's Batch API. The purpose of batching an operation is to avoid PHP memory limits and time outs. Before you begin, I recommend reviewing the following two articles. Be sure to review the additional batch parameters outlined in the documentation, you might need to use them.

Batch API
Batch Operations

We can start by defining the arguments that will be passed into the batch_set() function. For this example, I added this code in an arbitrary page callback function.

<?php
function MYMODULE_callback_csv_import() {

  // define path to CSV file
  $csv_file_path = file_directory_path() . '/import_path/myfile.csv';

  // define a redirect path upon batch completion
  $redirect_path = 'admin/import-csv';

  // define batch array structure
  // NOTE: minimal parameters defined to simplify code
  $batch = array(
    'title' => t('Reading File'),
    'operations' => array(
      array(
        '_MYMODULE_batch_read', array($csv_file_path),
      ),
    ),
  );

  // set batch
  batch_set($batch);

  // process batch
  batch_process($redirect_path);

}
?>


Next, we'll define the batch callback function. This function will be called repeatedly until the $context['finished'] variable is set to "1".

<?php
function _MYMODULE_batch_read($csv_file_path, &$context) {
  
  // define batch limit
  $batch_limit = 100;

  // assume the batch process has not completed
  $context['finished'] = 0;

  // open the file for reading
  $file_handle = fopen($csv_file_path, 'r');

  // check if file pointer position exists in the sandbox, and jump to location in file
  if ($context['sandbox']['file_pointer_position']) {
    fseek($file_handle, $context['sandbox']['file_pointer_position']);
  }
  
  // loop through the file and stop at batch limit
  for ($i = 0; $i < $batch_limit; $i++) {

    // get file line as csv
    $csv_line = fgetcsv($file_handle);

    // NOTE: at this point, do what ever you'd like with the CSV array data!
    if (is_array($csv_line)) {
      // db_query(), etc    
    }

    // retain current file pointer position
    $context['sandbox']['file_pointer_position'] = ftell($file_handle);

    // check for EOF
    if (feof($file_handle)) {
      // complete the batch process
      $context['finished'] = 1;

      // end loop
      break;
    }

  }

}

?>


The batch operation will be called until the end of the CSV file is reached. The $context variable is passed by reference into the batch callback so you can maintain data through each iteration; in this case, the position of the file pointer. When the batch operation is complete, the user will be redirected to the batch_process() path argument.

It's important to read the full Batch API documentation so you can take advantage of its additional features: finished callback, init_message, progress_message, error_message, etc.

Comments

 
  • Thanks for the code, it was
    Created by Anonymous on 2010-03-24
    Thanks for the code, it was extremely useful and allowed me to meet an unexpected deadline. I made an addition to this script that would allow a csv file to be uploaded from a form instead of a hard coded file path.

    Create the drupal form.
    <?php
     function MYMODULE_form() {
       $form = array();
       
       $form['#attributes'] = array('enctype' => "multipart/form-data");
       
       $form['csv'] = array(
         '#type' => 'file',
         '#title' => t('Select a CSV File'),
         '#size' => 40,
       );
       
       $form['import']['submit'] = array(
         '#type' => 'submit',
         '#value' => t('Submit'),
       );
       
       return $form;
     }
    ?>


    Validate the form and make sure the file field is not blank.
    <?php
     function MYMODULE_form_validate($form, &$form_state) {
       $file = file_save_upload('csv');
        if (!$file)
          form_set_error('upload', 'You must select a valid file to upload.');
        else {
          // Manually add the uploaded file to the $form_state
          $form_state['values']['csv']['title'] = $file->filename;
          $form_state['values']['csv']['file'] = $file;
        }
     }
    ?>


    The submit handler will initialize and start the batch script
    <?php
     function MYMODULE_form_submit($form, &$form_state) { 
       $csv_file_path = $form_state['values']['csv']['file']->filepath;
       
       // define batch array structure
       // NOTE: minimal parameters defined to simplify code
       $batch = array(
        'title' => t('Reading File'),
        'operations' => array(
          array(
            '_MYMODULE_batch_read', array($csv_file_path),
          ),
        ),
       );
      
      // set batch
      batch_set($batch);
     }
    ?>


    next use the _MYMODULE_batch_read from the main post.
    • Thanks. Your code helped me
      Created by Anonymous on 2011-12-04
      Thanks. Your code helped me =)
  • Thanks for sharing this. Very
    Created by Anonymous on 2011-04-29
    Thanks for sharing this. Very useful, as I was stuck with my approach of putting the $file_handle in the context sandbox, which was then unreliably forgetting it after too many runs.