Drupal 6: Creating a Batch API operation to parse a large CSV file

In this code snippet I’ll show how you can parse a (large) CSV file using Drupal’s Batch API. The purpose of batching an operation is to avoid PHP memory limits and time outs. Before you begin, I recommend reviewing the following two articles. Be sure to review the additional batch parameters outlined in the documentation, you might need to use them.

Batch API

Batch Operations

We can start by defining the arguments that will be passed into the batch_set() function. For this example, I added this code in an arbitrary page callback function.

<?php
function MYMODULE_callback_csv_import() {

  // define path to CSV file
  $csv_file_path = file_directory_path() . '/import_path/myfile.csv';

  // define a redirect path upon batch completion
  $redirect_path = 'admin/import-csv';

  // define batch array structure
  // NOTE: minimal parameters defined to simplify code
  $batch = array(
    'title' => t('Reading File'),
    'operations' => array(
      array(
        '_MYMODULE_batch_read', array($csv_file_path),
      ),
    ),
  );

  // set batch
  batch_set($batch);

  // process batch
  batch_process($redirect_path);

}
?>

Next we’ll define the batch callback function. This function will be called repeatedly until the $context[‘finished’] variable is set to “1”.

<?php
function _MYMODULE_batch_read($csv_file_path, &$context) {

  // define batch limit
  $batch_limit = 100;

  // assume the batch process has not completed
  $context['finished'] = 0;

  // open the file for reading
  $file_handle = fopen($csv_file_path, 'r');

  // check if file pointer position exists in the sandbox, and jump to location in file
  if ($context['sandbox']['file_pointer_position']) {
    fseek($file_handle, $context['sandbox']['file_pointer_position']);
  }

  // loop through the file and stop at batch limit
  for ($i = 0; $i < $batch_limit; $i++) {

    // get file line as csv
    $csv_line = fgetcsv($file_handle);

    // NOTE: at this point, do what ever you'd like with the CSV array data!
    if (is_array($csv_line)) {
      // db_query(), etc
    }

    // retain current file pointer position
    $context['sandbox']['file_pointer_position'] = ftell($file_handle);

    // check for EOF
    if (feof($file_handle)) {
      // complete the batch process
      $context['finished'] = 1;

      // end loop
      break;
    }

  }

}

?>

The batch operation will be called until the end of the CSV file is reached. The $context variable is passed by reference into the batch callback so you can maintain data through each iteration; in this case, the position of the file pointer. When the batch operation is complete, the user will be redirected to the batch_process() path argument.

It’s important to read the full Batch API documentation so you can take advantage of its additional features: finished callback, init_message, progress_message, error_message, etc.

Updated: